rvoice — push-to-talk dictation for remote rclaude sessions
/voice in Claude Code opens the mic on whichever host the claude binary is
running on. When you're sshed to apricot through cc / rclaude resume,
that's apricot — which has no mic. rvoice fills the gap.
It records audio locally on macOS, transcribes via the LAN speech-synthesis
service on apricot (Whisper, GPU-accelerated, no API keys / no network
egress beyond the local LAN), and injects the transcript into the active
remote tmux session via tmux send-keys over ssh. The target session is
auto-detected from the focused iTerm2 tab title (set by the canonical
session-tools tmux.conf to <host> · <session>).
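The title-based detection described above can be sketched in a few lines of POSIX shell. This is a minimal illustration, not the code in bin/rvoice: the function name parse_tab_title is hypothetical, and it assumes the title is exactly <host> · <session> with a space-padded middle dot as the separator.

```shell
# Hypothetical helper (not taken from bin/rvoice): split "<host> · <session>".
parse_tab_title() {
  title=$1
  sep=' · '
  case $title in
    *"$sep"*) ;;            # separator present, keep going
    *) return 1 ;;          # not an rclaude-style title
  esac
  host=${title%%"$sep"*}    # text before the first separator
  session=${title#*"$sep"}  # text after the first separator
  [ -n "$host" ] && [ -n "$session" ] || return 1
  # Emit both parts tab-separated so a caller can read them back.
  printf '%s\t%s\n' "$host" "$session"
}
```

A non-zero exit status here would correspond to the "no target session resolvable" case covered under Troubleshooting.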
Architecture
[ Right ⌥ down ] ──Hammerspoon──▶ rvoice start ──▶ ffmpeg → recording.wav
[ Right ⌥ up ] ──Hammerspoon──▶ rvoice stop
│
▼
POST WAV → http://apricot.lan:8000/stt/transcribe
(faster-whisper on GPU, ~base model)
│
▼
iTerm2 active tab title → "apricot · claude-…"
│
▼
ssh apricot tmux send-keys -t claude-… -l "<text>"
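The last step of the diagram is worth a sketch, because ssh joins its arguments into one string that the remote shell parses again, so the transcript has to be quoted for that remote shell before it reaches tmux. The helper names sq and build_inject_cmd are hypothetical and not taken from bin/rvoice; the `--` before the text is an added guard (an assumption, not something the original command shows) so that a transcript starting with a dash is not read as an option.

```shell
# Hypothetical helpers (not taken from bin/rvoice).

# Wrap a string in single quotes, escaping embedded single quotes as '\''.
sq() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Print the remote command line. $1 = tmux session, $2 = transcript.
# -l makes tmux type the text literally instead of looking up key names.
build_inject_cmd() {
  printf 'tmux send-keys -t %s -l -- %s' "$(sq "$1")" "$(sq "$2")"
}
```

A caller would then run something like `ssh "$host" "$(build_inject_cmd "$session" "$text")"`, passing the whole command as a single argument so that only the remote shell interprets the quoting.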
Files
| Path | Role |
|---|---|
| bin/rvoice | CLI: start/stop/cancel/target/log |
| hammerspoon/rvoice.lua | Right-⌥ hold detector → calls rvoice |
| ~/.config/rvoice/config | Sourced at startup; overrides STT URL, model, etc. |
| $TMPDIR/rvoice/ | Per-recording state (pid, wav, log) |
Install
Prerequisites: ffmpeg, jq, curl (all brew installable), Hammerspoon
(brew install --cask hammerspoon), and the LAN speech-synthesis service
running on apricot (already deployed at apricot.lan:8000, exposes
/stt/transcribe). No API keys, no cloud round-trip.
# 1. Symlink rvoice (already done if you ran install.sh)
ln -sfn ~/Code/@scripts/session-tools/bin/rvoice ~/.local/bin/rvoice
# 2. (Optional) override defaults in ~/.config/rvoice/config — see the
# "Config" section below. The default is to POST to apricot.lan:8000 and
# use the `base` Whisper model.
# 3. Wire up Hammerspoon
mkdir -p ~/.hammerspoon
ln -sfn ~/Code/@scripts/session-tools/hammerspoon/rvoice.lua ~/.hammerspoon/rvoice.lua
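# Append the require line only if it is missing, so re-running this step is safe
grep -qxF 'require("rvoice")' ~/.hammerspoon/init.lua 2>/dev/null || echo 'require("rvoice")' >> ~/.hammerspoon/init.lua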
open /Applications/Hammerspoon.app
# 4. From Hammerspoon's menu bar → Reload Config.
# Grant Accessibility + Microphone permission when macOS prompts.
# 5. Smoke-test the STT endpoint without Hammerspoon:
ffmpeg -f avfoundation -i ":0" -ac 1 -ar 16000 -t 5 /tmp/me.wav
curl -F "audio=@/tmp/me.wav" -F "model=base" -F "language=en" -F "task=transcribe" \
http://apricot.lan:8000/stt/transcribe | jq .text
Usage
From any iTerm2 tab that's attached to a remote claude session via cc or
rclaude resume:
- Hold Right ⌥ → "listening…" notification, Tink sound
- Speak
- Release → recording stops, transcript types into your claude prompt, Pop sound on success / Funk sound on error
- Hit Enter when you're ready (review first), or set
RVOICE_AUTOSEND=1 to skip the manual confirmation
Config (~/.config/rvoice/config)
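Plain shell fragment sourced at startup. Defaults shown, except RVOICE_HOST and RVOICE_SESSION, which are examples: leave them unset to keep iTerm2 auto-detection.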
export RVOICE_STT_URL=http://apricot.lan:8000 # speech-synthesis service
export RVOICE_MODEL=base # tiny|base|small|medium|large-v2|large-v3
export RVOICE_LANG=en # omit/empty = auto-detect
export RVOICE_AUTOSEND=0 # 1 = press Enter after inject
export RVOICE_MIN_MS=200 # ignore taps shorter than this (debounce)
export RVOICE_MAX_S=60 # hard cap on a single recording
export RVOICE_HOST=apricot.lan # force target host (overrides iTerm2 detection)
export RVOICE_SESSION=claude-natalie-… # force target tmux session
Override any of these per-invocation: RVOICE_MODEL=small rvoice stop.
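One way a script could resolve these settings is sketched below. It is an assumption about the mechanism, not the actual logic in bin/rvoice: the function name load_rvoice_config is hypothetical, and the sketch only fills in values that are still unset after the config file has been sourced. RVOICE_LANG uses `=` rather than `:=` so that an explicitly empty value (auto-detect) is not replaced by the default.

```shell
# Hypothetical loader (not taken from bin/rvoice).
load_rvoice_config() {
  cfg=${1:-$HOME/.config/rvoice/config}
  # Source the user's fragment if it exists and is readable.
  if [ -r "$cfg" ]; then
    . "$cfg"
  fi
  # Fill in anything still unset with the documented defaults.
  : "${RVOICE_STT_URL:=http://apricot.lan:8000}"
  : "${RVOICE_MODEL:=base}"
  : "${RVOICE_LANG=en}"        # empty string is kept and means auto-detect
  : "${RVOICE_AUTOSEND:=0}"
  : "${RVOICE_MIN_MS:=200}"
  : "${RVOICE_MAX_S:=60}"
}
```

Note that a config file containing a plain `export RVOICE_MODEL=base` would overwrite a per-invocation value when sourced this way; writing the config entry as `: "${RVOICE_MODEL:=base}"` is one way to let the environment win.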
Model trade-offs (apricot's GPU; latency rough):
- tiny.en/base: sub-second, fine for short prompts
- small: ~1s, noticeable quality bump
- medium/large-v3: 2-4s, near-perfect, worth it for paragraphs
Subcommands
rvoice start # begin recording (Hammerspoon calls this on key-down)
rvoice stop # stop, transcribe, inject (called on key-up)
rvoice cancel # stop without transcribing (called on quick-tap abort)
rvoice target # debug: echo the host+session rvoice WOULD inject into
rvoice log # tail -50 of the action log
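The choice between stop and cancel on key-up comes down to comparing the hold duration against RVOICE_MIN_MS. The sketch below shows that decision in shell purely for illustration: the function name hold_action is hypothetical, and the real check may well live in hammerspoon/rvoice.lua rather than in the CLI.

```shell
# Hypothetical helper: pick the subcommand to run on key-up.
# $1 = how long the key was held, in milliseconds.
hold_action() {
  held_ms=$1
  min_ms=${RVOICE_MIN_MS:-200}
  if [ "$held_ms" -lt "$min_ms" ]; then
    echo cancel   # quick tap: discard the recording
  else
    echo stop     # real hold: stop, transcribe, inject
  fi
}
```

With the default threshold, a caller could run `rvoice "$(hold_action "$held_ms")"` so that a 120 ms tap is discarded while a two-second hold is transcribed.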
Troubleshooting
- "STT request failed": apricot's speech service isn't reachable. Check curl http://apricot.lan:8000/health and ssh apricot.lan systemctl --user status for the relevant unit. Most likely you're off the LAN/VPN.
- "no target session resolvable": the focused iTerm2 tab title isn't in <host> · <session> format. Either: (a) you're not in an rclaude/ssh session, or (b) the remote tmux config didn't get the title-setting fragment. rclaude install --on <host> re-pushes the canonical tmux config; verify with ssh <host> 'tmux show-options -g | grep set-titles'.
- Hammerspoon doesn't see Right ⌥: System Settings → Privacy & Security → Accessibility → enable Hammerspoon. Also Microphone for the recording step. Restart Hammerspoon after granting.
- Transcription returns empty / nonsense: bump the model: RVOICE_MODEL=small or medium. Default base trades accuracy for sub-second latency. Models list: curl http://apricot.lan:8000/stt/models.
- Injection types into the wrong session: rvoice target shows what it will hit. If wrong, set RVOICE_HOST / RVOICE_SESSION in config to pin the target.
- Latency feels high: first call after service idle warms the model on apricot's GPU (1-2s one-time). Subsequent calls are sub-second for base. Switch to tiny.en for the lowest-latency tier.
Why this architecture (vs. /voice over ssh)
/voice is a feature of the claude binary itself; it opens the mic via
the OS audio API on whichever host it runs on. ssh has no audio channel and
doesn't forward CoreAudio events. The only ways to make /voice work over a
remote rclaude session would be:
- Run claude locally (lose apricot's compute / project files / LAN services — not viable for our workflow)
- Forward audio via PulseAudio (brittle on macOS, breaks on every claude release)
- Reproduce /voice's behavior with our own pieces ← this is rvoice
rvoice keeps the mic and the hotkey on the Mac, runs transcription on
apricot's own LAN-resident speech-synthesis service (GPU Whisper, zero
local model RAM, no cloud egress), and uses tmux's existing send-keys
protocol to deliver text — every layer is well-understood and stable.