rvoice — push-to-talk dictation for remote rclaude sessions
/voice in Claude Code opens the mic on whichever host the claude binary is
running on. When you're sshed to apricot through cc / rclaude resume,
that's apricot — which has no mic. rvoice fills the gap.
It records audio locally on macOS, transcribes via Groq Whisper (no local model
RAM), and injects the transcript into the active remote tmux session via
tmux send-keys over ssh. The target session is auto-detected from the
focused iTerm2 tab title (set by the canonical session-tools tmux.conf to
<host> · <session>).
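The target detection described above comes down to splitting a tab title on the " · " separator. The sketch below shows one way to do that in shell. The function name parse_tab_title and the session name in the usage note are hypothetical and not taken from bin/rvoice; how the title is read from iTerm2 is not shown.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from bin/rvoice: split a tab title of the
# form "<host> · <session>" into its two parts, one per output line.
parse_tab_title() {
  local title sep host session
  title=$1
  sep=' · '   # space, middle dot (U+00B7), space

  # Reject titles that do not contain the separator at all.
  case "$title" in
    *"$sep"*) ;;
    *) return 1 ;;
  esac

  host=${title%%"$sep"*}     # everything before the first separator
  session=${title#*"$sep"}   # everything after the first separator

  # Both halves must be non-empty to count as a usable target.
  if [ -z "$host" ] || [ -z "$session" ]; then
    return 1
  fi

  printf '%s\n%s\n' "$host" "$session"
}
```

Calling parse_tab_title 'apricot · demo-session' prints the host on the first line and the session on the second. A title without the separator makes the function return non-zero, which is the kind of case the "no target session resolvable" error covers.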
Architecture
[ Right ⌥ down ] ──Hammerspoon──▶ rvoice start ──▶ ffmpeg → recording.wav
[ Right ⌥ up ]   ──Hammerspoon──▶ rvoice stop
                                      │
                                      ▼
                    POST WAV → Groq /audio/transcriptions
                                      │
                                      ▼
                    iTerm2 active tab title → "apricot · claude-…"
                                      │
                                      ▼
                    ssh apricot tmux send-keys -t claude-… -l "<text>"
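The last two stages of the diagram can be sketched in shell as follows. This is an illustration under stated assumptions, not the code in bin/rvoice. The function names (transcribe, shell_quote, inject) are hypothetical. The full endpoint URL is assumed to be Groq's OpenAI-compatible path. The request asks for response_format=text so that the response body is the transcript itself, whereas the real script may request JSON and parse it with jq.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from bin/rvoice.

# Upload a WAV file and print the transcript on stdout.
# Assumption: response_format=text makes the response body the plain transcript.
transcribe() {
  local wav
  wav=$1
  if [ -z "${GROQ_API_KEY:-}" ]; then
    echo "GROQ_API_KEY not set" >&2
    return 1
  fi
  curl --silent --show-error --fail \
    -H "Authorization: Bearer ${GROQ_API_KEY}" \
    -F "file=@${wav}" \
    -F "model=${RVOICE_MODEL:-whisper-large-v3-turbo}" \
    -F "response_format=text" \
    https://api.groq.com/openai/v1/audio/transcriptions
}

# Wrap a string in single quotes for a POSIX shell, escaping embedded quotes.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Type the transcript into a remote tmux session.
# ssh joins its arguments into one string for the remote shell, so the
# session name and the text are quoted locally before being sent.
inject() {
  local host session text
  host=$1
  session=$2
  text=$3
  ssh "$host" "tmux send-keys -t $(shell_quote "$session") -l $(shell_quote "$text")"
  # Optionally press Enter afterwards, mirroring RVOICE_AUTOSEND=1.
  if [ "${RVOICE_AUTOSEND:-0}" = 1 ]; then
    ssh "$host" "tmux send-keys -t $(shell_quote "$session") Enter"
  fi
}
```

The -l flag tells tmux to treat the argument as literal characters instead of key names, so a transcript such as "Enter" or "C-c" is typed out and not interpreted as a keypress. Quoting locally matters because an apostrophe in the transcript ("it's") would otherwise break the remote command line.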
Files
| Path | Role |
|---|---|
| bin/rvoice | CLI: start/stop/cancel/target/log |
| hammerspoon/rvoice.lua | Right-⌥ hold detector → calls rvoice |
| ~/.config/rvoice/config | Sourced at startup; holds GROQ_API_KEY and tweaks |
| $TMPDIR/rvoice/ | Per-recording state (pid, wav, log) |
Install
Prerequisites: ffmpeg, jq, curl (all brew installable), a Groq API key
(free tier — https://console.groq.com/keys), and Hammerspoon
(brew install --cask hammerspoon).
# 1. Symlink rvoice (already done if you ran install.sh)
ln -sfn ~/Code/@scripts/session-tools/bin/rvoice ~/.local/bin/rvoice
# 2. Drop your Groq key
mkdir -p ~/.config/rvoice
cat >> ~/.config/rvoice/config <<'EOF'
export GROQ_API_KEY=gsk_...your_key...
# export RVOICE_AUTOSEND=1 # uncomment to auto-press Enter after injection
EOF
# 3. Wire up Hammerspoon
mkdir -p ~/.hammerspoon
ln -sfn ~/Code/@scripts/session-tools/hammerspoon/rvoice.lua ~/.hammerspoon/rvoice.lua
echo 'require("rvoice")' >> ~/.hammerspoon/init.lua
open /Applications/Hammerspoon.app
# 4. From Hammerspoon's menu bar → Reload Config.
# Grant Accessibility + Microphone permission when macOS prompts.
Usage
From any iTerm2 tab that's attached to a remote claude session via cc or
rclaude resume:
- Hold Right ⌥ → "listening…" notification, Tink sound
- Speak
- Release → recording stops, transcript types into your claude prompt, Pop sound on success / Funk sound on error
- Hit Enter when you're ready (review first), or set
RVOICE_AUTOSEND=1 to skip the manual confirmation
Config (~/.config/rvoice/config)
Plain shell fragment sourced at startup. Defaults shown.
export GROQ_API_KEY=... # REQUIRED
export RVOICE_MODEL=whisper-large-v3-turbo # Groq model id
export RVOICE_AUTOSEND=0 # 1 = press Enter after inject
export RVOICE_MIN_MS=200 # ignore taps shorter than this (debounce)
export RVOICE_MAX_S=60 # hard cap on a single recording
export RVOICE_HOST=apricot.lan # force target host (overrides iTerm2 detection)
export RVOICE_SESSION=claude-natalie-… # force target tmux session
Override any of these per-invocation: RVOICE_AUTOSEND=1 rvoice stop.
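A sourced config file that uses export would normally overwrite a value passed on the command line, so per-invocation overrides need some care. The sketch below shows one way a script could give the environment priority over the file, and the file priority over built-in defaults. The function load_config is hypothetical; whether bin/rvoice resolves precedence this way is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not taken from bin/rvoice.
# Precedence: caller's environment > config file > built-in default.

rvoice_vars='RVOICE_MODEL RVOICE_AUTOSEND RVOICE_MIN_MS RVOICE_MAX_S'

load_config() {
  local config_file name
  config_file=$1

  # Remember which variables the caller already set, and their values.
  # eval is only applied to the fixed names listed above.
  for name in $rvoice_vars; do
    eval "_was_set_$name=\${$name+x} _old_$name=\${$name-}"
  done

  # Source the user's config file if it is readable.
  if [ -r "$config_file" ]; then
    . "$config_file"
  fi

  # Put caller-provided values back so they win over the file.
  for name in $rvoice_vars; do
    eval "[ -z \"\$_was_set_$name\" ] || export $name=\"\$_old_$name\""
  done

  # Fall back to the documented defaults for anything still unset.
  : "${RVOICE_MODEL:=whisper-large-v3-turbo}"
  : "${RVOICE_AUTOSEND:=0}"
  : "${RVOICE_MIN_MS:=200}"
  : "${RVOICE_MAX_S:=60}"
}
```

With this approach, RVOICE_AUTOSEND=1 rvoice stop keeps AUTOSEND at 1 even if the config file exports RVOICE_AUTOSEND=0. An alternative with the same effect is to write the config file itself with : "${RVOICE_AUTOSEND:=0}" style assignments, which leave existing values untouched.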
Subcommands
rvoice start # begin recording (Hammerspoon calls this on key-down)
rvoice stop # stop, transcribe, inject (called on key-up)
rvoice cancel # stop without transcribing (called on quick-tap abort)
rvoice target # debug: echo the host+session rvoice WOULD inject into
rvoice log # tail -50 of the action log
Troubleshooting
- "GROQ_API_KEY not set": Hammerspoon's shell environment doesn't inherit from your login shell. Make sure the key is exported in ~/.config/rvoice/config; rvoice sources that file before each invocation.
- "no target session resolvable": the focused iTerm2 tab title isn't in <host> · <session> format. Either: (a) you're not in an rclaude/ssh session, or (b) the remote tmux config didn't get the title-setting fragment. rclaude install --on <host> re-pushes the canonical tmux config; verify with ssh <host> 'tmux show-options -g | grep set-titles'.
- Hammerspoon doesn't see Right ⌥: System Settings → Privacy & Security → Accessibility → enable Hammerspoon. Also Microphone for the recording step. Restart Hammerspoon after granting.
- Transcription returns nonsense: Groq's whisper-large-v3-turbo is multilingual but English-biased. Set RVOICE_MODEL=whisper-large-v3 for the slower but more accurate variant.
- Injection types into the wrong session: rvoice target shows what it will hit. If wrong, set RVOICE_HOST / RVOICE_SESSION in config to pin the target.
- Latency feels high: Groq is fast (~500ms for short clips). Network latency to plum + ssh round-trip to apricot adds ~200ms. Local Whisper would be slower in practice on most laptops once you account for model load.
Why this architecture (vs. /voice over ssh)
/voice is a feature of the claude binary itself; it opens the mic via
the OS audio API on whichever host it runs on. ssh has no audio channel and
doesn't forward CoreAudio events. The only ways to make /voice work over a
remote rclaude session would be:
- Run claude locally (lose apricot's compute / project files / LAN services — not viable for our workflow)
- Forward audio via PulseAudio (brittle on macOS, breaks on every claude release)
- Reproduce /voice's behavior with our own pieces ← this is rvoice
rvoice keeps the mic and the hotkey on the Mac, runs transcription on a
hosted endpoint (zero local RAM), and uses tmux's existing send-keys
protocol to deliver text — every layer is well-understood and stable.