Natalie 4a5b2f7273 feat(@scripts/session-tools): ✨ add rvoice dictation tool

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>

2026-05-17 17:54:08 -07:00

6.4 KiB


rvoice — push-to-talk dictation for remote rclaude sessions

/voice in Claude Code opens the mic on whichever host the claude binary is running on. When you are connected to apricot over ssh through cc / rclaude resume, that host is apricot, which has no mic. rvoice fills the gap.

It records audio locally on macOS, transcribes it via Groq Whisper (so no model is loaded into local RAM), and injects the transcript into the active remote tmux session with tmux send-keys over ssh. The target session is auto-detected from the focused iTerm2 tab title, which the canonical session-tools tmux.conf sets to <host> · <session>.
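The title-based detection described above can be sketched as a small shell function. This is a minimal sketch, not the actual bin/rvoice code: the name parse_target is hypothetical, and it assumes the separator is exactly a space, a middle dot (U+00B7), and a space, as in the <host> · <session> format.

```shell
# Hypothetical helper: split an iTerm2 tab title of the form
# "<host> · <session>" into its two parts.
# Prints "<host><TAB><session>" on success; returns 1 if the title
# lacks the separator or either side is empty.
parse_target() {
  local title="$1"
  local sep=' · '            # space, U+00B7 middle dot, space (assumed)
  local host session
  case "$title" in
    *"$sep"*) ;;
    *) echo "no target session resolvable" >&2; return 1 ;;
  esac
  host=${title%%"$sep"*}     # everything before the first separator
  session=${title#*"$sep"}   # everything after the first separator
  if [ -z "$host" ] || [ -z "$session" ]; then
    echo "no target session resolvable" >&2
    return 1
  fi
  printf '%s\t%s\n' "$host" "$session"
}
```

Given a title such as "apricot · claude-x" (a made-up session name), the function prints the host and the session separated by a tab. A title without the separator, for example a plain local shell tab, produces the same error text that the Troubleshooting section mentions.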

Architecture

[ Right ⌥ down ]  ──Hammerspoon──▶  rvoice start  ──▶  ffmpeg → recording.wav
[ Right ⌥ up ]    ──Hammerspoon──▶  rvoice stop
                                          │
                                          ▼
                            POST WAV → Groq /audio/transcriptions
                                          │
                                          ▼
                            iTerm2 active tab title → "apricot · claude-…"
                                          │
                                          ▼
                            ssh apricot tmux send-keys -t claude-… -l "<text>"
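
The last steps of the diagram (transcribe, then inject) can be sketched in shell as follows. This is an illustration under stated assumptions, not the contents of bin/rvoice: the function names are hypothetical, the full endpoint URL is assumed to be Groq's OpenAI-compatible path (the diagram only names /audio/transcriptions), and the response is assumed to be JSON with a text field.

```shell
# Hypothetical helpers; none of these names come from bin/rvoice.

# Quote a string for the remote shell: wrap it in single quotes and
# rewrite each embedded ' as '\''.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# POST a WAV file to the transcription endpoint and print the transcript.
# Assumes the URL below and a JSON response containing a "text" field.
transcribe_wav() {
  local wav="$1"
  local model="${RVOICE_MODEL:-whisper-large-v3-turbo}"
  curl -sS --fail "https://api.groq.com/openai/v1/audio/transcriptions" \
    -H "Authorization: Bearer ${GROQ_API_KEY}" \
    -F "file=@${wav}" \
    -F "model=${model}" |
    jq -r '.text'
}

# Type the transcript into the remote tmux session as literal text (-l),
# then press Enter only when RVOICE_AUTOSEND=1.
inject_text() {
  local host="$1" session="$2" text="$3"
  ssh "$host" "tmux send-keys -t $(shell_quote "$session") -l $(shell_quote "$text")"
  if [ "${RVOICE_AUTOSEND:-0}" = 1 ]; then
    ssh "$host" "tmux send-keys -t $(shell_quote "$session") Enter"
  fi
}
```

The quoting step matters because ssh joins its arguments into one string for the remote shell, so a transcript containing spaces or apostrophes would otherwise be split or misparsed. A stop step could chain the two calls as `text=$(transcribe_wav "$wav") && inject_text "$host" "$session" "$text"`.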

Files

| Path | Role |
| --- | --- |
| `bin/rvoice` | CLI: `start`/`stop`/`cancel`/`target`/`log` |
| `hammerspoon/rvoice.lua` | Right-⌥ hold detector → calls `rvoice` |
| `~/.config/rvoice/config` | Sourced at startup; holds `GROQ_API_KEY` and tweaks |
| `$TMPDIR/rvoice/` | Per-recording state (pid, wav, log) |

Install

Prerequisites: ffmpeg, jq, curl (all brew installable), a Groq API key (free tier — https://console.groq.com/keys), and Hammerspoon (brew install --cask hammerspoon).

# 1. Symlink rvoice (already done if you ran install.sh)
ln -sfn ~/Code/@scripts/session-tools/bin/rvoice ~/.local/bin/rvoice

# 2. Drop your Groq key
mkdir -p ~/.config/rvoice
cat >> ~/.config/rvoice/config <<'EOF'
export GROQ_API_KEY=gsk_...your_key...
# export RVOICE_AUTOSEND=1     # uncomment to auto-press Enter after injection
EOF
chmod 600 ~/.config/rvoice/config   # the file holds a secret; keep it readable only by you

# 3. Wire up Hammerspoon
mkdir -p ~/.hammerspoon
ln -sfn ~/Code/@scripts/session-tools/hammerspoon/rvoice.lua ~/.hammerspoon/rvoice.lua
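# Append the require line only if it is not already there, so re-running is safe
grep -qxF 'require("rvoice")' ~/.hammerspoon/init.lua 2>/dev/null || echo 'require("rvoice")' >> ~/.hammerspoon/init.lua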
open /Applications/Hammerspoon.app

# 4. From Hammerspoon's menu bar → Reload Config.
#    Grant Accessibility + Microphone permission when macOS prompts.
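
Before running the steps above, it can help to confirm that the command-line prerequisites are on PATH. The following is a small sketch; missing_tools is a hypothetical helper, not part of session-tools.

```shell
# Hypothetical helper: print the name of every tool that is NOT on PATH.
# Prints nothing when all of the given tools are available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running `missing_tools ffmpeg jq curl` prints one line per missing tool, which can then be passed to brew install. Hammerspoon is a GUI app installed as a cask, so this check does not cover it.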

Usage

From any iTerm2 tab that's attached to a remote claude session via cc or rclaude resume:

1. Hold Right ⌥ → "listening…" notification, Tink sound
2. Speak
3. Release → recording stops, transcript is typed into your claude prompt, Pop sound on success / Funk sound on error
4. Hit Enter when you're ready (review first), or set RVOICE_AUTOSEND=1 to skip the manual confirmation

Config (`~/.config/rvoice/config`)

Plain shell fragment sourced at startup. Defaults shown.

export GROQ_API_KEY=...                              # REQUIRED
export RVOICE_MODEL=whisper-large-v3-turbo           # Groq model id
export RVOICE_AUTOSEND=0                             # 1 = press Enter after inject
export RVOICE_MIN_MS=200                             # ignore taps shorter than this (debounce)
export RVOICE_MAX_S=60                               # hard cap on a single recording
# export RVOICE_HOST=apricot.lan                     # unset by default; set to force target host (overrides iTerm2 detection)
# export RVOICE_SESSION=claude-natalie-…             # unset by default; set to force target tmux session

Override any of these for a single invocation by prefixing the command, e.g. RVOICE_AUTOSEND=1 rvoice stop.
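
Because the config is a sourced shell fragment, a plain export line in it would normally overwrite a value passed on the command line. Here is a minimal sketch of a config style that avoids this, assuming nothing about how bin/rvoice itself handles precedence: each default is assigned only when the variable is unset or empty. The function name rvoice_defaults is hypothetical and exists only to make the fragment easy to try out.

```shell
# Hypothetical config style: ":=" assigns the default only when the
# variable is unset or empty, so a value set for one invocation survives.
rvoice_defaults() {
  : "${RVOICE_MODEL:=whisper-large-v3-turbo}"
  : "${RVOICE_AUTOSEND:=0}"
  : "${RVOICE_MIN_MS:=200}"
  : "${RVOICE_MAX_S:=60}"
  export RVOICE_MODEL RVOICE_AUTOSEND RVOICE_MIN_MS RVOICE_MAX_S
}
```

With this style, a one-off RVOICE_AUTOSEND=1 stays 1 after the fragment runs, while variables that were not set receive the documented defaults.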

Subcommands

rvoice start    # begin recording (Hammerspoon calls this on key-down)
rvoice stop     # stop, transcribe, inject (called on key-up)
rvoice cancel   # stop without transcribing (called on quick-tap abort)
rvoice target   # debug: echo the host+session rvoice WOULD inject into
rvoice log      # tail -50 of the action log
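
The choice between stop and cancel on key-up depends on how long the key was held relative to RVOICE_MIN_MS. The real decision is made in hammerspoon/rvoice.lua, which is not shown here; the shell function below only illustrates the rule, and press_action is a hypothetical name.

```shell
# Hypothetical helper: given how long the key was held (in ms), print
# the subcommand that should run on key-up.
# Holds shorter than RVOICE_MIN_MS count as accidental taps.
press_action() {
  local held_ms="$1"
  local min_ms="${RVOICE_MIN_MS:-200}"
  if [ "$held_ms" -lt "$min_ms" ]; then
    echo cancel
  else
    echo stop
  fi
}
```

With the default threshold of 200, a 150 ms tap maps to cancel and a 1200 ms hold maps to stop.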

Troubleshooting

"GROQ_API_KEY not set" — Hammerspoon's shell environment doesn't inherit from your login shell. Make sure the key is exported in ~/.config/rvoice/config; rvoice sources that file before each invocation.
"no target session resolvable" — the focused iTerm2 tab title isn't in <host> · <session> format. Either: (a) you're not in an rclaude/ssh session, or (b) the remote tmux config didn't get the title-setting fragment. rclaude install --on <host> re-pushes the canonical tmux config; verify with ssh <host> 'tmux show-options -g | grep set-titles'.
Hammerspoon doesn't see Right ⌥ — System Settings → Privacy & Security → Accessibility → enable Hammerspoon. Also Microphone for the recording step. Restart Hammerspoon after granting.
Transcription returns nonsense — Groq's whisper-large-v3-turbo is multilingual but English-biased. Set RVOICE_MODEL=whisper-large-v3 for the slower but more accurate variant.
Injection types into the wrong session — rvoice target shows what it will hit. If wrong, set RVOICE_HOST / RVOICE_SESSION in config to pin the target.
Latency feels high — Groq is fast (~500ms for short clips). Network latency to plum + ssh round-trip to apricot adds ~200ms. Local Whisper would be slower in practice on most laptops once you account for model load.
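
For the "GROQ_API_KEY not set" case, one way to check the config independently of your login shell is to source it in an empty environment. This only approximates the environment Hammerspoon provides, and key_visible_in_clean_env is a hypothetical helper, not an rvoice subcommand.

```shell
# Hypothetical check: source the given config in an otherwise empty
# environment and succeed only if GROQ_API_KEY ends up non-empty.
# Pass an absolute path or one containing a slash.
key_visible_in_clean_env() {
  local config="$1"
  env -i HOME="$HOME" /bin/sh -c \
    '. "$1" >/dev/null 2>&1; [ -n "${GROQ_API_KEY:-}" ]' sh "$config"
}
```

For example, `key_visible_in_clean_env ~/.config/rvoice/config && echo ok` prints ok only when the key is defined by the file itself and not merely inherited from the current shell.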

Why this architecture (vs. /voice over ssh)

/voice is a feature of the claude binary itself; it opens the mic via the OS audio API on whichever host it runs on. ssh has no audio channel and doesn't forward CoreAudio events. The only ways to make /voice work over a remote rclaude session would be:

1. Run claude locally (lose apricot's compute / project files / LAN services; not viable for our workflow)
2. Forward audio via PulseAudio (brittle on macOS, breaks on every claude release)
3. Reproduce /voice's behavior with our own pieces ← this is rvoice

rvoice keeps the mic and the hotkey on the Mac, runs transcription on a hosted endpoint (zero local RAM), and uses tmux's existing send-keys command to deliver text. Every layer is well understood and stable.
