Natalie fedabb0924 docs(@scripts): ✨ update rvoice docs to use LAN speech-synthesis

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>

2026-05-17 18:12:14 -07:00

7.3 KiB

rvoice — push-to-talk dictation for remote rclaude sessions

/voice in Claude Code opens the mic on whichever host the claude binary is running on. When you're connected to apricot over ssh through cc / rclaude resume, that host is apricot, which has no mic. rvoice fills the gap.

It records audio locally on macOS, transcribes via the LAN speech-synthesis service on apricot (Whisper, GPU-accelerated, no API keys / no network egress beyond the local LAN), and injects the transcript into the active remote tmux session via tmux send-keys over ssh. The target session is auto-detected from the focused iTerm2 tab title (set by the canonical session-tools tmux.conf to <host> · <session>).
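The title-based target detection can be sketched as below. This is a minimal illustration, assuming the title is exactly `<host> · <session>` with a middle dot (U+00B7) between single spaces. `parse_title` and the session name `claude-demo` are hypothetical and not taken from bin/rvoice.

```shell
#!/bin/sh
# Hypothetical helper: split an iTerm2 tab title of the form
# "<host> · <session>" into its two parts.
# Prints "<host> <session>" on success; returns 1 if the title does not match.
parse_title() {
  title=$1
  sep=' · '
  case $title in
    *"$sep"*) ;;          # separator present, keep going
    *) return 1 ;;        # not an rclaude-style title
  esac
  host=${title%%"$sep"*}  # everything before the first separator
  session=${title#*"$sep"} # everything after the first separator
  [ -n "$host" ] && [ -n "$session" ] || return 1
  printf '%s %s\n' "$host" "$session"
}

# Usage: parse_title 'apricot · claude-demo'
```

A title without the separator (for example a plain local shell tab) makes the function return non-zero, which corresponds to the "no target session resolvable" case in Troubleshooting.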

Architecture

[ Right ⌥ down ]  ──Hammerspoon──▶  rvoice start  ──▶  ffmpeg → recording.wav
[ Right ⌥ up ]    ──Hammerspoon──▶  rvoice stop
                                          │
                                          ▼
                  POST WAV → http://apricot.lan:8000/stt/transcribe
                              (faster-whisper on GPU, ~base model)
                                          │
                                          ▼
                  iTerm2 active tab title → "apricot · claude-…"
                                          │
                                          ▼
                  ssh apricot tmux send-keys -t claude-… -l "<text>"
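The last hop deserves care with quoting: ssh joins its arguments with spaces and hands the resulting string to the remote shell, so a transcript containing quotes or `$` must be escaped for that shell. The sketch below shows one way to build the remote command line. `sq_quote` and `build_inject_cmd` are hypothetical helpers, and how bin/rvoice actually quotes the text is not shown in this document.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from bin/rvoice.

# Wrap $1 in single quotes for a POSIX shell, escaping embedded single
# quotes as '\''. Trailing newlines in $1 are dropped.
sq_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Print the remote command line that types $2 into tmux session $1.
# "-l" makes tmux treat the text as literal characters, not key names.
build_inject_cmd() {
  printf 'tmux send-keys -t %s -l %s\n' "$(sq_quote "$1")" "$(sq_quote "$2")"
}

# Usage (not run here):
#   ssh "$host" "$(build_inject_cmd "$session" "$text")"
```

Passing the whole remote command as one pre-quoted string keeps the transcript intact no matter what characters the speech service returns.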

Files

Path	Role
`bin/rvoice`	CLI: `start`/`stop`/`cancel`/`target`/`log`
`hammerspoon/rvoice.lua`	Right-⌥ hold detector → calls `rvoice`
`~/.config/rvoice/config`	Sourced at startup; overrides STT URL, model, etc.
`$TMPDIR/rvoice/`	Per-recording state (pid, wav, log)
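As an illustration of the per-recording state directory, the sketch below resolves the directory and checks whether a recorder process is still alive. The file name `recording.pid` and both function names are assumptions for this example; the actual layout inside `$TMPDIR/rvoice/` is not documented here.

```shell
#!/bin/sh
# Hypothetical helpers; file and function names are illustrative only.

# Resolve the state directory, tolerating an unset TMPDIR and the trailing
# slash that macOS puts on TMPDIR.
state_dir() {
  d=${TMPDIR:-/tmp}
  printf '%s/rvoice\n' "${d%/}"
}

# Succeed only if a pid file exists and that process is still running.
is_recording() {
  pidfile=$(state_dir)/recording.pid
  [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}
```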

Install

Prerequisites: ffmpeg, jq, curl (all brew installable), Hammerspoon (brew install --cask hammerspoon), and the LAN speech-synthesis service running on apricot (already deployed at apricot.lan:8000, exposes /stt/transcribe). No API keys, no cloud round-trip.
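A quick way to confirm the command-line prerequisites before installing is sketched below. `missing_tools` is a hypothetical helper, not part of session-tools. Hammerspoon is a cask app rather than a command on PATH, so it is not covered by this check.

```shell
#!/bin/sh
# Print the names of any listed commands that are not found on PATH.
# Returns 0 when nothing is missing, 1 otherwise.
missing_tools() {
  missing=''
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  [ -z "$missing" ] && return 0
  printf '%s\n' "${missing# }"
  return 1
}

# Usage: missing_tools ffmpeg jq curl || echo "install the tools listed above"
```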

# 1. Symlink rvoice (already done if you ran install.sh)
ln -sfn ~/Code/@scripts/session-tools/bin/rvoice ~/.local/bin/rvoice

# 2. (Optional) override defaults in ~/.config/rvoice/config — see the
#    "Config" section below. The default is to POST to apricot.lan:8000 and
#    use the `base` Whisper model.

# 3. Wire up Hammerspoon
mkdir -p ~/.hammerspoon
ln -sfn ~/Code/@scripts/session-tools/hammerspoon/rvoice.lua ~/.hammerspoon/rvoice.lua
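# Append the require line only if it is not already present, so re-running is safe
grep -qxF 'require("rvoice")' ~/.hammerspoon/init.lua 2>/dev/null \
  || echo 'require("rvoice")' >> ~/.hammerspoon/init.lua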
open /Applications/Hammerspoon.app

# 4. From Hammerspoon's menu bar → Reload Config.
#    Grant Accessibility + Microphone permission when macOS prompts.

# 5. Smoke-test the STT endpoint without Hammerspoon:
ffmpeg -f avfoundation -i ":0" -ac 1 -ar 16000 -t 5 /tmp/me.wav
curl -F "audio=@/tmp/me.wav" -F "model=base" -F "language=en" -F "task=transcribe" \
  http://apricot.lan:8000/stt/transcribe | jq .text
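The smoke test can be wrapped in two small functions for repeated use. This is a sketch that reuses the form fields and the `.text` response field from the command above. The function names are hypothetical, and treating an empty transcript as a failure is a choice made for this example.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not taken from bin/rvoice.

# Join the base URL (RVOICE_STT_URL, defaulting to the documented value)
# with the transcribe path, tolerating a trailing slash on the base.
stt_endpoint() {
  base=${RVOICE_STT_URL:-http://apricot.lan:8000}
  printf '%s/stt/transcribe\n' "${base%/}"
}

# POST the WAV file $1 using model $2 (default: base) and print the
# transcript. Returns 1 when no non-empty "text" field comes back, which
# also covers a failed request.
transcribe_wav() {
  text=$(curl -fsS \
      -F "audio=@$1" \
      -F "model=${2:-base}" \
      -F "language=en" \
      -F "task=transcribe" \
      "$(stt_endpoint)" | jq -r '.text // empty')
  [ -n "$text" ] || return 1
  printf '%s\n' "$text"
}

# Usage: transcribe_wav /tmp/me.wav small
```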

Usage

From any iTerm2 tab that's attached to a remote claude session via cc or rclaude resume:

Hold Right ⌥ → "listening…" notification, Tink sound
Speak
Release → recording stops, transcript types into your claude prompt, Pop sound on success / Funk sound on error
Hit Enter when you're ready (review first), or set RVOICE_AUTOSEND=1 to skip the manual confirmation

Config (`~/.config/rvoice/config`)

Plain shell fragment sourced at startup. Defaults shown.

export RVOICE_STT_URL=http://apricot.lan:8000        # speech-synthesis service
export RVOICE_MODEL=base                             # tiny|base|small|medium|large-v2|large-v3
export RVOICE_LANG=en                                # omit/empty = auto-detect
export RVOICE_AUTOSEND=0                             # 1 = press Enter after inject
export RVOICE_MIN_MS=200                             # ignore taps shorter than this (debounce)
export RVOICE_MAX_S=60                               # hard cap on a single recording
export RVOICE_HOST=apricot.lan                       # force target host (overrides iTerm2 detection)
export RVOICE_SESSION=claude-natalie-…               # force target tmux session

Override any of these per-invocation: RVOICE_MODEL=small rvoice stop.
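One subtlety with a sourced config: a plain `export VAR=value` that runs after the environment is already set replaces whatever the caller passed in. Whether bin/rvoice guards against that is not shown in this document. If per-invocation overrides appear to be ignored, a config written with default-assignment expansion keeps the caller's value. The sketch below wraps the idea in a hypothetical `rvoice_defaults` function so it can be exercised on its own.

```shell
#!/bin/sh
# Sketch: fill in only the values the environment did not already provide,
# so `RVOICE_MODEL=small rvoice stop` keeps "small".
rvoice_defaults() {
  : "${RVOICE_STT_URL:=http://apricot.lan:8000}"
  : "${RVOICE_MODEL:=base}"
  # No colon here: an explicitly empty RVOICE_LANG (auto-detect) is kept.
  : "${RVOICE_LANG=en}"
  : "${RVOICE_AUTOSEND:=0}"
  : "${RVOICE_MIN_MS:=200}"
  : "${RVOICE_MAX_S:=60}"
  export RVOICE_STT_URL RVOICE_MODEL RVOICE_LANG \
         RVOICE_AUTOSEND RVOICE_MIN_MS RVOICE_MAX_S
}
```

The `:=` form assigns when the variable is unset or empty, while `=` assigns only when it is unset. That distinction matters for RVOICE_LANG, where an empty value is meaningful.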

Model trade-offs (on apricot's GPU; latencies are rough estimates):

tiny.en / base — sub-second, fine for short prompts
small — ~1s, noticeable quality bump
medium / large-v3 — 2-4s, near-perfect, worth it for paragraphs

Subcommands

rvoice start    # begin recording (Hammerspoon calls this on key-down)
rvoice stop     # stop, transcribe, inject (called on key-up)
rvoice cancel   # stop without transcribing (called on quick-tap abort)
rvoice target   # debug: echo the host+session rvoice WOULD inject into
rvoice log      # tail -50 of the action log
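The choice between `stop` and `cancel` on key-up follows the RVOICE_MIN_MS rule from the Config section. In this setup the hold detection lives in hammerspoon/rvoice.lua, so the shell function below only illustrates the rule. `keyup_action` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: decide which subcommand a key-up should trigger.
# $1 = how long the key was held, in ms
# $2 = minimum hold duration, in ms (RVOICE_MIN_MS)
keyup_action() {
  if [ "$1" -lt "$2" ]; then
    echo cancel   # quick tap: discard the recording
  else
    echo stop     # real hold: transcribe and inject
  fi
}

# Usage: rvoice "$(keyup_action "$held_ms" "${RVOICE_MIN_MS:-200}")"
```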

Troubleshooting

"STT request failed": apricot's speech service isn't reachable. Check curl http://apricot.lan:8000/health, then check the relevant unit with ssh apricot.lan systemctl --user status. Most likely you're off the LAN/VPN.
"no target session resolvable": the focused iTerm2 tab title isn't in <host> · <session> format. Either (a) you're not in an rclaude/ssh session, or (b) the remote tmux config didn't get the title-setting fragment. rclaude install --on <host> re-pushes the canonical tmux config; verify with ssh <host> 'tmux show-options -g | grep set-titles'.
Hammerspoon doesn't see Right ⌥: go to System Settings → Privacy & Security → Accessibility and enable Hammerspoon. Also enable it under Microphone for the recording step. Restart Hammerspoon after granting.
Transcription returns empty / nonsense: bump the model with RVOICE_MODEL=small or medium. The default base trades accuracy for sub-second latency. List the available models with curl http://apricot.lan:8000/stt/models.
Injection types into the wrong session: rvoice target shows what it will hit. If that is wrong, set RVOICE_HOST / RVOICE_SESSION in the config to pin the target.
Latency feels high: the first call after the service has been idle warms the model on apricot's GPU (1-2s, one-time). Subsequent calls are sub-second for base. Switch to tiny.en for the lowest-latency tier.

Why this architecture (vs. /voice over ssh)

/voice is a feature of the claude binary itself; it opens the mic via the OS audio API on whichever host it runs on. ssh has no audio channel and doesn't forward CoreAudio events. The only ways to make /voice work over a remote rclaude session would be:

Run claude locally (lose apricot's compute / project files / LAN services — not viable for our workflow)
Forward audio via PulseAudio (brittle on macOS, breaks on every claude release)
Reproduce /voice's behavior with our own pieces ← this is rvoice

rvoice keeps the mic and the hotkey on the Mac, runs transcription on apricot's own LAN-resident speech-synthesis service (GPU Whisper, zero local model RAM, no cloud egress), and uses tmux's existing send-keys protocol to deliver text — every layer is well-understood and stable.
