System-wide speech-to-text for macOS. Hold a hotkey, speak, release -- transcribed text appears wherever your cursor is. Runs entirely locally using whisper.cpp. No paid services, no rate limits.
Hold hotkey --> Record mic --> Whisper transcribes --> Text pasted at cursor
- You run
voxin a terminal (or as a background process) - Switch to any app -- editor, browser, terminal, chat
- Hold your hotkey (e.g. Fn, Cmd+Shift), speak naturally
- Release -- text appears where your cursor is
You hear a gentle chime on start and stop.
Requirements:
- macOS
- Homebrew
- Go 1.24+
git clone https://github.com/mattthewong/vox.git
cd vox
make setup # installs sox + whisper-cpp and downloads a default model if missing
make start # starts whisper-server and runs voxbrew install sox whisper-cpp
mkdir -p ~/.local/share/whisper-cpp
curl -L -o ~/.local/share/whisper-cpp/ggml-base.en.bin \
"https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin"make build # outputs bin/vox
make install # installs to /usr/local/bin/voxwhisper-server --host 127.0.0.1 --port 2022 \
--model ~/.local/share/whisper-cpp/ggml-base.en.bin
voxOn first run, macOS will prompt for two permissions. Grant them to your terminal app (Terminal, iTerm2, Ghostty, etc.):
- Microphone -- System Settings > Privacy & Security > Microphone
- Accessibility -- System Settings > Privacy & Security > Accessibility
All via environment variables:
| Variable | Default | Description |
|---|---|---|
VOX_HOTKEY |
option+space |
Hotkey to trigger recording. Comma-separated for multiple. |
WHISPER_URL |
http://127.0.0.1:2022 |
Whisper server URL |
VOX_HOLD_TO_TALK |
true |
true = hold to record, false = toggle on/off |
VOX_LANGUAGE |
(auto-detect) | BCP-47 language code (e.g. en, es) |
VOX_VERBOSE |
false |
Debug logging |
VOX_HOTKEY="fn" # Fn / Globe key
VOX_HOTKEY="cmd+shift" # Modifier-only (no extra key needed)
VOX_HOTKEY="option+space" # Modifier + key
VOX_HOTKEY="ctrl+shift+d" # Multiple modifiers + key
VOX_HOTKEY="fn,cmd+shift" # Multiple hotkeys (either triggers)Available modifiers: ctrl, shift, option/alt, cmd/command
Available keys: a-z, 0-9, f1-f20, space, return, escape, tab, delete, arrow keys
cmd/vox/main.go -- Entrypoint, hotkey event loop, orchestration
internal/hotkey/ -- CGEventTap-based global hotkey (supports modifier-only, fn, modifier+key)
hotkey_darwin.go -- Go listener with keydown/keyup channels
bridge.c -- C event tap callback
internal/audio/ -- Mic recording via ffmpeg/sox subprocess
recorder.go -- Start/stop recording, WAV output
sound.go -- Embedded chime sounds (start/stop)
internal/transcribe/ -- Whisper HTTP client
client.go -- Multipart upload, auto-detects /inference vs /v1/audio/transcriptions
internal/inject/ -- Text injection into focused app
paste_darwin.go -- pbcopy + CGEvent Cmd+V (works in any app)
internal/config/ -- Env var config + hotkey string parsing
make build # Build binary
make test # Run all tests
make test-short # Skip integration tests
make lint # go vet
make fmt # gofmt
make run # Build and runI was using Whisper Flow for speech-to-text but kept hitting rate limits on their free plan. Vox does the same thing -- system-wide dictation with a hold-to-talk hotkey -- but runs entirely on your machine with no external dependencies.
MIT
