Dictation that speaks your language. All of them. Entirely on your Mac.
You're a multilingual professional. You switch between Slack and WeChat all day, mixing English jargon into your native language mid-sentence. Every dictation app chokes on it. Veery was built for exactly that — entirely on your Mac, for free.
- The Problem
- How It Works
- Features
- Quick Start
- The Jargon System
- Configuration
- Comparison
- Contributing
You say "让Claude帮我review一下这个PR,看看API的latency有没有改善" and your dictation app produces garbage. Apple Dictation forces you to pick one language. Wispr Flow and SuperWhisper don't understand that "Sharpe ratio" should stay in English, not become "夏普率". And every tool turns "PyTorch" into "pie torch" because their models have never seen your jargon.
Veery was built because no one else was going to build it.
Hold Right Cmd → Speak naturally in any language mix → Release → Text appears
```
Audio → STT → Jargon Correction → Filler Removal → Paste to active app
         ↓            ↓                  ↓
   SenseVoice   fuzzy + phonetic    strips "um",
   or Whisper   YAML matching       "嗯", "额"
```
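The filler-removal stage can be sketched as a single regex pass. This is a minimal illustration, not Veery's actual implementation; the filler list and cleanup rules here are assumptions:

```python
import re

# Illustrative filler list; Veery's real list is larger and configurable.
FILLERS = ["um", "uh", "嗯", "额", "那个"]

def strip_fillers(text: str) -> str:
    """Remove standalone filler words, plus any comma/space trailing them."""
    pattern = "|".join(re.escape(f) for f in FILLERS)
    # Lookarounds keep fillers inside English words (e.g. "uh" in "uhoh") intact.
    cleaned = re.sub(rf"(?<![A-Za-z])({pattern})(?![A-Za-z])[,，]?\s*", "", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()

print(strip_fillers("um, let's 嗯 ship the PR"))  # → let's ship the PR
```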
- Instant — Text appears the moment you stop speaking
- Accurate — Powered by Whisper large-v3-turbo, near-human accuracy across languages
- Free forever — No subscription, no usage limits, no account
- Private — Runs entirely on your Mac. No audio ever leaves your machine.
- Just hold and speak — Hold Right Cmd, talk naturally in any mix of languages, release. Text appears wherever your cursor is.
- Your jargon, preserved — "PyTorch" stays "PyTorch", not "pie torch". Fuzzy + phonetic matching catches what STT models get wrong.
- Gets smarter as you use it — Edit a dictation or re-dictate, and Veery learns the correction. No manual configuration needed.
- Knows your codebase — Run Veery with `--mine ~/code` and it extracts class names, constants, imports, and Claude Code custom commands from your projects. Your jargon stays ahead of any model's training data.
- Clean output, no editing — Filler words ("um", "嗯", "那个") stripped automatically. What you get is ready to send.
- Your data never leaves your Mac — No cloud, no account, no telemetry. Models download once, then everything runs offline forever. Fully open source — read every line of code.
- macOS 14+ (Sonoma) with Apple Silicon (M1/M2/M3/M4)
- Python 3.13+
- uv package manager
- PortAudio (`brew install portaudio`)
```bash
git clone https://github.com/andyhcwang/veery.git
cd veery
bash install.sh  # checks prerequisites, installs deps
```

Or manually:

```bash
git clone https://github.com/andyhcwang/veery.git
cd veery
uv sync
```

Then launch:

```bash
uv run veery
```

On first launch, Veery will:
- Guide you through granting Accessibility, Microphone, and Input Monitoring permissions
- Download STT models (~200MB for SenseVoice, ~1.5GB for Whisper) with a progress bar -- you can start dictating with SenseVoice while Whisper downloads in the background
- Show a microphone icon in your menubar when ready
Note: First-time setup requires an internet connection to download models. After that, Veery works fully offline.
Hold Right Cmd, speak in whatever mix of languages comes naturally, release, and watch the text appear.
Veery includes a py2app target for an arm64-only menubar development app bundle:
```bash
uv sync --group app
./.venv/bin/python setup.py py2app -A
```

The app lands at `dist/Veery.app` and runs against your live checkout.
Notes:
- The bundle is configured as an agent app (
LSUIElement) so it stays in the menubar without a Dock icon. NSMicrophoneUsageDescriptionis embedded in the app bundle plist for macOS microphone permission prompts.- This is a developer/testing wrapper around the current source tree, not a fully standalone distributable app.
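Those two plist keys are set via py2app's `plist` option. A minimal sketch of what such a `setup.py` passes to py2app; the description string and app script path below are placeholders, not Veery's actual values:

```python
# Illustrative py2app options dict. The plist keys match the notes above;
# the usage-description text and entry-point path are hypothetical.
PY2APP_OPTIONS = {
    "plist": {
        "LSUIElement": True,  # agent app: menubar only, no Dock icon
        "NSMicrophoneUsageDescription": "Veery uses the microphone for dictation.",
    },
}

# Passed to setuptools roughly as:
# setup(app=["veery_app.py"], options={"py2app": PY2APP_OPTIONS}, setup_requires=["py2app"])
print(PY2APP_OPTIONS["plist"]["LSUIElement"])  # → True
```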
STT models are trained on general speech — they don't know your domain vocabulary. Veery fixes this with a three-layer correction system that runs in <1ms:
```yaml
# jargon/tech.yaml
terms:
  PyTorch:
    - pie torch
    - py torch
  DuckDB:
    - duck dee bee
    - duck DB
  Sharpe ratio:
    - sharp ratio
    - sharp issue
```

When the STT outputs "pie torch", Veery instantly corrects it to "PyTorch".
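Conceptually, the exact-match layer inverts that mapping into a variant → canonical lookup. A minimal sketch, assuming the YAML has already been parsed into a dict (the names here are illustrative, not Veery's internals):

```python
# Parsed form of the jargon YAML, inlined to keep the sketch self-contained.
TERMS = {
    "PyTorch": ["pie torch", "py torch"],
    "DuckDB": ["duck dee bee", "duck DB"],
    "Sharpe ratio": ["sharp ratio", "sharp issue"],
}

# Invert to variant -> canonical; lowercased keys make the lookup case-insensitive.
LOOKUP = {v.lower(): canonical for canonical, variants in TERMS.items() for v in variants}

def correct_exact(phrase: str) -> str:
    """Return the canonical term if the phrase is a known STT variant."""
    return LOOKUP.get(phrase.lower(), phrase)

print(correct_exact("pie torch"))  # → PyTorch
```

A dict lookup like this is O(1) per phrase, which is how a correction pass can stay well under a millisecond.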
Even if the STT output doesn't exactly match a variant, fuzzy matching (via rapidfuzz) catches close approximations. "pytorche" still matches "PyTorch". Threshold is tunable (default: 82/100).
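The idea behind the fuzzy layer can be sketched with the standard library. Veery itself uses rapidfuzz, whose scorers differ somewhat from `difflib`, so treat the 0.82 cutoff below as analogous to the 82/100 default rather than identical; the candidate table is illustrative:

```python
from difflib import SequenceMatcher

# Candidates map both canonical spellings and known STT variants to the canonical term.
CANDIDATES = {
    "pytorch": "PyTorch",
    "pie torch": "PyTorch",
    "sharpe ratio": "Sharpe ratio",
    "sharp ratio": "Sharpe ratio",
}

def correct_fuzzy(phrase: str, threshold: float = 0.82) -> str:
    """Return the best canonical match whose similarity clears the threshold."""
    best, best_score = phrase, 0.0
    for candidate, canonical in CANDIDATES.items():
        score = SequenceMatcher(None, phrase.lower(), candidate).ratio()
        if score >= threshold and score > best_score:
            best, best_score = canonical, score
    return best

print(correct_fuzzy("pytorche"))  # → PyTorch
```

"pytorche" scores ~0.93 against "pytorch", clearing the cutoff, while unrelated phrases fall through unchanged.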
For terms where the STT gets the sounds right but the spelling wrong, consonant-skeleton matching catches them. "NumPi" and "NumPy" share the same skeleton nmp, so Veery knows they're the same term.
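A consonant skeleton can be computed by lowercasing and dropping vowels. A minimal sketch of the idea (illustrative, not Veery's exact normalization; note that "y" is treated as a vowel here so that "NumPy" reduces to `nmp`):

```python
def skeleton(term: str) -> str:
    """Lowercase and keep only consonant letters; 'y' counts as a vowel."""
    return "".join(c for c in term.lower() if c.isalpha() and c not in "aeiouy")

print(skeleton("NumPi"), skeleton("NumPy"))  # → nmp nmp
```

Two terms with the same skeleton ("NumPi"/"NumPy" → `nmp`) are treated as candidates for the same canonical spelling.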
Create a YAML file in jargon/, add your terms with STT variants, and reference it in config.yaml. Or open jargon files directly from the Veery menubar under Jargon Dictionaries. See CONTRIBUTING.md for the full format.
Pre-built packs ship for common domains: ai_ml.yaml, devops_cloud.yaml, frontend.yaml. Add them to your config.yaml under jargon.dict_paths. Want to contribute a pack for your domain? See Contributing.
Veery works out of the box with sensible defaults. To customize, edit config.yaml in the project root:
```yaml
# config.yaml
stt:
  backend: whisper  # "sensevoice" or "whisper" (default: whisper)

audio:
  max_duration_sec: 30.0  # Auto-stop after 30s

vad:
  threshold: 0.4             # Speech detection sensitivity (lower = more sensitive)
  silence_duration_sec: 2.0  # Seconds of silence before auto-stop

hotkey:
  key_combo: right_cmd  # Push-to-talk key
  mode: hold            # "hold" (push-to-talk) or "toggle" (press-to-toggle)

jargon:
  dict_paths:
    - jargon/quant_finance.yaml
    - jargon/tech.yaml
    - jargon/claude_code.yaml     # Claude Code slash commands
    - jargon/mined.yaml           # Auto-generated by --mine
    - jargon/mined_commands.yaml  # Auto-generated by --mine
  fuzzy_threshold: 82  # Fuzzy match sensitivity (0-100)

output:
  cgevent_char_limit: 500  # Text shorter than this is typed character-by-character;
                           # longer text is pasted via clipboard

learning:
  enabled: true
  promotion_threshold: 3  # Corrections needed before auto-adding to dictionary
```

| | Veery | SuperWhisper | Wispr Flow | Apple Dictation |
|---|---|---|---|---|
| Price | Free forever | $8.49/mo | $10/mo | Free |
| Privacy | 100% local | Local | Local + cloud options | Cloud |
| Multilingual | Purpose-built | Multi-lang (generic) | Multi-lang (generic) | Single language only |
| Jargon handling | Fuzzy + phonetic + auto-learn | Find-and-replace | Auto-learn | None |
| Chinese STT | SenseVoice (SOTA Chinese) | Whisper (English-first) | Proprietary | Apple ASR |
| Custom dictionaries | YAML (open, editable) | Vocabulary hints | Manual add | None |
| Codebase mining | Yes (`--mine`) | No | No | No |
| Open source | Yes | No | No | No |
Veery doesn't compete on polish or mobile support. It wins on the axis that matters to you: mixed-language technical speech with domain jargon.
We welcome contributions, especially:
- Jargon packs -- Add terms for your domain (biotech, crypto, game dev, etc.)
- Bug reports -- Open an issue with your STT output and expected correction
- STT improvements -- New backend integrations, accuracy benchmarks
To submit a community jargon pack:
- Create `jargon/community/your_domain.yaml` following the existing format
- Include phonetic variants that STT models commonly produce
- Open a PR
See CONTRIBUTING.md for full guidelines.
Built by Andy Wang.
Core dependencies:
- SenseVoice -- Chinese-optimized STT by Alibaba DAMO Academy
- mlx-whisper -- Whisper on Apple Silicon via MLX
- Silero VAD -- Voice activity detection
- rapidfuzz -- Fuzzy string matching
- rumps -- macOS menubar framework
MIT
