Dictation that speaks your language. All of them. Entirely on your Mac.
You're a multilingual professional. You switch between Slack and WeChat all day, mixing English jargon into your native language mid-sentence. Every dictation app chokes on it. Veery was built for exactly that — entirely on your Mac, for free.
- The Problem
- How It Works
- Features
- Quick Start
- The Jargon System
- Configuration
- Comparison
- Contributing
You say "让Claude帮我review一下这个PR,看看API的latency有没有改善" and your dictation app produces garbage. Apple Dictation forces you to pick one language. Wispr Flow and SuperWhisper don't understand that "Sharpe ratio" should stay in English, not become "夏普率". And every tool turns "PyTorch" into "pie torch" because their models have never seen your jargon.
Veery was built because no one else was going to build it.
Hold Right Cmd → Speak naturally in any language mix → Release → Text appears
```
Audio → STT → Jargon Correction → Filler Removal → Paste to active app
         ↓            ↓                  ↓
   SenseVoice   fuzzy + phonetic    strips "um",
   or Whisper   YAML matching       "嗯", "额"
```
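The filler-removal stage can be sketched as a single regex pass. This is a minimal illustration, not Veery's actual implementation; the filler list and cleanup rules here are assumptions:

```python
import re

# Illustrative filler list; Veery's real list is larger and configurable.
FILLERS = ["um", "uh", "嗯", "额", "那个"]

def strip_fillers(text: str) -> str:
    """Remove standalone filler words, plus any comma/space trailing them."""
    pattern = "|".join(re.escape(f) for f in FILLERS)
    # Lookarounds keep fillers inside English words (e.g. "uh" in "uhoh") intact.
    cleaned = re.sub(rf"(?<![A-Za-z])({pattern})(?![A-Za-z])[,，]?\s*", "", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()

print(strip_fillers("um, let's 嗯 ship the PR"))  # → let's ship the PR
```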
- Instant — Text appears the moment you stop speaking
- Accurate — Powered by Whisper large-v3-turbo, near-human accuracy across languages
- Free forever — No subscription, no usage limits, no account
- Private — Runs entirely on your Mac. No audio ever leaves your machine.
- Just hold and speak — Hold Right Cmd, talk naturally in any mix of languages, release. Text appears wherever your cursor is.
- Your jargon, preserved — "PyTorch" stays "PyTorch", not "pie torch". Fuzzy + phonetic matching catches what STT models get wrong.
- Gets smarter as you use it — Edit a dictation or re-dictate, and Veery learns the correction. No manual configuration needed.
- Knows your codebase — Run Veery with `--mine ~/code` and it extracts class names, constants, imports, and Claude Code custom commands from your projects. Your jargon stays ahead of any model's training data.
- Clean output, no editing — Filler words ("um", "嗯", "那个") stripped automatically. What you get is ready to send.
- Your data never leaves your Mac — No cloud, no account, no telemetry. Models download once, then everything runs offline forever. Fully open source — read every line of code.
- macOS 14+ (Sonoma) with Apple Silicon (M1/M2/M3/M4)
- Python 3.13+
- uv package manager
- PortAudio (`brew install portaudio`)
```bash
git clone https://github.com/andyhcwang/veery.git
cd veery
bash install.sh  # checks prerequisites, installs deps
```

Or manually:

```bash
git clone https://github.com/andyhcwang/veery.git
cd veery
uv sync
```

Then launch:

```bash
uv run veery
```

On first launch, Veery will:
- Guide you through granting Accessibility, Microphone, and Input Monitoring permissions
- Download STT models (~200MB for SenseVoice, ~1.5GB for Whisper) with a progress bar -- you can start dictating with SenseVoice while Whisper downloads in the background
- Show a microphone icon in your menubar when ready
Note: First-time setup requires an internet connection to download models. After that, Veery works fully offline.
Hold Right Cmd, speak in whatever mix of languages comes naturally, release, and watch the text appear.
Veery includes a py2app target for an arm64-only menubar development app bundle:
```bash
uv sync --group app
./.venv/bin/python setup.py py2app -A
```

The app lands at `dist/Veery.app` and runs against your live checkout.
Notes:
- The bundle is configured as an agent app (
LSUIElement) so it stays in the menubar without a Dock icon. NSMicrophoneUsageDescriptionis embedded in the app bundle plist for macOS microphone permission prompts.- This is a developer/testing wrapper around the current source tree, not a fully standalone distributable app.
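Those two plist keys are set via py2app's `plist` option. A minimal sketch of what such a `setup.py` passes to py2app; the description string and app script path below are placeholders, not Veery's actual values:

```python
# Illustrative py2app options dict. The plist keys match the notes above;
# the usage-description text and entry-point path are hypothetical.
PY2APP_OPTIONS = {
    "plist": {
        "LSUIElement": True,  # agent app: menubar only, no Dock icon
        "NSMicrophoneUsageDescription": "Veery uses the microphone for dictation.",
    },
}

# Passed to setuptools roughly as:
# setup(app=["veery_app.py"], options={"py2app": PY2APP_OPTIONS}, setup_requires=["py2app"])
print(PY2APP_OPTIONS["plist"]["LSUIElement"])  # → True
```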
STT models are trained on general speech — they don't know your domain vocabulary. Veery fixes this with a three-layer correction system that runs in <1ms:
```yaml
# jargon/tech.yaml
terms:
  PyTorch:
    - pie torch
    - py torch
  DuckDB:
    - duck dee bee
    - duck DB
  Sharpe ratio:
    - sharp ratio
    - sharp issue
```

When the STT outputs "pie torch", Veery instantly corrects it to "PyTorch".
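Conceptually, the exact-match layer inverts that mapping into a variant → canonical lookup. A minimal sketch, assuming the YAML has already been parsed into a dict (the names here are illustrative, not Veery's internals):

```python
# Parsed form of the jargon YAML, inlined to keep the sketch self-contained.
TERMS = {
    "PyTorch": ["pie torch", "py torch"],
    "DuckDB": ["duck dee bee", "duck DB"],
    "Sharpe ratio": ["sharp ratio", "sharp issue"],
}

# Invert to variant -> canonical; lowercased keys make the lookup case-insensitive.
LOOKUP = {v.lower(): canonical for canonical, variants in TERMS.items() for v in variants}

def correct_exact(phrase: str) -> str:
    """Return the canonical term if the phrase is a known STT variant."""
    return LOOKUP.get(phrase.lower(), phrase)

print(correct_exact("pie torch"))  # → PyTorch
```

A dict lookup like this is O(1) per phrase, which is how a correction pass can stay well under a millisecond.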
Even if the STT output doesn't exactly match a variant, fuzzy matching (via rapidfuzz) catches close approximations. "pytorche" still matches "PyTorch". Threshold is tunable (default: 82/100).
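The idea behind the fuzzy layer can be sketched with the standard library. Veery itself uses rapidfuzz, whose scorers differ somewhat from `difflib`, so treat the 0.82 cutoff below as analogous to the 82/100 default rather than identical; the candidate table is illustrative:

```python
from difflib import SequenceMatcher

# Candidates map both canonical spellings and known STT variants to the canonical term.
CANDIDATES = {
    "pytorch": "PyTorch",
    "pie torch": "PyTorch",
    "sharpe ratio": "Sharpe ratio",
    "sharp ratio": "Sharpe ratio",
}

def correct_fuzzy(phrase: str, threshold: float = 0.82) -> str:
    """Return the best canonical match whose similarity clears the threshold."""
    best, best_score = phrase, 0.0
    for candidate, canonical in CANDIDATES.items():
        score = SequenceMatcher(None, phrase.lower(), candidate).ratio()
        if score >= threshold and score > best_score:
            best, best_score = canonical, score
    return best

print(correct_fuzzy("pytorche"))  # → PyTorch
```

"pytorche" scores ~0.93 against "pytorch", clearing the cutoff, while unrelated phrases fall through unchanged.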
For terms where the STT gets the sounds right but the spelling wrong, consonant-skeleton matching catches them. "NumPi" and "NumPy" share the same skeleton nmp, so Veery knows they're the same term.
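A consonant skeleton can be computed by lowercasing and dropping vowels. A minimal sketch of the idea (illustrative, not Veery's exact normalization; note that "y" is treated as a vowel here so that "NumPy" reduces to `nmp`):

```python
def skeleton(term: str) -> str:
    """Lowercase and keep only consonant letters; 'y' counts as a vowel."""
    return "".join(c for c in term.lower() if c.isalpha() and c not in "aeiouy")

print(skeleton("NumPi"), skeleton("NumPy"))  # → nmp nmp
```

Two terms with the same skeleton ("NumPi"/"NumPy" → `nmp`) are treated as candidates for the same canonical spelling.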
Create a YAML file in jargon/, add your terms with STT variants, and reference it in config.yaml. Or open jargon files directly from the Veery menubar under Jargon Dictionaries. See CONTRIBUTING.md for the full format.
Pre-built packs ship for common domains: ai_ml.yaml, devops_cloud.yaml, frontend.yaml. Add them to your config.yaml under jargon.dict_paths. Want to contribute a pack for your domain? See Contributing.
Veery works out of the box with sensible defaults. To customize, edit config.yaml in the project root:
```yaml
# config.yaml
stt:
  backend: whisper  # "sensevoice" or "whisper" (default: whisper)

audio:
  max_duration_sec: 30.0  # Auto-stop after 30s

vad:
  threshold: 0.4             # Speech detection sensitivity (lower = more sensitive)
  silence_duration_sec: 2.0  # Seconds of silence before auto-stop

hotkey:
  key_combo: right_cmd  # Push-to-talk key
  mode: hold            # "hold" (push-to-talk) or "toggle" (press-to-toggle)

jargon:
  dict_paths:
    - jargon/quant_finance.yaml
    - jargon/tech.yaml
    - jargon/claude_code.yaml     # Claude Code slash commands
    - jargon/mined.yaml           # Auto-generated by --mine
    - jargon/mined_commands.yaml  # Auto-generated by --mine
  fuzzy_threshold: 82  # Fuzzy match sensitivity (0-100)

output:
  cgevent_char_limit: 500  # Text shorter than this is typed character-by-character;
                           # longer text is pasted via clipboard

learning:
  enabled: true
  promotion_threshold: 3  # Corrections needed before auto-adding to dictionary
```

| | Veery | SuperWhisper | Wispr Flow | Apple Dictation |
|---|---|---|---|---|
| Price | Free forever | $8.49/mo | $10/mo | Free |
| Privacy | 100% local | Local | Local + cloud options | Cloud |
| Multilingual | Purpose-built | Multi-lang (generic) | Multi-lang (generic) | Single language only |
| Jargon handling | Fuzzy + phonetic + auto-learn | Find-and-replace | Auto-learn | None |
| Chinese STT | SenseVoice (SOTA Chinese) | Whisper (English-first) | Proprietary | Apple ASR |
| Custom dictionaries | YAML (open, editable) | Vocabulary hints | Manual add | None |
| Codebase mining | Yes (`--mine`) | No | No | No |
| Open source | Yes | No | No | No |
Veery doesn't compete on polish or mobile support. It wins on the axis that matters to you: mixed-language technical speech with domain jargon.
We welcome contributions, especially:
- Jargon packs -- Add terms for your domain (biotech, crypto, game dev, etc.)
- Bug reports -- Open an issue with your STT output and expected correction
- STT improvements -- New backend integrations, accuracy benchmarks
To submit a community jargon pack:
- Create `jargon/community/your_domain.yaml` following the existing format
- Include phonetic variants that STT models commonly produce
- Open a PR
See CONTRIBUTING.md for full guidelines.
Built by Andy Wang.
Core dependencies:
- SenseVoice -- Chinese-optimized STT by Alibaba DAMO Academy
- mlx-whisper -- Whisper on Apple Silicon via MLX
- Silero VAD -- Voice activity detection
- rapidfuzz -- Fuzzy string matching
- rumps -- macOS menubar framework
MIT
