
VOXD - Voice-Type / dictation app for Linux 🗣️⌨️

VOXD logo

Running in the background, VOXD provides fast voice-to-text typing in any Linux app.
It uses LOCAL (offline) voice processing, with optional LOCAL (offline) AI text post-processing.
It runs fine even on older CPUs; GPU acceleration is optional but can significantly improve transcription speed.

Hit your hotkey -> speak -> hit the hotkey again -> watch your words appear wherever the cursor currently is, even AI-rewritten as a poem or C++ code.

Tested & Works on:

  • Arch / Hyprland
  • Omarchy 3.0
  • Ubuntu 24.04 / GNOME
  • Ubuntu 25.04 / Sway
  • Fedora 42 / KDE
  • Pop!_OS 22 / COSMIC
  • Mint 22 / Cinnamon
  • openSUSE / Leap 15.6

Highlights

Feature Notes
Whisper.cpp backend Local, offline, fast ASR.
Streaming transcription Real-time incremental typing as you speak. Text appears word-by-word, not after recording stops.
Simulated typing Instantly types straight into any currently focused input window. Even on Wayland! (ydotool).
Clipboard Auto-copies the transcript to the clipboard, ready for pasting if desired.
Languages 99+ languages. Provides a default language config and per-session override.
AIPP, AI Post-Processing AI-rewriting via local or cloud LLMs. GUI prompt editor.
Multiple UI surfaces CLI, GUI (minimal PyQt6), TRAY (system tray), FLUX (triggered by voice activity detection, beta)
Logging & performance Session log plus your own optional local performance data (CSV).

Setup

Complete these 2 steps:

  1. Install VOXD
  2. Set up a hotkey.

1. Install VOXD

Install from Release (recommended)

Download the package for your distro and architecture from the latest release, then install with your package manager.

Latest builds: GitHub Releases (Latest)

Ubuntu / Debian (.deb)

# Update package lists and install the downloaded .deb package:
sudo apt update
sudo apt install -y ./voxd_*_amd64.deb    # or ./voxd_*_arm64.deb on ARM systems

Fedora (.rpm)

# Update repositories and install the downloaded .rpm package:
sudo dnf update -y
sudo dnf install -y ./voxd-*-x86_64.rpm  # or the arm64 counterpart if on an ARM device

Arch Linux (.pkg.tar.zst)

# Synchronize package databases and install the downloaded .pkg.tar.zst package:
sudo pacman -Sy
sudo pacman -U ./voxd-*-x86_64.pkg.tar.zst    # or the arm64 counterpart if on an ARM device

openSUSE (.rpm)

# Refresh repositories and install the downloaded .rpm package with dependency resolution:
sudo zypper refresh
sudo zypper install --force-resolution ./voxd-*-x86_64.rpm   # or the arm64 counterpart if on an ARM device

Alternatively: Download the source or clone the repo, and run the setup:

git clone https://github.com/jakovius/voxd.git

cd voxd && ./setup.sh

# requires sudo for packages & REBOOT (ydotool setup on Wayland systems). Launchers (GUI, Tray, Flux) are installed automatically.

Setup is non-interactive with minimal console output; a detailed setup log is saved in the repo directory (e.g. 2025-09-18-setup-log.txt).

Note: The setup script automatically detects and uses the highest available Python 3.x version (3.9 or newer) on your system. If you have Python 3.13, 3.14, or any newer version installed, it will be used automatically.

Reboot the system!
(Not needed on X11. Most modern systems run Wayland, where ydotool is required for typing, and its user setup only takes effect after a reboot.)

GPU Acceleration (Optional)

VOXD automatically detects available GPU acceleration backends (OpenCL, CUDA, or Vulkan) and will build whisper.cpp with GPU support if a compatible GPU is found. This can significantly speed up transcription, especially on systems with integrated or dedicated GPUs.

To force a GPU-enabled build from source (bypassing prebuilt binaries that may lack GPU support):

VOXD_FORCE_GPU_REBUILD=1 ./setup.sh

This will:

  • Skip prebuilt binaries and build whisper.cpp from source
  • Enable GPU acceleration based on detected backend (OpenCL for Intel/AMD iGPUs, CUDA for NVIDIA GPUs, or Vulkan as fallback)
  • Install necessary development packages (e.g., opencl-headers for OpenCL)

Note: The setup script automatically detects GPU capabilities and installs required development packages. If GPU acceleration is detected, you'll see a message indicating which backend will be used (e.g., "GPU acceleration: OpenCL detected – will build with OpenCL support").
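The backend priority described above can be sketched roughly like this (an illustrative probe for common GPU tooling, not VOXD's actual setup logic):

```python
import shutil

def guess_gpu_backend():
    """Probe for common GPU tooling, mirroring the backend priority described above."""
    if shutil.which("nvidia-smi"):    # NVIDIA driver tooling present -> CUDA
        return "cuda"
    if shutil.which("clinfo"):        # OpenCL tooling present (Intel/AMD iGPUs)
        return "opencl"
    if shutil.which("vulkaninfo"):    # Vulkan tooling present (fallback)
        return "vulkan"
    return "cpu"                      # no GPU tooling found

print(f"Likely whisper.cpp backend: {guess_gpu_backend()}")
```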

2. Set up a global hotkey shortcut in your system, for recording/stop:

a. Open your system keyboard-shortcuts panel:

  • GNOME: Settings → Keyboard → "Custom Shortcuts"
  • KDE / XFCE / Cinnamon: similar path.
  • Hyprland / Sway: just add a keybinding in the respective config file.
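For example, on Hyprland or Sway the binding might look like this (an illustrative sketch; pick any key combo you like):

```
# Hyprland (~/.config/hypr/hyprland.conf)
bind = SUPER, D, exec, voxd --trigger-record

# Sway (~/.config/sway/config)
bindsym $mod+d exec voxd --trigger-record
```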

b. The command to assign to the shortcut hotkey (EXACTLY as given):

bash -c 'voxd --trigger-record'

c. Click Add / Save.

First, run the app in a terminal with plain
voxd or voxd --setup.
The first run performs some initial setup (voice model, LLM model for AIPP, ydotool user setup).

READY! → Go type anywhere with your voice!


Usage

Use the installed VOXD launchers (from your app launcher) or launch from a terminal, in any mode:

voxd        # CLI (interactive); 'h' shows commands inside CLI. FIRST RUN: a necessary initial setup.
voxd --rh   # directly starts hotkey-controlled continuous recording in Terminal
voxd -h     # show top-level help and quick-actions
voxd --gui  # friendly GUI window; just leave it in the background to voice-type via your hotkey
voxd --tray # sits in the tray; perfect for unobstructed dictation (hotkey-driven also)
voxd --flux # VAD (Voice Activity Detection), voice-triggered continuous dictation (in beta)

Leave VOXD running in the background -> go to any app where you want to voice-type and:

Press hotkey … VOXD does …
First press start recording
Second press stop ⇢ [finalize transcription ⇢ copy to clipboard] ⇢ types any remaining output into any focused app

🎙️ Streaming Mode (Default)

VOXD uses streaming transcription by default, which means:

  • Real-time typing: Text appears incrementally as you speak, not after you stop recording
  • Chunk-based processing: Audio is processed in overlapping chunks (default: 3 seconds) for continuous transcription
  • Incremental updates: Text is typed word-by-word or phrase-by-phrase as it's transcribed (typically every 2 seconds or 3 words)
  • Seamless experience: You see your words appear in real-time, making it feel like natural voice-typing

How it works:

  1. Press hotkey to start → VOXD begins recording and transcribing
  2. As you speak → Text appears incrementally in your focused application
  3. Press hotkey again → Finalizes any remaining transcription and copies to clipboard

This streaming behavior is enabled by default in CLI (voxd), GUI (voxd --gui), and Tray (voxd --tray) modes. The old "record-then-transcribe" behavior is no longer used.
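As a rough illustration of the chunking scheme (using the default 3 s chunks with 0.5 s overlap; a sketch, not VOXD's actual code), the window start times for a recording could be computed like this:

```python
def chunk_starts(total_seconds, chunk=3.0, overlap=0.5):
    """Start times of overlapping windows covering a recording."""
    starts, t = [], 0.0
    step = chunk - overlap          # each new window starts before the last one ends
    while t < total_seconds:
        starts.append(round(t, 3))
        t += step
    return starts

# A 10-second recording is covered by windows starting every 2.5 s:
print(chunk_starts(10.0))  # -> [0.0, 2.5, 5.0, 7.5]
```

Each window shares 0.5 s of audio with its predecessor, which helps the transcriber stitch words that straddle a chunk boundary.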

Note: If in --flux mode (beta), just speak - no hotkey needed, voice activity detection triggers recording automatically.

Autostart

For practical reasons (always ready to type, low system footprint), it is advisable to enable the voxd user daemon:

  • Enable: voxd --autostart true
  • Disable: voxd --autostart false

This launches voxd --tray automatically after user login using systemd user services when available; otherwise it falls back to an XDG Autostart entry (~/.config/autostart/voxd-tray.desktop).
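For reference, the XDG fallback entry looks roughly like this (a sketch; the file VOXD generates may differ):

```ini
# ~/.config/autostart/voxd-tray.desktop
[Desktop Entry]
Type=Application
Name=VOXD Tray
Exec=voxd --tray
X-GNOME-Autostart-enabled=true
```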

Languages

  • Supported codes: ISO 639-1 (e.g., en, es, de, sv) and auto (auto-detect, not advised).
  • Default: en. You can override per run or persist it.
  • Change via CLI (session-only), examples:
voxd --gui  --lang auto
voxd --tray --lang es
voxd --flux --lang de
voxd --rh   --lang sv
  • Persist via CLI: voxd --cfg opens the config file for editing. Set:
# ~/.config/voxd/config.yaml
language: sv  # or 'auto', 'es', etc.
  • Change via GUI/Tray (persisted): Menu → Language. Saved to ~/.config/voxd/config.yaml as language.
  • Model note: For non‑English languages, use a multilingual Whisper model (not *.en.bin). Install/switch via GUI “Whisper Models” or voxd-model (e.g., ggml-base.bin, small, medium, large-v3).
  • Tip: auto works well, but setting the exact language can improve accuracy. If you pick a non‑English language while using an English‑only model, VOXD will warn and transcription quality may drop.

🎙️ Managing speech models

VOXD needs a Whisper GGML model file. One default model (base.en) is set up in the app out of the box.
Use the built-in model manager in GUI mode, or the CLI in a terminal, to fetch any other model.
Voice models are downloaded into ~/.local/share/voxd/models/, where VOXD picks them up automatically.

CLI model management examples:

voxd-model list                        # show models already on disk
voxd-model install tiny.en             # download another model
voxd-model --no-check install base.en  # download a model and skip SHA-1 verification
voxd-model remove tiny.en              # delete a model
voxd-model use tiny.en                 # make that model the default (edits config.yaml)

Models for download (size MB):

Model Size (MB) Filename
tiny 75 ggml-tiny.bin
tiny-q5_1 31 ggml-tiny-q5_1.bin
tiny-q8_0 42 ggml-tiny-q8_0.bin
tiny.en 75 ggml-tiny.en.bin
tiny.en-q5_1 31 ggml-tiny.en-q5_1.bin
tiny.en-q8_0 42 ggml-tiny.en-q8_0.bin
base 142 ggml-base.bin
base-q5_1 57 ggml-base-q5_1.bin
base-q8_0 78 ggml-base-q8_0.bin
base.en 142 ggml-base.en.bin
base.en-q5_1 57 ggml-base.en-q5_1.bin
base.en-q8_0 78 ggml-base.en-q8_0.bin
small 466 ggml-small.bin
small-q5_1 181 ggml-small-q5_1.bin
small-q8_0 252 ggml-small-q8_0.bin
small.en 466 ggml-small.en.bin
small.en-q5_1 181 ggml-small.en-q5_1.bin
small.en-q8_0 252 ggml-small.en-q8_0.bin
small.en-tdrz 465 ggml-small.en-tdrz.bin
medium 1500 ggml-medium.bin
medium-q5_0 514 ggml-medium-q5_0.bin
medium-q8_0 785 ggml-medium-q8_0.bin
medium.en 1500 ggml-medium.en.bin
medium.en-q5_0 514 ggml-medium.en-q5_0.bin
medium.en-q8_0 785 ggml-medium.en-q8_0.bin
large-v1 2900 ggml-large-v1.bin
large-v2 2900 ggml-large-v2.bin
large-v2-q5_0 1100 ggml-large-v2-q5_0.bin
large-v2-q8_0 1500 ggml-large-v2-q8_0.bin
large-v3 2900 ggml-large-v3.bin
large-v3-q5_0 1100 ggml-large-v3-q5_0.bin
large-v3-turbo 1500 ggml-large-v3-turbo.bin
large-v3-turbo-q5_0 547 ggml-large-v3-turbo-q5_0.bin
large-v3-turbo-q8_0 834 ggml-large-v3-turbo-q8_0.bin

⚙️ User Config

Available in GUI and TRAY modes ("Settings"), or edit directly: ~/.config/voxd/config.yaml


🧠 AI Post-Processing (AIPP)

Your spoken words can be magically cleaned up and rendered into, e.g., a neatly formatted email, a poem, or straight into program code!

VOXD can optionally post-process your transcripts using LOCAL (on-machine, llama.cpp, Ollama) or cloud LLMs (like OpenAI, Anthropic, or xAI).
For the local AIPP, llama.cpp is available out-of-the-box, with a default model.
You can also install Ollama and download a model that can be run on your machine, e.g. ollama pull gemma3:latest.
You can enable, configure, and manage prompts directly from the GUI.

Enable AIPP:

In CLI mode, use the --aipp argument.
In GUI or TRAY mode, all relevant settings are under "AI Post-Processing".
Selecting provider & model: models are tied to their respective providers!
Editing prompts: select "Manage prompts" or "Prompts" to edit up to 4 of them.

Supported providers:

  • llama.cpp (local)
  • Ollama (local)
  • OpenAI
  • Anthropic
  • xAI

AIPP Model Management

Model Storage

~/.local/share/voxd/llamacpp_models/

Adding Models | Requirements

  • GGUF format only (.gguf extension)
  • Quantized models recommended (Q4_0, Q4_1, Q5_0, etc.)
  • Not supported: PyTorch (.pth), Safetensors (.safetensors), ONNX

Step 1: Download a .gguf model from Hugging Face

# Example: Download to the model directory (use -O so the ?download=true query string
# does not end up in the saved filename)
cd ~/.local/share/voxd/llamacpp_models/
wget -O qwen2.5-3b-instruct-q4_k_m.gguf "https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF/resolve/main/qwen2.5-3b-instruct-q4_k_m.gguf?download=true"

Step 2: Restart VOXD
VOXD automatically discovers all .gguf files in the models directory on startup and makes them available for selection.
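The discovery step can be pictured as follows (a hypothetical sketch of scanning the models directory, not VOXD's actual code):

```python
from pathlib import Path

def discover_gguf_models(models_dir="~/.local/share/voxd/llamacpp_models"):
    """Return the model names (file stems) of all .gguf files in the directory."""
    d = Path(models_dir).expanduser()
    if not d.is_dir():
        return []
    return sorted(p.stem for p in d.glob("*.gguf"))
```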

Step 3: Select in VOXD GUI
AI Post-Processing → Provider: llamacpp_server → Model: qwen2.5-3b-instruct

Recommended Models for AIPP

Model Size RAM Quality Best For
qwen2.5-3b-instruct 1.9GB 3GB Great Default, high quality
qwen2.5-coder-1.5b 900MB 2GB Good Code-focused tasks

🔧 Advanced Configuration

Edit ~/.config/voxd/config.yaml:

# llama.cpp settings
llamacpp_server_path: "llama.cpp/build/bin/llama-server"
llamacpp_server_url: "http://localhost:8080"
llamacpp_server_timeout: 30

# Selected models per provider (automatically updated by VOXD)
aipp_selected_models:
  llamacpp_server: "qwen2.5-3b-instruct-q4_k_m"

# Streaming transcription settings (default: enabled)
streaming_enabled: true              # Enable/disable streaming mode
streaming_chunk_seconds: 3.0         # Audio chunk size in seconds (default: 3.0)
streaming_overlap_seconds: 0.5      # Overlap between chunks in seconds (default: 0.5)
streaming_emit_interval_seconds: 2.0 # Minimum time between text updates (default: 2.0)
streaming_emit_word_count: 3         # Minimum words before emitting text (default: 3)
streaming_typing_delay: 0.01         # Delay between typed characters in streaming mode (default: 0.01)
streaming_min_chars_to_type: 3      # Minimum characters before typing incremental text (default: 3)

🔑 Setting API Keys for the remote API providers

For security reasons, be mindful where you store your API keys.
To use cloud AI providers, set the required API key(s) in your shell environment before running VOXD.
For convenience, add lines like these to your .bashrc, .zshrc, or equivalent shell profile (replace with your actual keys):

# For OpenAI
export OPENAI_API_KEY="sk-..."

# For Anthropic
export ANTHROPIC_API_KEY="..."

# For xAI
export XAI_API_KEY="..."
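A quick way to see which keys are still missing before launching (a small helper sketch, not part of VOXD):

```python
import os

AIPP_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "XAI_API_KEY"]

def missing_aipp_keys(env=None):
    """Return the cloud-provider key names that are unset or empty."""
    env = os.environ if env is None else env
    return [k for k in AIPP_KEYS if not env.get(k)]

print("Missing keys:", missing_aipp_keys())
```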

Note:
If an API key is missing, the respective cloud-based AIPP provider will (surprise, surprise) not work.


🩺 Troubleshooting cheatsheet

Note: As one may expect, the app is not completely immune to very noisy environments :) especially if you are not the best speaker out there.

Symptom Likely cause / fix
Getting random [BLANK_AUDIO], no transcript, or a very poor transcript Most likely: mic volume too high (clipping & distortion). VOXD tries to set your microphone level optimally (configurable), but check that input volume is not above ~45%.
Press hotkey, nothing happens Troubleshoot with this command: gnome-terminal -- bash -c "voxd --trigger-record; read -p 'Press Enter...'"
Transcript printed but not typed Wayland: ydotool not installed or user not in input group → run setup_ydotool.sh, relog.
"whisper-cli not found" Build failed - rerun ./setup.sh and check any diagnostic output.
GPU acceleration not working Prebuilt binaries may lack GPU support. Force rebuild with GPU: VOXD_FORCE_GPU_REBUILD=1 ./setup.sh. Check setup log for GPU detection messages.
Mic not recording Verify in system settings: input device available? / active? / not muted?
Clipboard empty Ensure xclip or wl-copy is present (re-run setup.sh).

Audio troubleshooting

  • List devices: python -m sounddevice (check that a device named "pulse" exists on modern systems).
  • Prefer PulseAudio/PipeWire: set in ~/.config/voxd/config.yaml:
audio_prefer_pulse: true
audio_input_device: "pulse"   # or a specific device name or index
  • If no pulse device:

    • Debian/Ubuntu: sudo apt install libasound2-plugins pavucontrol (ensure pulseaudio or pipewire-pulse is active)
    • Fedora/openSUSE: sudo dnf install alsa-plugins-pulseaudio pavucontrol (ensure pipewire-pulseaudio is active)
    • Arch: sudo pacman -S alsa-plugins pipewire-pulse pavucontrol
  • If 16 kHz fails on ALSA: VOXD will retry with the device default rate and with pulse when available.
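The device-preference logic amounts to something like this (illustrative only, not VOXD's actual code):

```python
def pick_input_device(device_names, prefer="pulse"):
    """Prefer the PulseAudio/PipeWire bridge device when it exists."""
    if prefer in device_names:
        return prefer
    return device_names[0] if device_names else None

print(pick_input_device(["hw:0,0", "pulse", "default"]))  # -> pulse
```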


📜 License & Credits


🗑️ Removal / Uninstall

1. Package install (deb/rpm/arch)

If VOXD was installed via a native package:

  • Ubuntu/Debian
sudo apt remove voxd
  • Fedora
sudo dnf remove -y voxd
  • openSUSE
sudo zypper --non-interactive remove voxd
  • Arch
sudo pacman -R voxd

Note: This removes system files (e.g., under /opt/voxd and /usr/bin/voxd). User-level data (models, config, logs) remain. See "Optional runtime clean-up" below to remove those.

2. Repo-clone install (./setup.sh)

If you cloned this repository and ran ./setup.sh inside it, just run the uninstall.sh script in the repo folder:

# From inside the repo folder
./uninstall.sh

3. pipx install

If voxd was installed through pipx (either directly or via the prompt at the end of setup.sh):

pipx uninstall voxd

Enjoy seamless voice-typing on Linux - and if you build something cool on top, open a PR or say hi!
