Common questions about Voxtype.
Voxtype is a push-to-talk voice-to-text tool for Linux. It is optimized for Wayland and also works on X11. You hold a hotkey, speak, and release the key; your speech is then transcribed and either typed at your cursor position or copied to the clipboard.
Most voice-to-text solutions for Linux either:
- Require internet/cloud services
- Are compositor or desktop-specific
- Don't support CJK (Korean, Chinese, Japanese) characters
Voxtype is designed to:
- Work on any Linux desktop (Wayland or X11)
- Be fully offline (uses local Whisper models)
- Use the push-to-talk paradigm (more predictable than continuous listening)
- Support CJK characters via wtype on Wayland
Yes! Voxtype works on both Wayland and X11. It uses evdev (kernel-level) for hotkey detection, which works everywhere. For text output, it uses wtype on Wayland (with CJK support), with dotool and ydotool as fallbacks.
No. All speech recognition is done locally using whisper.cpp. The only time network access is used is to download Whisper models during initial setup.
All of them! Voxtype is optimized for Wayland compositors with native keybinding support:
- Hyprland, Sway, River - Full push-to-talk via compositor keybindings (no special permissions needed)
- GNOME, KDE Plasma - Works with built-in evdev hotkey (requires `input` group)
- X11 desktops (i3, etc.) - Works with built-in evdev hotkey (requires `input` group)
For text output, Voxtype uses:
- wtype on Wayland (best CJK/Unicode support, no daemon needed)
- dotool as fallback (supports keyboard layouts, no daemon needed)
- ydotool on X11 or as fallback (requires daemon)
- PipeWire (recommended)
- PulseAudio
- ALSA (directly)
Yes, as long as your Bluetooth microphone is recognized by PipeWire/PulseAudio as an audio source.
For type mode: Most applications work. Some may have issues:
- Full-screen games may not receive input
- Some terminal emulators handle pasted input differently
- Electron apps occasionally have issues
For clipboard mode: Works universally (you just need to paste manually).
No. KDE Plasma and GNOME (on Wayland) do not support the virtual keyboard protocol that wtype requires. You'll see the error: "Compositor does not support the virtual keyboard protocol."
Solution: Install dotool (recommended) or use ydotool. Voxtype automatically falls back to dotool, then ydotool, when wtype fails.
For dotool (recommended, supports keyboard layouts):
# Install dotool and add user to input group
sudo usermod -aG input $USER
# Log out and back in

For ydotool (requires daemon):

systemctl --user enable --now ydotool

See Troubleshooting for complete setup instructions.
Yes! Any key that shows up in evtest can be used. Common choices:
- ScrollLock (default)
- Pause/Break
- Right Alt
- F13-F24 (if your keyboard has them)
Configure in ~/.config/voxtype/config.toml:
[hotkey]
key = "PAUSE"

Yes, you can require modifier keys:
[hotkey]
key = "SCROLLLOCK"
modifiers = ["LEFTCTRL"] # Ctrl+ScrollLock

Yes! Use large-v3, which supports 99 languages:
Transcribe in the spoken language (speak French, output French):
[whisper]
model = "large-v3"
language = "auto"
translate = false

With GPU acceleration, large-v3 achieves sub-second inference while supporting all languages.
Yes! Speak any language and get English output:
[whisper]
model = "large-v3"
language = "auto"
translate = true

Yes, use the transcribe command:

voxtype transcribe recording.wav

Whisper automatically adds punctuation based on context. For explicit punctuation, you can speak it (e.g., "period", "comma", "question mark").
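To transcribe several recordings in one pass, the transcribe command can be wrapped in a small loop. The following is a minimal sketch, not part of Voxtype: `transcribe_all` and the `VOXTYPE_CMD` override are hypothetical names introduced here, and the sketch assumes the transcript is written to standard output.

```shell
#!/bin/sh
# transcribe_all DIR: run the transcriber on every .wav file in DIR.
# VOXTYPE_CMD is a hypothetical override (not a voxtype setting); it lets you
# substitute another command, e.g. `echo`, for a dry run.
transcribe_all() {
    dir=$1
    for f in "$dir"/*.wav; do
        [ -e "$f" ] || continue        # the glob matched nothing
        printf '== %s ==\n' "$f"       # header so each transcript is labelled
        ${VOXTYPE_CMD:-voxtype} transcribe "$f"
    done
}

# Example (left commented out so that sourcing this file has no side effects):
# transcribe_all "$HOME/recordings"
```

Setting `VOXTYPE_CMD=echo` before calling the function prints the commands that would run, without invoking Voxtype.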
Yes! Voxtype supports 10 built-in icon themes plus custom icons. Configure in ~/.config/voxtype/config.toml:
[status]
icon_theme = "nerd-font"

Available themes:

| Theme | Description | Requirements |
|---|---|---|
| `emoji` | 🎙️ 🎤 ⏳ (default) | None |
| `nerd-font` | Nerd Font icons | Nerd Font |
| `material` | Material Design Icons | MDI Font |
| `phosphor` | Phosphor Icons | Phosphor Font |
| `codicons` | VS Code icons | Codicons |
| `omarchy` | Omarchy distro icons | Omarchy font |
| `minimal` | ○ ● ◐ × | None |
| `dots` | ◯ ⬤ ◔ ◌ | None |
| `arrows` | ▶ ● ↻ ■ | None |
| `text` | [MIC] [REC] [...] [OFF] | None |
You can also override individual icons or create custom theme files. See the Waybar Integration Guide for complete details.
Most Wayland users don't need this. If you use compositor keybindings (Hyprland, Sway, River), voxtype doesn't need any special permissions.
The input group is only required if you use voxtype's built-in evdev hotkey (e.g., on X11 or GNOME/KDE). The evdev subsystem requires read access to /dev/input/event* devices, which is restricted to the input group for security reasons.
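Whether the current login session actually carries the input group can be checked with an exact-match test. This is a sketch only; `in_group` is a hypothetical helper name, not a Voxtype command. It relies on `id -nG` without a user argument, which reports the groups of the running process, so it keeps reporting "missing" until you have logged out and back in.

```shell
#!/bin/sh
# in_group GROUP: succeed if the current process belongs to GROUP.
# `id -nG` prints group names separated by spaces; put one per line and
# require a whole-line match so that e.g. "input" does not match "uinput".
in_group() {
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if in_group input; then
    echo "input group: present in this session"
else
    echo "input group: missing (add yourself, then log out and back in)"
fi
```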
Neither Wayland nor X11 provides a universal way for applications to simulate keyboard input. Voxtype uses a fallback chain:
- wtype on Wayland - uses the virtual-keyboard protocol, supports CJK characters, no daemon needed
- dotool as fallback - uses the kernel's uinput interface, supports keyboard layouts, no daemon needed
- ydotool on X11 (or Wayland fallback) - uses the kernel's uinput interface, requires a daemon
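To see which of these tools is installed on your system, you can probe for them in the same order. This is a rough sketch, not Voxtype's own detection logic; `first_available` is a hypothetical helper. Being on `PATH` does not guarantee a tool works: wtype still needs compositor support for the virtual-keyboard protocol, and ydotool still needs its daemon.

```shell
#!/bin/sh
# first_available CMD...: print the first command that exists on PATH.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done
    return 1    # none of the candidates were found
}

# Probe in the order of the fallback chain described above.
backend=$(first_available wtype dotool ydotool) || backend=none
echo "first installed typing tool: $backend"
```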
Depends on the Whisper model:
- tiny.en: ~400 MB
- base.en: ~500 MB
- small.en: ~1 GB
- medium.en: ~2.5 GB
- large-v3: ~4 GB
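As a worked example of using these figures, the sketch below picks the largest model that fits a budget given in megabytes. `pick_model` is a hypothetical helper, and the sizes are the approximate values from the list above, with 1 GB rounded to 1000 MB. It considers size only: large-v3 is multilingual while the `.en` models are English-only, so size should not be the only criterion.

```shell
#!/bin/sh
# pick_model BUDGET_MB: print the largest model whose approximate footprint
# fits within BUDGET_MB; print nothing and fail if none fits.
pick_model() {
    budget=$1
    choice=
    # name:approx_MB pairs, smallest first (approximate figures from the list).
    for entry in tiny.en:400 base.en:500 small.en:1000 medium.en:2500 large-v3:4000; do
        name=${entry%%:*}
        size=${entry##*:}
        if [ "$size" -le "$budget" ]; then
            choice=$name    # keep overwriting, so the last fit is the largest
        fi
    done
    [ -n "$choice" ] && printf '%s\n' "$choice"
}

pick_model 2048    # prints small.en (medium.en needs ~2500 MB)
```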
Depends on model and hardware. On a modern CPU:
- tiny.en: ~10x realtime (1 second of speech = 0.1 second to transcribe)
- base.en: ~7x realtime
- small.en: ~4x realtime
- medium.en: ~2x realtime
- large-v3: ~1x realtime
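The realtime factor converts directly into an expected wait: transcription time is roughly the audio length divided by the factor. A small sketch of that arithmetic follows; `estimate_seconds` is a hypothetical helper, and the factors are the approximate CPU figures from the list above, so results are ballpark values only.

```shell
#!/bin/sh
# estimate_seconds AUDIO_SECONDS SPEED_FACTOR
# Prints the approximate transcription time in seconds, to one decimal place.
estimate_seconds() {
    # LC_ALL=C keeps the decimal separator a "." regardless of locale.
    LC_ALL=C awk -v d="$1" -v f="$2" 'BEGIN { printf "%.1f\n", d / f }'
}

estimate_seconds 30 4    # 30 s of speech with small.en (~4x): prints 7.5
```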
Yes! Voxtype supports optional GPU acceleration:
- Vulkan - Works on AMD, NVIDIA, and Intel GPUs (included in packages)
- CUDA - NVIDIA GPUs (build from source)
- Metal - Apple Silicon (build from source)
- HIP/ROCm - AMD GPUs (build from source)
Vulkan (easiest): Packages include a pre-built Vulkan binary. Install the runtime and enable:
# Install Vulkan runtime (Arch: vulkan-icd-loader, Debian: libvulkan1, Fedora: vulkan-loader)
sudo voxtype setup gpu --enable

Other backends: Build from source with cargo build --release --features gpu-cuda (or gpu-metal, gpu-hipblas).
GPU acceleration dramatically improves inference speed, especially for larger models. The large-v3 model can achieve sub-second inference with GPU acceleration.
No. All processing happens locally on your machine. No audio or text is sent to any server.
Using compositor keybindings (recommended):
- Verify your compositor config calls `voxtype record start` and `voxtype record stop`
- Check that voxtype is running: `pgrep voxtype`
- Test manually: `voxtype record start` then `voxtype record stop`
Using built-in evdev hotkey:
- Make sure `enabled = true` in your config's `[hotkey]` section
- Verify you're in the `input` group: `groups | grep input`
- Log out and back in after adding to the group
- Check the key name with `evtest`
- Try running with debug: `voxtype -vv`
On Wayland:
- Check wtype is installed: `which wtype`
- Test wtype directly: `wtype "test"`

On X11:
- Check ydotool is running: `systemctl --user status ydotool`
- Test ydotool directly: `ydotool type "test"`
Fallback:
Try clipboard mode: voxtype --clipboard
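If you are unsure whether the Wayland or the X11 checks apply to you, the session type can usually be read from the environment. The sketch below is an illustration, not something Voxtype ships; `session_kind` is a hypothetical helper. It trusts `XDG_SESSION_TYPE` when it names a graphical session and otherwise falls back to `WAYLAND_DISPLAY` and `DISPLAY`. `WAYLAND_DISPLAY` is checked first because both variables are normally set when XWayland is running.

```shell
#!/bin/sh
# session_kind: print "wayland", "x11", or "unknown".
session_kind() {
    case "${XDG_SESSION_TYPE:-}" in
        wayland|x11)
            printf '%s\n' "$XDG_SESSION_TYPE"
            return 0
            ;;
    esac
    # XDG_SESSION_TYPE was unset or not graphical (e.g. "tty"): use display vars.
    if [ -n "${WAYLAND_DISPLAY:-}" ]; then
        echo wayland
    elif [ -n "${DISPLAY:-}" ]; then
        echo x11
    else
        echo unknown
    fi
}

echo "session: $(session_kind)"
```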
To improve transcription accuracy:
- Use a larger model: `--model small.en`
- Speak more clearly and at a consistent volume
- Reduce background noise
- Use an `.en` model for English content
To speed up transcription:
- Use a smaller model: `--model tiny.en`
- Increase thread count in config
- Keep recordings short
See the Troubleshooting Guide for more solutions.
No. Voxtype only records audio while you hold the hotkey. When you release the key, recording stops immediately.
Audio is processed in memory and discarded after transcription. Nothing is saved to disk unless you use the transcribe command on a file.
Voxtype only records while the hotkey is actively held. However, any application with access to your microphone could potentially record audio. Voxtype doesn't add any new attack surface beyond what PipeWire/PulseAudio already provides.
Whisper is highly accurate but not perfect. For sensitive or important content:
- Use a larger model (medium.en or large-v3)
- Review the transcription before using it
- Consider that Whisper may occasionally "hallucinate" text
See the Contributing Guide for details. We welcome:
- Bug reports
- Feature requests
- Code contributions
- Documentation improvements
- Translations
Open an issue at: https://github.com/peteonrails/voxtype/issues
Include:
- Voxtype version
- Linux distribution and version
- Wayland compositor
- Steps to reproduce
- Debug output (`voxtype -vv`)
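Some of the environment details for a report can be gathered with a short script. This is a sketch under assumptions: `distro_name` is a hypothetical helper, it reads the standard `/etc/os-release` file, and `XDG_CURRENT_DESKTOP` and `XDG_SESSION_TYPE` may be unset on some setups. The Voxtype version and debug output still need to be added by hand.

```shell
#!/bin/sh
# distro_name [FILE]: print PRETTY_NAME from an os-release style file.
distro_name() {
    file=${1:-/etc/os-release}
    # os-release consists of shell-compatible KEY=value lines, so it can be
    # sourced; the subshell keeps its variables out of the caller's scope.
    ( . "$file" && printf '%s\n' "${PRETTY_NAME:-unknown}" )
}

echo "Distribution: $(distro_name 2>/dev/null || echo unknown)"
echo "Desktop/compositor: ${XDG_CURRENT_DESKTOP:-unknown}"
echo "Session type: ${XDG_SESSION_TYPE:-unknown}"
```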
Yes! Open a feature request issue at: https://github.com/peteonrails/voxtype/issues
Describe:
- What you want to accomplish
- Why existing features don't meet your needs
- How you envision it working
I don't accept donations, but if you find Voxtype useful, a star on the GitHub repository would mean a lot and helps others discover the project!
We want to hear from you! Voxtype is a young project and your feedback helps make it better.
- Something not working? If Voxtype doesn't install cleanly, doesn't work on your system, or is buggy in any way, please open an issue. I actively monitor and respond to issues.