Solutions to common issues when using Voxtype.
- Modifier Key Interference (Hyprland/Sway/River)
- Permission Issues
- Audio Problems
- Transcription Issues
- Output Problems
- Performance Issues
- Systemd Service Issues
- Debug Mode
Symptoms: When using compositor keybindings with modifiers (e.g., SUPER+CTRL+X or SUPER+O), releasing keys slowly causes typed output to trigger shortcuts instead of inserting text. For example, if you release X while still holding SUPER, the transcribed "hello" might trigger SUPER+h, SUPER+e, SUPER+l, etc.
Cause: The compositor tracks physical keyboard state. Even though voxtype types text, if you're still physically holding a modifier key, the compositor combines them.
Solution: Use the compositor setup command to automatically install a fix that blocks modifier keys during text output.
For Hyprland:
voxtype setup compositor hyprland
hyprctl reload
systemctl --user restart voxtype
For Sway:
voxtype setup compositor sway
swaymsg reload
systemctl --user restart voxtype
For River:
voxtype setup compositor river
# Restart River or source the new config
systemctl --user restart voxtype
Note: This command does NOT set up keybindings; it only installs the modifier interference fix. See the User Manual to set up your push-to-talk hotkey.
This command:
- Writes a modifier-blocking submap/mode to ~/.config/hypr/conf.d/voxtype-submap.conf (or sway/conf.d/voxtype-mode.conf, or river/conf.d/voxtype-mode.sh)
- Adds pre/post output hooks to your voxtype config
- Checks that your compositor config sources the conf.d directory
If voxtype crashes while typing, press F12 to escape the submap and restore normal modifier behavior.
Manual setup: See voxtype setup compositor hyprland --show for the full configuration if you prefer to set it up manually.
Alternative workaround: If you can't use submaps, a simple delay before typing may help:
[output.post_process]
command = "sleep 0.3 && cat"
timeout_ms = 5000
The automatic fix (voxtype setup compositor) only works on compositors that support input modes or submaps:
| Compositor | Supported | Why |
|---|---|---|
| Hyprland | Yes | Has submaps |
| Sway | Yes | Has modes |
| River | Yes | Has modes |
| Qtile | No | No mode/submap concept |
| Niri | No | No mode/submap concept |
| GNOME | No | No mode/submap concept |
| KDE | No | No mode/submap concept |
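The support table can be turned into a quick check before you run the setup command. This is only a sketch: the function name `supports_modifier_fix` is made up for illustration, and reading the compositor name from `XDG_CURRENT_DESKTOP` is an assumption about how your session identifies itself.

```sh
# Return 0 if the named compositor is listed as supported in the table
# above (Hyprland, Sway, River); return 1 for anything else.
supports_modifier_fix() {
    # Lowercase the argument so "Hyprland" and "hyprland" both match.
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        hyprland|sway|river) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: XDG_CURRENT_DESKTOP holds the compositor name. Pass the
# name explicitly if it is unset or reports something different.
if supports_modifier_fix "${XDG_CURRENT_DESKTOP:-unknown}"; then
    echo "automatic fix available"
else
    echo "use one of the alternatives"
fi
```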
For unsupported compositors, use one of these alternatives:
- Use a dedicated key without modifiers. Keys like ScrollLock, Pause, or F13-F24 don't have this problem since there are no modifiers to interfere:
  [hotkey]
  key = "SCROLLLOCK"
- Use the post-processor delay (works on any compositor):
  [output.post_process]
  command = "sleep 0.3 && cat"
  timeout_ms = 5000
  This gives you 300ms to release all keys before typing starts.
- Use voxtype's built-in evdev hotkey instead of compositor keybindings. Release timing doesn't matter since voxtype controls the entire recording flow.
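To see what the post-processor delay from the second alternative does in isolation, you can feed it text by hand. This sketch assumes, as the `cat` in the config suggests, that voxtype passes the transcription on stdin and types whatever the command writes to stdout. The name `delay_then_passthrough` is hypothetical.

```sh
# Wait for the given number of seconds, then copy stdin to stdout unchanged.
# Fractional seconds rely on a sleep implementation that accepts them
# (GNU coreutils does).
delay_then_passthrough() {
    sleep "$1" && cat
}

# The text comes out unmodified, just 0.3 seconds later.
printf 'hello world\n' | delay_then_passthrough 0.3
```

If 300ms is not enough time to release your modifier keys, a larger value works the same way, as long as it stays well below `timeout_ms`.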
Cause: User is not in the input group, required for evdev access.
Solution:
# Add user to input group
sudo usermod -aG input $USER
# IMPORTANT: Log out and back in for changes to take effect
# Verify membership
groups | grep input
Cause: udev rules preventing access, or input group not applied.
Solution:
- Verify group membership:
  groups | grep input
- If not shown, log out and back in completely
- If still failing, check udev rules:
  ls -la /dev/input/event*
  # Should show group 'input' with rw permissions
Cause: uinput module not loaded or wrong permissions.
Solution:
# Load uinput module
sudo modprobe uinput
# Make it persistent
echo "uinput" | sudo tee /etc/modules-load.d/uinput.conf
# Check ydotool daemon
systemctl --user status ydotool
Possible causes and solutions:
# List available audio sources
pactl list sources short
# Test recording with system default
arecord -d 3 -f S16_LE -r 16000 test.wav
aplay test.wav
If test recording works, check your Voxtype config:
[audio]
device = "default" # Or specific device name from pactl
# Check PulseAudio/PipeWire volume
pavucontrol
# Or
pactl list sources | grep -A 10 "Default"
# Check audio server status
pactl info
# Restart if needed
systemctl --user restart pipewire pipewire-pulse
# Or for PulseAudio:
systemctl --user restart pulseaudio
Cause: Audio device doesn't support 16 kHz sample rate.
Solution: Voxtype handles resampling internally, but ensure your device works:
# Test at native rate
arecord -d 2 test.wav
aplay test.wav
Cause: max_duration_secs limit reached.
Solution: Increase the limit:
[audio]
max_duration_secs = 120 # 2 minutes
Cause: Whisper model not downloaded or wrong path.
Solution:
# Download the model
voxtype setup --download
# Or manually download
mkdir -p ~/.local/share/voxtype/models
curl -L -o ~/.local/share/voxtype/models/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
Cause: On some systems (particularly with glibc 2.42+ like Ubuntu 25.10), the whisper-rs FFI bindings crash due to C++ exceptions crossing the FFI boundary.
Solution: Use the CLI backend which runs whisper-cli as a subprocess:
[whisper]
backend = "cli"
This requires whisper-cli to be installed. Build it from whisper.cpp:
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build
cmake --build build --config Release
sudo cp build/bin/whisper-cli /usr/local/bin/
See CLI Backend in the User Manual for details.
Possible causes:
- For English: Use .en models (e.g., base.en)
- For accuracy: Use larger models (small.en, medium.en)
- Use a quality microphone
- Reduce background noise
- Maintain consistent distance from mic
[whisper]
model = "base.en" # For English
language = "en"
Context window optimization is disabled by default because it can cause phrase repetition with some models (especially large-v3 and large-v3-turbo).
If you want faster transcription and your model works well with it, you can enable it:
[whisper]
context_window_optimization = true
Or via command line:
voxtype --whisper-context-optimization daemon
If you experience phrase repetition (e.g., "word word word"), make sure this setting is disabled (the default).
Cause: Recording contains mostly silence.
Solution:
- Check microphone is working
- Increase microphone sensitivity
- Speak closer to the microphone
Cause: Known Whisper behavior with silence or noise.
Solutions:
- Use a larger model for better accuracy
- Avoid recording ambient noise
- Keep recordings short and speech-focused
Cause: Known issue with Whisper large-v3 models, especially when context window optimization is enabled.
Example: Saying "increase the limit" produces "increase the limit increase the limit increase the limit"
Solutions:
- Ensure context_window_optimization is disabled (the default):
  [whisper]
  context_window_optimization = false
- Try a different model (large-v3-turbo and large-v3 are most affected)
- If using context optimization and experiencing issues, disable it
Symptom: wtype fails with "Compositor does not support the virtual keyboard protocol"
Cause: KDE Plasma and GNOME do not implement the zwp_virtual_keyboard_v1 Wayland protocol that wtype requires. This is a compositor limitation, not a voxtype bug.
What happens: Voxtype detects this failure and automatically falls back to dotool, then ydotool. If neither is set up, it falls back to clipboard mode.
Solution 1 (Recommended): Install dotool. Unlike ydotool, dotool does not require a daemon and supports keyboard layouts for non-US keyboards:
# 1. Install dotool (check your distribution's package manager)
# Arch (AUR):
yay -S dotool
# From source: https://sr.ht/~geb/dotool/
# 2. Add user to input group (required for uinput access)
sudo usermod -aG input $USER
# Log out and back in for group change to take effect
# 3. Configure keyboard layout if needed (non-US keyboards)
# Add to config.toml:
# [output]
# dotool_xkb_layout = "de" # German, French ("fr"), etc.
Solution 2: Set up ydotool as your typing backend. Unlike dotool, ydotool requires a daemon to be running:
# 1. Install ydotool
# Arch:
sudo pacman -S ydotool
# Fedora:
sudo dnf install ydotool
# Ubuntu/Debian:
sudo apt install ydotool
# 2. Enable and start the daemon (required!)
systemctl --user enable --now ydotool
# 3. Verify it's running
systemctl --user status ydotool
Important: For ydotool, simply having it installed is not enough. The daemon must be running for the fallback to work.
Alternative: Use clipboard or paste mode instead of type mode:
[output]
mode = "clipboard" # Copies to clipboard, you paste manually
# or
mode = "paste" # Copies to clipboard, then simulates Ctrl+V
Compositor compatibility:
| Desktop | wtype | dotool | ydotool | Recommended |
|---|---|---|---|---|
| Hyprland | ✓ | ✓ | ✓ | wtype |
| Sway | ✓ | ✓ | ✓ | wtype |
| River | ✓ | ✓ | ✓ | wtype |
| KDE Plasma (Wayland) | ✗ | ✓ | ✓ | dotool |
| GNOME (Wayland) | ✗ | ✓ | ✓ | dotool |
| X11 (any desktop) | ✗ | ✓ | ✓ | dotool |
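The "Recommended" column can be expressed as a small lookup. This is a sketch under assumptions: the function name `recommended_output_tool` is invented here, the environment variables `XDG_SESSION_TYPE` and `XDG_CURRENT_DESKTOP` may not be set (or may use different spellings) on your system, and defaulting to dotool for Wayland compositors not listed in the table is a guess.

```sh
# Print the recommended typing tool, following the table above.
# $1: session type ("wayland" or "x11"), $2: desktop or compositor name.
recommended_output_tool() {
    session=$1
    desktop=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    if [ "$session" != "wayland" ]; then
        echo dotool      # wtype is Wayland-only
        return
    fi
    case "$desktop" in
        hyprland|sway|river) echo wtype ;;
        # KDE Plasma and GNOME lack the virtual keyboard protocol.
        # Unlisted compositors also land here, which is an assumption.
        *) echo dotool ;;
    esac
}

recommended_output_tool "${XDG_SESSION_TYPE:-x11}" "${XDG_CURRENT_DESKTOP:-}"
```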
Symptom: You're running X11 (not Wayland) and see errors like:
WARN wtype failed: Wayland connection failed
WARN clipboard (wl-copy) failed: Text injection failed
ERROR Output failed: All output methods failed.
Cause: wtype and wl-copy are Wayland-only tools. On X11, voxtype needs dotool, ydotool, or xclip installed.
Solution: Install one of these X11-compatible tools:
Option 1 (Recommended): Install dotool
dotool works on X11, supports keyboard layouts, and doesn't need a daemon:
# Ubuntu/Debian (from source):
sudo apt install libxkbcommon-dev
git clone https://git.sr.ht/~geb/dotool
cd dotool && ./build.sh && sudo cp dotool /usr/local/bin/
# Arch (AUR):
yay -S dotool
# Add user to input group
sudo usermod -aG input $USER
# Log out and back in
Option 2: Install ydotool
ydotool works on X11 but requires a running daemon:
# Ubuntu/Debian:
sudo apt install ydotool
# Start the daemon (see "ydotool daemon not running" section for Fedora)
systemctl --user enable --now ydotool
Option 3: Use clipboard mode with xclip
For clipboard-only output (you paste manually with Ctrl+V):
# Ubuntu/Debian:
sudo apt install xclip
Then configure voxtype to use clipboard mode:
[output]
mode = "clipboard"
Verify your setup:
voxtype setup
This shows which output tools are installed and available.
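If you want a second opinion independent of voxtype, a rough manual equivalent is to look each tool up on your PATH. This is only an approximation and not how `voxtype setup` works internally. The function name `check_tools` is made up, and the check says nothing about whether a required daemon (such as ydotool's) is running.

```sh
# Print "name: found" or "name: missing" for each tool named as an argument.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

# The output tools mentioned in this guide.
check_tools wtype dotool ydotool wl-copy xclip
```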
Symptom: Transcribed text has wrong characters. For example, on a German keyboard, "Python" becomes "Pzthon" and "zebra" becomes "yebra" (y and z are swapped).
Cause: ydotool sends raw US keycodes and doesn't support keyboard layouts. When voxtype falls back to ydotool (e.g., on X11, Cinnamon, or when wtype fails), characters are typed as if you had a US keyboard layout.
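The letter swap is easy to reproduce without any typing tool. The sketch below models only the Y/Z exchange between US and German layouts. A real layout mismatch also affects punctuation and umlauts, which this toy filter (with the made-up name `us_keycodes_on_qwertz`) ignores.

```sh
# Simulate US keycodes being interpreted by a German QWERTZ layout:
# the Y and Z keys trade places, all other characters pass through.
us_keycodes_on_qwertz() {
    tr 'yzYZ' 'zyZY'
}

echo "Python zebra" | us_keycodes_on_qwertz
# prints: Pzthon yebra
```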
Solution: Install dotool and configure your keyboard layout. Unlike ydotool, dotool supports keyboard layouts via XKB:
# 1. Install dotool
# Arch (AUR):
yay -S dotool
# Ubuntu/Debian (from source):
# See https://sr.ht/~geb/dotool/ for instructions
# Fedora (from source):
# See https://sr.ht/~geb/dotool/ for instructions
# 2. Add user to input group (required for uinput access)
sudo usermod -aG input $USER
# Log out and back in for group change to take effect
# 3. Configure your keyboard layout in config.toml
Add to ~/.config/voxtype/config.toml:
[output]
dotool_xkb_layout = "de" # German QWERTZ
Common layout codes:
- de - German (QWERTZ)
- fr - French (AZERTY)
- es - Spanish
- ua - Ukrainian
- ru - Russian
- pl - Polish
- it - Italian
- pt - Portuguese
For layout variants (e.g., German without dead keys):
[output]
dotool_xkb_layout = "de"
dotool_xkb_variant = "nodeadkeys"
Alternative: Use paste mode, which copies text to the clipboard and simulates Ctrl+V. This works regardless of keyboard layout:
[output]
mode = "paste"
Note: The keyboard layout fix requires voxtype v0.5.0 or later. If you're on an older version, upgrade first.
Cause: ydotool systemd service not started, or configured incorrectly for your distribution.
Solution: The setup varies by distribution:
Arch provides a user-level service that runs in your session:
# Enable and start ydotool as a user service
systemctl --user enable --now ydotool
# Verify it's running
systemctl --user status ydotool
Fedora provides a system-level service that requires additional configuration to work with your user:
# 1. Enable and start the system service
sudo systemctl enable --now ydotool
# 2. Edit the service to allow your user to access the socket
sudo systemctl edit ydotool
Add this content (replace 1000 with your user/group ID from id -u and id -g):
[Service]
ExecStart=
ExecStart=/usr/bin/ydotoold --socket-path="/run/user/1000/.ydotool_socket" --socket-own="1000:1000"
Then restart:
sudo systemctl restart ydotool
# Verify it's running
systemctl status ydotool
Check which service type is available:
# Check for user service
systemctl --user status ydotool
# If not found, check for system service
systemctl status ydotool
If only a system service exists, follow the Fedora instructions above.
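For the Fedora-style system service, the override content can be generated with your own IDs filled in rather than edited by hand. This is a sketch: the helper name `ydotool_override` is hypothetical, and the `/usr/bin/ydotoold` path and flags are copied from the override shown in this section, so adjust them if your distribution differs.

```sh
# Print a systemd override that points ydotoold at a per-user socket.
# $1: numeric user ID, $2: numeric group ID.
ydotool_override() {
    printf '[Service]\n'
    printf 'ExecStart=\n'
    printf 'ExecStart=/usr/bin/ydotoold --socket-path="/run/user/%s/.ydotool_socket" --socket-own="%s:%s"\n' "$1" "$1" "$2"
}

# Fill in the current user's IDs.
ydotool_override "$(id -u)" "$(id -g)"
```

Paste the printed text into the editor opened by `sudo systemctl edit ydotool`, then restart the service.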
# Test that ydotool can type
ydotool type "test"
If you see "Failed to connect to socket", the daemon isn't running or the socket permissions are wrong.
Possible causes:
# Test ydotool directly
ydotool type "test"
# Test wl-copy
echo "test" | wl-copy
wl-paste
Some applications (terminals, games) may block simulated input.
Solution: Use clipboard mode:
[output]
mode = "clipboard"
Cause: Typing too fast for the application.
Solution: Increase typing delay:
[output]
type_delay_ms = 10 # Try 10-50ms
Cause: wl-copy not installed or Wayland session issue.
Solution:
# Install wl-clipboard
# Arch: sudo pacman -S wl-clipboard
# Debian: sudo apt install wl-clipboard
# Fedora: sudo dnf install wl-clipboard
# Test it works
echo "test" | wl-copy
wl-paste
Cause: notify-send not installed or notifications disabled.
Solution:
# Install libnotify
# Arch: sudo pacman -S libnotify
# Debian: sudo apt install libnotify-bin
# Fedora: sudo dnf install libnotify
# Test
notify-send "Test" "This is a test"
Solutions:
- Use a smaller model:
  [whisper]
  model = "tiny.en" # Fastest
- Increase thread count:
  [whisper]
  threads = 8 # Match your CPU cores
- Use English-only model: .en models are faster than multilingual models.
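To pick a thread count for the second suggestion, you can ask the system how many processors it has. This is a sketch: `whisper_threads_toml` is a made-up helper, and `nproc` (GNU coreutils) counts logical processors, which may be more than the number of physical cores, so treat the result as a starting point rather than the best value.

```sh
# Print a [whisper] config fragment with the given thread count.
whisper_threads_toml() {
    printf '[whisper]\nthreads = %s\n' "$1"
}

# Only call nproc if it is installed.
if command -v nproc >/dev/null 2>&1; then
    whisper_threads_toml "$(nproc)"
fi
```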
Cause: Whisper inference is CPU-intensive.
Solutions:
- Limit threads:
  [whisper]
  threads = 4 # Limit CPU usage
- Use smaller model:
  [whisper]
  model = "tiny.en"
Cause: Large Whisper models require significant RAM.
| Model | Approximate RAM |
|---|---|
| tiny.en | ~400 MB |
| base.en | ~500 MB |
| small.en | ~1 GB |
| medium.en | ~2.5 GB |
| large-v3 | ~4 GB |
Solution: Use a smaller model if RAM is limited.
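As a rough aid, the table can be applied to the memory your system currently reports as available. This is a sketch with assumptions: `pick_model` is a hypothetical helper, the thresholds are the approximate figures from the table converted to megabytes with no headroom for other applications, and the `/proc/meminfo` lookup is Linux-specific.

```sh
# Print the largest model from the table above whose approximate RAM
# requirement fits within the given number of megabytes.
# Falls back to tiny.en even when less than ~400 MB is available.
pick_model() {
    avail_mb=$1
    if   [ "$avail_mb" -ge 4096 ]; then echo large-v3
    elif [ "$avail_mb" -ge 2560 ]; then echo medium.en
    elif [ "$avail_mb" -ge 1024 ]; then echo small.en
    elif [ "$avail_mb" -ge 500 ];  then echo base.en
    else echo tiny.en
    fi
}

# MemAvailable in /proc/meminfo is reported in kB; convert it to MB.
if [ -r /proc/meminfo ]; then
    mb=$(awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo)
    if [ -n "$mb" ]; then
        pick_model "$mb"
    fi
fi
```

A larger model is also slower, so the biggest one that fits is not necessarily the one you want.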
Cause: System load or evdev latency.
Solutions:
- Ensure voxtype is running with normal priority
- Check for other applications using evdev
- Try a different hotkey
# Check status
systemctl --user status voxtype
# View logs
journalctl --user -u voxtype -n 50
# Common issues:
# - Not in input group (log out/in after adding)
# - Model not downloaded
# - ydotool not running
Cause: Session environment not available.
Solution: Ensure you're running under a graphical session:
# Check environment
echo $XDG_RUNTIME_DIR
echo $WAYLAND_DISPLAY
# Enable the service
systemctl --user enable voxtype
# Check if it's enabled
systemctl --user is-enabled voxtype
# Check startup targets
systemctl --user list-dependencies default.target
# Verbose
voxtype -v
# Debug (most verbose)
voxtype -vv
# Or via environment
RUST_LOG=debug voxtype
RUST_LOG=voxtype=trace voxtype
# Audio capture issues
RUST_LOG=voxtype::audio=debug voxtype
# Hotkey issues
RUST_LOG=voxtype::hotkey=debug voxtype
# Whisper issues
RUST_LOG=voxtype::transcribe=debug voxtype
# Output issues
RUST_LOG=voxtype::output=debug voxtype
voxtype -vv 2>&1 | tee voxtype.log
# Kernel input messages
dmesg | grep -i input
# Audio system
journalctl --user -u pipewire -n 20
journalctl --user -u pulseaudio -n 20
If you're still having issues:
- Run setup check: voxtype setup
- Gather debug logs: voxtype -vv 2>&1 | tee debug.log
- Check system info:
  uname -a
  groups
  pactl info
  systemctl --user status ydotool
- Open an issue: https://github.com/peteonrails/voxtype/issues
Include:
- Voxtype version (voxtype --version)
- Linux distribution and version
- Wayland compositor
- Debug log output
- Steps to reproduce
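The system information requested above can be gathered into one file with a short script. This is a sketch, not an official tool: the function names `report_section` and `collect_report` and the output filename are made up, and the script simply runs the commands already listed in this section, noting any that are missing instead of stopping.

```sh
# Run a command and print its output under a header.
# Missing commands and non-zero exits are noted rather than treated as fatal.
report_section() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exited with status $?)"
    else
        echo "(command not found: $1)"
    fi
    echo
}

# Collect the details that are useful in an issue report.
collect_report() {
    report_section voxtype --version
    report_section uname -a
    report_section groups
    report_section pactl info
    report_section systemctl --user status ydotool
}

# Usage: collect_report > voxtype-report.txt
```

Review the file before attaching it, since it includes your hostname and group memberships.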
We want to hear from you! Voxtype is a young project and your feedback helps make it better.
- Something not working? If Voxtype doesn't install cleanly, doesn't work on your system, or is buggy in any way, please open an issue. I actively monitor and respond to issues.
- Like Voxtype? I don't accept donations, but if you find it useful, a star on the GitHub repository would mean a lot!