Voxtype Troubleshooting Guide

Solutions to common issues when using Voxtype.

Modifier Key Interference (Hyprland/Sway/River)
Permission Issues
Audio Problems
Transcription Issues
Output Problems
Performance Issues
Systemd Service Issues
Debug Mode

Modifier Key Interference (Hyprland/Sway/River)

Typed text triggers window manager shortcuts instead of inserting text

Symptoms: When using compositor keybindings with modifiers (e.g., SUPER+CTRL+X or SUPER+O), releasing keys slowly causes typed output to trigger shortcuts instead of inserting text. For example, if you release X while still holding SUPER, the transcribed "hello" might trigger SUPER+h, SUPER+e, SUPER+l, etc.

Cause: The compositor tracks the physical keyboard state. If you are still holding a modifier key while Voxtype types, the compositor combines that modifier with each typed character and treats the result as a shortcut.

Solution: Use the compositor setup command to automatically install a fix that blocks modifier keys during text output.

For Hyprland:

voxtype setup compositor hyprland
hyprctl reload
systemctl --user restart voxtype

For Sway:

voxtype setup compositor sway
swaymsg reload
systemctl --user restart voxtype

For River:

voxtype setup compositor river
# Restart River or source the new config
systemctl --user restart voxtype

Note: This command does NOT set up keybindings—it only installs the modifier interference fix. See the User Manual to set up your push-to-talk hotkey.

This command:

Writes a modifier-blocking submap/mode to ~/.config/hypr/conf.d/voxtype-submap.conf (or sway/conf.d/voxtype-mode.conf, or river/conf.d/voxtype-mode.sh)
Adds pre/post output hooks to your voxtype config
Checks that your compositor config sources the conf.d directory

If voxtype crashes while typing, press F12 to escape the submap and restore normal modifier behavior.

Manual setup: See voxtype setup compositor hyprland --show for the full configuration if you prefer to set it up manually.

Alternative workaround: If you can't use submaps, a simple delay before typing may help:

[output.post_process]
command = "sleep 0.3 && cat"
timeout_ms = 5000

Compositors Without Mode/Submap Support

The automatic fix (voxtype setup compositor) only works on compositors that support input modes or submaps:

| Compositor | Supported | Why |
|------------|-----------|-----|
| Hyprland | Yes | Has submaps |
| Sway | Yes | Has modes |
| River | Yes | Has modes |
| Qtile | No | No mode/submap concept |
| Niri | No | No mode/submap concept |
| GNOME | No | No mode/submap concept |
| KDE | No | No mode/submap concept |

For unsupported compositors, use one of these alternatives:

1. Use a dedicated key without modifiers - Keys like ScrollLock, Pause, or F13-F24 don't have this problem since there are no modifiers to interfere:
   ```
   [hotkey]
   key = "SCROLLLOCK"
   ```
2. Use the post-processor delay (works on any compositor):
   ```
   [output.post_process]
   command = "sleep 0.3 && cat"
   timeout_ms = 5000
   ```
   This gives you 300ms to release all keys before typing starts.
3. Use voxtype's built-in evdev hotkey instead of compositor keybindings - release timing doesn't matter since voxtype controls the entire recording flow.

Permission Issues

"Cannot open input device" or "Permission denied"

Cause: User is not in the input group, required for evdev access.

Solution:

# Add user to input group
sudo usermod -aG input $USER

# IMPORTANT: Log out and back in for changes to take effect
# Verify membership
groups | grep input
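
The log-out step matters because group membership is fixed when a session starts. A minimal sketch for telling the two states apart, assuming a coreutils-style `id`; the helper name `has_group` is hypothetical and not part of Voxtype:

```bash
#!/usr/bin/env bash
# has_group LIST NAME: succeed if NAME is one of the space-separated words in LIST.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Groups of the current process (what a program started from here inherits).
session_groups=$(id -nG)
# Groups recorded in the user database (what the next login will get).
database_groups=$(id -nG "$(id -un)")

if has_group "$session_groups" input; then
    echo "input group is active in this session"
elif has_group "$database_groups" input; then
    echo "input group is assigned but not active yet: log out and back in"
else
    echo "user is not in the input group: run the usermod command above"
fi
```

If the second message appears, the usermod command worked and only the re-login is missing.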

"Failed to access /dev/input/event*"

Cause: udev rules preventing access, or input group not applied.

Solution:

Verify group membership: groups | grep input
If not shown, log out and back in completely
If still failing, check udev rules:

ls -la /dev/input/event*
# Should show group 'input' with rw permissions

"Unable to create uinput device" (ydotool)

Cause: uinput module not loaded or wrong permissions.

Solution:

# Load uinput module
sudo modprobe uinput

# Make it persistent
echo "uinput" | sudo tee /etc/modules-load.d/uinput.conf

# Check ydotool daemon
systemctl --user status ydotool

Audio Problems

"No audio captured" or empty transcriptions

Possible causes and solutions:

1. Wrong audio device selected

# List available audio sources
pactl list sources short

# Test recording with system default
arecord -d 3 -f S16_LE -r 16000 test.wav
aplay test.wav

If test recording works, check your Voxtype config:

[audio]
device = "default"  # Or specific device name from pactl

2. Microphone muted or volume too low

# Check PulseAudio/PipeWire volume
pavucontrol
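# Or query the default source directly (needs pactl 15.0 or later)
pactl get-source-mute @DEFAULT_SOURCE@
pactl get-source-volume @DEFAULT_SOURCE@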

3. PipeWire/PulseAudio not running

# Check audio server status
pactl info

# Restart if needed
systemctl --user restart pipewire pipewire-pulse
# Or for PulseAudio:
systemctl --user restart pulseaudio

"Audio format not supported"

Cause: Audio device doesn't support 16kHz sample rate.

Solution: Voxtype handles resampling internally, but ensure your device works:

# Test at native rate
arecord -d 2 test.wav
aplay test.wav

Recording stops unexpectedly

Cause: max_duration_secs limit reached.

Solution: Increase the limit:

[audio]
max_duration_secs = 120  # 2 minutes

Transcription Issues

"Model not found"

Cause: Whisper model not downloaded or wrong path.

Solution:

# Download the model
voxtype setup --download

# Or manually download (-f makes curl fail on HTTP errors instead of saving the error page)
mkdir -p ~/.local/share/voxtype/models
curl -fL -o ~/.local/share/voxtype/models/ggml-base.en.bin \
    https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin

Voxtype crashes during transcription

Cause: On some systems (particularly with glibc 2.42+ like Ubuntu 25.10), the whisper-rs FFI bindings crash due to C++ exceptions crossing the FFI boundary.

Solution: Use the CLI backend which runs whisper-cli as a subprocess:

[whisper]
backend = "cli"

This requires whisper-cli to be installed. Build it from whisper.cpp:

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build
cmake --build build --config Release
sudo cp build/bin/whisper-cli /usr/local/bin/

See CLI Backend in the User Manual for details.

Poor transcription accuracy

Possible causes:

1. Using wrong model

For English: Use .en models (e.g., base.en)
For accuracy: Use larger models (small.en, medium.en)

2. Audio quality issues

Use a quality microphone
Reduce background noise
Maintain consistent distance from mic

3. Wrong language setting

[whisper]
model = "base.en"  # For English
language = "en"

4. Context window optimization

Context window optimization is disabled by default because it can cause phrase repetition with some models (especially large-v3 and large-v3-turbo).

If you want faster transcription and your model works well with it, you can enable it:

[whisper]
context_window_optimization = true

Or via command line:

voxtype --whisper-context-optimization daemon

If you experience phrase repetition (e.g., "word word word"), make sure this setting is disabled (the default).

Transcription includes "[BLANK_AUDIO]" or similar

Cause: Recording contains mostly silence.

Solution:

Check microphone is working
Increase microphone sensitivity
Speak closer to the microphone

Hallucinations (transcribed text not spoken)

Cause: Known Whisper behavior with silence or noise.

Solutions:

Use a larger model for better accuracy
Avoid recording ambient noise
Keep recordings short and speech-focused

Phrase repetition (same words repeated multiple times)

Cause: Known issue with Whisper large-v3 models, especially when context window optimization is enabled.

Example: Saying "increase the limit" produces "increase the limit increase the limit increase the limit"

Solutions:

Ensure context_window_optimization is disabled (the default):
```
[whisper]
context_window_optimization = false
```
Try a different model (large-v3-turbo and large-v3 are most affected)
If using context optimization and experiencing issues, disable it

Output Problems

wtype not working on KDE Plasma or GNOME Wayland

Symptom: wtype fails with "Compositor does not support the virtual keyboard protocol"

Cause: KDE Plasma and GNOME do not implement the zwp_virtual_keyboard_v1 Wayland protocol that wtype requires. This is a compositor limitation, not a voxtype bug.

What happens: Voxtype detects this failure and automatically falls back to dotool, then ydotool. If neither is set up, it falls back to clipboard mode.
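
To see where that fallback chain is likely to land on your machine, you can check which of the tools are on your PATH. This is a minimal sketch, not Voxtype's actual detection logic: the helper name `first_available` is hypothetical, and being installed does not guarantee that a tool works (wtype can be present and still fail on KDE or GNOME).

```bash
#!/usr/bin/env bash
# first_available TOOL...: print the first tool found on PATH, fail if none is.
first_available() {
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Same order as the fallback chain described above.
if tool=$(first_available wtype dotool ydotool); then
    echo "first typing tool on PATH: $tool"
else
    echo "no typing tool on PATH; expect clipboard mode"
fi
```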

Solution 1 (Recommended): Install dotool. Unlike ydotool, dotool does not require a daemon and supports keyboard layouts for non-US keyboards:

# 1. Install dotool (check your distribution's package manager)
# Arch (AUR):
yay -S dotool
# From source: https://sr.ht/~geb/dotool/

# 2. Add user to input group (required for uinput access)
sudo usermod -aG input $USER
# Log out and back in for group change to take effect

# 3. Configure keyboard layout if needed (non-US keyboards)
# Add to config.toml:
# [output]
# dotool_xkb_layout = "de"  # German; use "fr" for French, etc.

Solution 2: Set up ydotool as your typing backend. Unlike dotool, ydotool requires a daemon to be running:

# 1. Install ydotool
# Arch:
sudo pacman -S ydotool
# Fedora:
sudo dnf install ydotool
# Ubuntu/Debian:
sudo apt install ydotool

# 2. Enable and start the daemon (required!)
systemctl --user enable --now ydotool

# 3. Verify it's running
systemctl --user status ydotool

Important: For ydotool, simply having it installed is not enough. The daemon must be running for the fallback to work.

Alternative: Use clipboard or paste mode instead of type mode:

[output]
mode = "clipboard"  # Copies to clipboard, you paste manually
# or
mode = "paste"      # Copies to clipboard, then simulates Ctrl+V

Compositor compatibility:

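| Desktop | wtype | dotool | ydotool | Recommended |
|---------|-------|--------|---------|-------------|
| Hyprland | ✓ | ✓ | ✓ | wtype |
| Sway | ✓ | ✓ | ✓ | wtype |
| River | ✓ | ✓ | ✓ | wtype |
| KDE Plasma (Wayland) | ✗ | ✓ | ✓ | dotool |
| GNOME (Wayland) | ✗ | ✓ | ✓ | dotool |
| X11 (any desktop) | ✗ | ✓ | ✓ | dotool |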

Text output not working on X11

Symptom: You're running X11 (not Wayland) and see errors like:

WARN  wtype failed: Wayland connection failed
WARN  clipboard (wl-copy) failed: Text injection failed
ERROR Output failed: All output methods failed.

Cause: wtype and wl-copy are Wayland-only tools. On X11, voxtype needs dotool, ydotool, or xclip installed.
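
If you are unsure which session type you are running, the environment usually tells you. A rough sketch, assuming your login manager sets the usual variables; `session_kind` is a hypothetical helper, and Wayland is checked first because DISPLAY is often also set under Wayland for XWayland applications:

```bash
#!/usr/bin/env bash
# session_kind: print "wayland", "x11", or "unknown" based on the environment.
session_kind() {
    if [ -n "${WAYLAND_DISPLAY:-}" ] || [ "${XDG_SESSION_TYPE:-}" = "wayland" ]; then
        echo wayland
    elif [ -n "${DISPLAY:-}" ] || [ "${XDG_SESSION_TYPE:-}" = "x11" ]; then
        echo x11
    else
        echo unknown
    fi
}

session_kind
```

If this prints x11, the Wayland-only tools named above cannot work, whatever is installed.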

Solution: Install one of these X11-compatible tools:

Option 1 (Recommended): Install dotool

dotool works on X11, supports keyboard layouts, and doesn't need a daemon:

# Ubuntu/Debian (from source):
sudo apt install libxkbcommon-dev
git clone https://git.sr.ht/~geb/dotool
cd dotool && ./build.sh && sudo cp dotool /usr/local/bin/

# Arch (AUR):
yay -S dotool

# Add user to input group
sudo usermod -aG input $USER
# Log out and back in

Option 2: Install ydotool

ydotool works on X11 but requires a running daemon:

# Ubuntu/Debian:
sudo apt install ydotool

# Start the daemon (see "ydotool daemon not running" section for Fedora)
systemctl --user enable --now ydotool

Option 3: Use clipboard mode with xclip

For clipboard-only output (you paste manually with Ctrl+V):

# Ubuntu/Debian:
sudo apt install xclip

Then configure voxtype to use clipboard mode:

[output]
mode = "clipboard"

Verify your setup:

voxtype setup

This shows which output tools are installed and available.

Wrong characters on non-US keyboard layouts (y/z swapped, QWERTZ, AZERTY)

Symptom: Transcribed text has wrong characters. For example, on a German keyboard, "Python" becomes "Pzthon" and "zebra" becomes "yebra" (y and z are swapped).

Cause: ydotool sends raw US keycodes and doesn't support keyboard layouts. When voxtype falls back to ydotool (e.g., on X11, Cinnamon, or when wtype fails), characters are typed as if you had a US keyboard layout.
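
The effect can be reproduced without any typing tool. The sketch below only models the y/z swap between US QWERTY and German QWERTZ (real layouts differ in more keys, punctuation in particular); `qwertz_effect` is a hypothetical name used for illustration:

```bash
#!/usr/bin/env bash
# qwertz_effect TEXT: show TEXT as it appears when the US key positions for
# y and z are interpreted by a German QWERTZ layout (only that swap is modelled).
qwertz_effect() {
    printf '%s\n' "$1" | tr 'yzYZ' 'zyZY'
}

qwertz_effect "Python"   # prints: Pzthon
qwertz_effect "zebra"    # prints: yebra
```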

Solution: Install dotool and configure your keyboard layout. Unlike ydotool, dotool supports keyboard layouts via XKB:

# 1. Install dotool
# Arch (AUR):
yay -S dotool
# Ubuntu/Debian (from source):
# See https://sr.ht/~geb/dotool/ for instructions
# Fedora (from source):
# See https://sr.ht/~geb/dotool/ for instructions

# 2. Add user to input group (required for uinput access)
sudo usermod -aG input $USER
# Log out and back in for group change to take effect

# 3. Configure your keyboard layout in config.toml:

Add to ~/.config/voxtype/config.toml:

[output]
dotool_xkb_layout = "de"  # German QWERTZ

Common layout codes:

de - German (QWERTZ)
fr - French (AZERTY)
es - Spanish
ua - Ukrainian
ru - Russian
pl - Polish
it - Italian
pt - Portuguese

For layout variants (e.g., German without dead keys):

[output]
dotool_xkb_layout = "de"
dotool_xkb_variant = "nodeadkeys"

Alternative: Use paste mode, which copies text to the clipboard and simulates Ctrl+V. This works regardless of keyboard layout:

[output]
mode = "paste"

Note: The keyboard layout fix requires voxtype v0.5.0 or later. If you're on an older version, upgrade first.

"ydotool daemon not running"

Cause: ydotool systemd service not started, or configured incorrectly for your distribution.

Solution: The setup varies by distribution:

Arch Linux (user service)

Arch provides a user-level service that runs in your session:

# Enable and start ydotool as a user service
systemctl --user enable --now ydotool

# Verify it's running
systemctl --user status ydotool

Fedora (system service)

Fedora provides a system-level service that requires additional configuration to work with your user:

# 1. Enable and start the system service
sudo systemctl enable --now ydotool

# 2. Edit the service to allow your user to access the socket
sudo systemctl edit ydotool

Add this content (replace 1000 with your user/group ID from id -u and id -g):

[Service]
ExecStart=
ExecStart=/usr/bin/ydotoold --socket-path="/run/user/1000/.ydotool_socket" --socket-own="1000:1000"
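
To avoid editing the IDs by hand, the same content can be generated with your own values filled in. A small sketch, assuming the ydotoold path and options shown above are right for your system; `ydotool_override` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# ydotool_override UID GID: print the drop-in content with the given IDs filled in.
ydotool_override() {
    local uid=$1 gid=$2
    cat <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/ydotoold --socket-path="/run/user/${uid}/.ydotool_socket" --socket-own="${uid}:${gid}"
EOF
}

# Print the drop-in for the current user.
ydotool_override "$(id -u)" "$(id -g)"
```

Run it as your normal user (not with sudo, which would report root's IDs) and paste the output into the editor opened by `sudo systemctl edit ydotool`.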

Then restart:

sudo systemctl restart ydotool

# Verify it's running
systemctl status ydotool

Ubuntu/Debian

Check which service type is available:

# Check for user service
systemctl --user status ydotool

# If not found, check for system service
systemctl status ydotool

If only a system service exists, follow the Fedora instructions above.

Verify ydotool works

# Test that ydotool can type
ydotool type "test"

If you see "Failed to connect to socket", the daemon isn't running or the socket permissions are wrong.
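
To tell those two cases apart, check the socket itself. This sketch rests on two assumptions: that your ydotool version honours the YDOTOOL_SOCKET environment variable, and that the daemon otherwise uses the per-user path from the Fedora example above. Older versions may use a different default, so adjust the path if needed. `socket_state` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# socket_state PATH: print "ok", "no-access", or "missing" for a Unix socket path.
socket_state() {
    if [ ! -S "$1" ]; then
        echo missing          # nothing there, or not a socket
    elif [ -r "$1" ] && [ -w "$1" ]; then
        echo ok               # socket exists and this user may use it
    else
        echo no-access        # socket exists but permissions are wrong
    fi
}

# Assumed lookup order: YDOTOOL_SOCKET first, then the per-user runtime directory.
socket_path=${YDOTOOL_SOCKET:-/run/user/$(id -u)/.ydotool_socket}
echo "$socket_path: $(socket_state "$socket_path")"
```

"missing" points at a daemon that is not running (or uses another path); "no-access" points at the socket ownership settings.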

Text not typed / nothing happens

Possible causes:

1. ydotool not working

# Test ydotool directly
ydotool type "test"

2. Fallback to clipboard not working

# Test wl-copy
echo "test" | wl-copy
wl-paste

3. Application blocking input

Some applications (terminals, games) may block simulated input.

Solution: Use clipboard mode:

[output]
mode = "clipboard"

Characters dropped or garbled

Cause: Typing too fast for the application.

Solution: Increase typing delay:

[output]
type_delay_ms = 10  # Try 10-50ms
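
Higher values make long transcriptions noticeably slower to appear. As a rough estimate, assuming the delay is applied once per typed character (an assumption about how type_delay_ms works, not confirmed here), the added time is simply characters times delay; `typing_time_ms` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# typing_time_ms CHARS DELAY_MS: estimated extra time in milliseconds,
# assuming one delay per character.
typing_time_ms() {
    echo $(( $1 * $2 ))
}

typing_time_ms 200 10   # prints: 2000
typing_time_ms 200 50   # prints: 10000
```

Under that assumption, a 200-character sentence takes about 2 seconds at 10ms and about 10 seconds at 50ms, so start low and raise the value only until the dropped characters stop.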

Clipboard not working

Cause: wl-copy not installed or Wayland session issue.

Solution:

# Install wl-clipboard
# Arch: sudo pacman -S wl-clipboard
# Debian: sudo apt install wl-clipboard
# Fedora: sudo dnf install wl-clipboard

# Test it works
echo "test" | wl-copy
wl-paste

No desktop notification

Cause: notify-send not installed or notifications disabled.

Solution:

# Install libnotify
# Arch: sudo pacman -S libnotify
# Debian: sudo apt install libnotify-bin
# Fedora: sudo dnf install libnotify

# Test
notify-send "Test" "This is a test"

Performance Issues

Slow transcription

Solutions:

Use a smaller model:

[whisper]
model = "tiny.en"  # Fastest

Increase thread count:

[whisper]
threads = 8  # Match your CPU cores

Use English-only model: .en models are faster than multilingual models.
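
For the thread count mentioned above, `nproc` reports how many CPUs are available to the current process. A small sketch that turns it into a config line; `suggest_threads` is a hypothetical helper, and note that nproc counts logical CPUs (including SMT/hyper-threads), so the value can be higher than your physical core count:

```bash
#!/usr/bin/env bash
# suggest_threads CORES: print a threads line for config.toml, never below 1.
suggest_threads() {
    local cores=${1:-1}
    if [ "$cores" -lt 1 ]; then
        cores=1
    fi
    printf 'threads = %d\n' "$cores"
}

suggest_threads "$(nproc)"
```

Treat the result as a starting point and compare transcription times with a lower value as well.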

High CPU usage

Cause: Whisper inference is CPU-intensive.

Solutions:

Limit threads:

[whisper]
threads = 4  # Limit CPU usage

Use smaller model:

[whisper]
model = "tiny.en"

High memory usage

Cause: Large Whisper models require significant RAM.

| Model | Approximate RAM |
|-------|-----------------|
| tiny.en | ~400 MB |
| base.en | ~500 MB |
| small.en | ~1 GB |
| medium.en | ~2.5 GB |
| large-v3 | ~4 GB |

Solution: Use a smaller model if RAM is limited.
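
To compare the table with what your system has free, you can read MemAvailable from /proc/meminfo. A minimal sketch: the thresholds are the approximate figures from the table converted to MB, with no headroom for other applications, and `largest_model_for` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# largest_model_for MB: print the largest model from the table whose
# approximate RAM figure fits into MB megabytes, or "none".
largest_model_for() {
    local mb=$1
    if   [ "$mb" -ge 4096 ]; then echo large-v3
    elif [ "$mb" -ge 2560 ]; then echo medium.en
    elif [ "$mb" -ge 1024 ]; then echo small.en
    elif [ "$mb" -ge 500 ];  then echo base.en
    elif [ "$mb" -ge 400 ];  then echo tiny.en
    else echo none
    fi
}

# MemAvailable is reported in kB; convert to MB (0 if the file is unavailable).
available_mb=$(awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo 2>/dev/null)
available_mb=${available_mb:-0}
echo "available: ${available_mb} MB -> $(largest_model_for "$available_mb")"
```

Since the figures are approximate, picking one size below the printed model leaves a safer margin.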

Hotkey lag / delayed recording start

Cause: System load or evdev latency.

Solutions:

Ensure voxtype is running with normal priority
Check for other applications using evdev
Try a different hotkey

Systemd Service Issues

Service fails to start

# Check status
systemctl --user status voxtype

# View logs
journalctl --user -u voxtype -n 50

# Common issues:
# - Not in input group (log out/in after adding)
# - Model not downloaded
# - ydotool not running

Service starts but doesn't work

Cause: Session environment not available.

Solution: Ensure you're running under a graphical session:

# Check environment
echo $XDG_RUNTIME_DIR
echo $WAYLAND_DISPLAY
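
Those echo commands show your shell's environment, which can differ from what user services see. `systemctl --user show-environment` prints the environment of the user service manager itself. A minimal sketch that checks it for the same two variables; `env_has` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# env_has NAME ENVDUMP: succeed if ENVDUMP (NAME=value lines) defines NAME
# with a non-empty value.
env_has() {
    printf '%s\n' "$2" | grep -q "^$1=."
}

# Environment that user services inherit (empty if systemctl is unavailable).
service_env=$(systemctl --user show-environment 2>/dev/null)

for name in XDG_RUNTIME_DIR WAYLAND_DISPLAY; do
    if env_has "$name" "$service_env"; then
        echo "$name: visible to user services"
    else
        echo "$name: missing from the user service environment"
    fi
done
```

If WAYLAND_DISPLAY is missing there but set in your terminal, running `systemctl --user import-environment WAYLAND_DISPLAY` from inside the graphical session and then restarting the service is one common way to pass it on.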

Service doesn't start on login

# Enable the service
systemctl --user enable voxtype

# Check if it's enabled
systemctl --user is-enabled voxtype

# Check startup targets
systemctl --user list-dependencies default.target

Debug Mode

Enable verbose logging

# Verbose
voxtype -v

# Debug (most verbose)
voxtype -vv

# Or via environment
RUST_LOG=debug voxtype
RUST_LOG=voxtype=trace voxtype

Debug specific components

# Audio capture issues
RUST_LOG=voxtype::audio=debug voxtype

# Hotkey issues
RUST_LOG=voxtype::hotkey=debug voxtype

# Whisper issues
RUST_LOG=voxtype::transcribe=debug voxtype

# Output issues
RUST_LOG=voxtype::output=debug voxtype

Log to file

voxtype -vv 2>&1 | tee voxtype.log

Check system logs

# Kernel input messages
dmesg | grep -i input

# Audio system
journalctl --user -u pipewire -n 20
journalctl --user -u pulseaudio -n 20

Getting Help

If you're still having issues:

Run setup check: voxtype setup
Gather debug logs: voxtype -vv 2>&1 | tee debug.log

Check system info:

uname -a
groups
pactl info
systemctl --user status ydotool

Open an issue: https://github.com/peteonrails/voxtype/issues

Include:

Voxtype version (voxtype --version)
Linux distribution and version
Wayland compositor
Debug log output
Steps to reproduce
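
The system info commands listed above can be collected into a single report to attach to the issue. A minimal sketch; `run_section` and `collect_sysinfo` are hypothetical helpers, and a command that is missing or fails is noted rather than aborting the report:

```bash
#!/usr/bin/env bash
# run_section TITLE CMD...: print a header, then the command's output; if the
# command fails or is not installed, note its exit status and carry on.
run_section() {
    local title=$1
    shift
    printf '== %s ==\n' "$title"
    "$@" 2>&1 || printf '(%s exited with status %d)\n' "$1" "$?"
}

# collect_sysinfo: run the checks from this section in order.
collect_sysinfo() {
    run_section "kernel" uname -a
    run_section "groups" groups
    run_section "audio server" pactl info
    run_section "ydotool service" systemctl --user status ydotool --no-pager
}

collect_sysinfo
```

Redirect the output to a file (for example `collect_sysinfo > sysinfo.txt`) and review it before posting, since it includes your hostname and user groups.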

Feedback

We want to hear from you! Voxtype is a young project and your feedback helps make it better.

Something not working? If Voxtype doesn't install cleanly, doesn't work on your system, or is buggy in any way, please open an issue. I actively monitor and respond to issues.
Like Voxtype? I don't accept donations, but if you find it useful, a star on the GitHub repository would mean a lot!
