Frequently Asked Questions

Common questions about Voxtype.

General

What is Voxtype?

Voxtype is a push-to-talk voice-to-text tool for Linux. Optimized for Wayland, works on X11 too. You hold a hotkey, speak, release the key, and your speech is transcribed and either typed at your cursor position or copied to the clipboard.

Why another voice-to-text tool?

Most voice-to-text solutions for Linux either:

Require internet/cloud services
Are compositor or desktop-specific
Don't support CJK (Korean, Chinese, Japanese) characters

Voxtype is designed to:

Work on any Linux desktop (Wayland or X11)
Be fully offline (uses local Whisper models)
Use the push-to-talk paradigm (more predictable than continuous listening)
Support CJK characters via wtype on Wayland

Does it work on X11?

Yes! Voxtype works on both Wayland and X11. It uses evdev (kernel-level) for hotkey detection, which works everywhere. For text output, it uses wtype on Wayland (with CJK support), with dotool and ydotool as fallbacks.

Does it require an internet connection?

No. All speech recognition is done locally using whisper.cpp. The only time network access is used is to download Whisper models during initial setup.

Compatibility

Which desktops are supported?

All of them! Voxtype is optimized for Wayland compositors with native keybinding support:

Hyprland, Sway, River - Full push-to-talk via compositor keybindings (no special permissions needed)
GNOME, KDE Plasma - Works with built-in evdev hotkey (requires input group)
X11 desktops (i3, etc.) - Works with built-in evdev hotkey (requires input group)

For text output, Voxtype uses:

wtype on Wayland (best CJK/Unicode support, no daemon needed)
dotool as fallback (supports keyboard layouts, no daemon needed)
ydotool on X11 or as fallback (requires daemon)

Which audio systems are supported?

PipeWire (recommended)
PulseAudio
ALSA (directly)

Does it work with Bluetooth microphones?

Yes, as long as your Bluetooth microphone is recognized by PipeWire/PulseAudio as an audio source.

Does it work in all applications?

For type mode: Most applications work. Some may have issues:

Full-screen games may not receive input
Some terminal emulators handle pasted input differently
Electron apps occasionally have issues

For clipboard mode: Works universally (you just need to paste manually).

Does wtype work on KDE Plasma or GNOME?

No. KDE Plasma and GNOME (on Wayland) do not support the virtual keyboard protocol that wtype requires. You'll see the error: "Compositor does not support the virtual keyboard protocol."

Solution: Install dotool (recommended) or use ydotool. Voxtype automatically falls back to dotool, then ydotool, when wtype fails.

For dotool (recommended, supports keyboard layouts):

# Install dotool and add user to input group
sudo usermod -aG input $USER
# Log out and back in

For ydotool (requires daemon):

systemctl --user enable --now ydotool

See Troubleshooting for complete setup instructions.

Features

Can I use a different hotkey?

Yes! Any key that shows up in evtest can be used. Common choices:

ScrollLock (default)
Pause/Break
Right Alt
F13-F24 (if your keyboard has them)

Configure in ~/.config/voxtype/config.toml:

[hotkey]
key = "PAUSE"

Can I use key combinations?

Yes, you can require modifier keys:

[hotkey]
key = "SCROLLLOCK"
modifiers = ["LEFTCTRL"]  # Ctrl+ScrollLock

Does it support multiple languages?

Yes! Use large-v3 which supports 99 languages:

Transcribe in the spoken language (speak French, output French):

[whisper]
model = "large-v3"
language = "auto"
translate = false

With GPU acceleration, large-v3 achieves sub-second inference while supporting all languages.

Can it translate to English?

Yes! Speak any language and get English output:

[whisper]
model = "large-v3"
language = "auto"
translate = true

Can I transcribe audio files?

Yes, use the transcribe command:

voxtype transcribe recording.wav

Does it add punctuation?

Whisper automatically adds punctuation based on context. For explicit punctuation, you can speak it (e.g., "period", "comma", "question mark").

Can I customize the Waybar/status bar icons?

Yes! Voxtype supports 10 built-in icon themes plus custom icons. Configure in ~/.config/voxtype/config.toml:

[status]
icon_theme = "nerd-font"

Available themes:

Theme	Description	Requirements
`emoji`	🎙️ 🎤 ⏳ (default)	None
`nerd-font`	Nerd Font icons	Nerd Font
`material`	Material Design Icons	MDI Font
`phosphor`	Phosphor Icons	Phosphor Font
`codicons`	VS Code icons	Codicons
`omarchy`	Omarchy distro icons	Omarchy font
`minimal`	○ ● ◐ ×	None
`dots`	◯ ⬤ ◔ ◌	None
`arrows`	▶ ● ↻ ■	None
`text`	[MIC] [REC] [...] [OFF]	None

You can also override individual icons or create custom theme files. See the Waybar Integration Guide for complete details.

Technical

Why do I need to be in the 'input' group?

Most Wayland users don't need this. If you use compositor keybindings (Hyprland, Sway, River), voxtype doesn't need any special permissions.

The input group is only required if you use voxtype's built-in evdev hotkey (e.g., on X11 or GNOME/KDE). The evdev subsystem requires read access to /dev/input/event* devices, which is restricted to the input group for security reasons.

Why does it need wtype/dotool/ydotool?

Neither Wayland nor X11 provide a universal way for applications to simulate keyboard input. Voxtype uses a fallback chain:

wtype on Wayland - uses the virtual-keyboard protocol, supports CJK characters, no daemon needed
dotool as fallback - uses the kernel's uinput interface, supports keyboard layouts, no daemon needed
ydotool on X11 (or Wayland fallback) - uses the kernel's uinput interface, requires a daemon

How much RAM does it use?

Depends on the Whisper model:

tiny.en: ~400 MB
base.en: ~500 MB
small.en: ~1 GB
medium.en: ~2.5 GB
large-v3: ~4 GB

How fast is transcription?

Depends on model and hardware. On a modern CPU:

tiny.en: ~10x realtime (1 second of speech = 0.1 second to transcribe)
base.en: ~7x realtime
small.en: ~4x realtime
medium.en: ~2x realtime
large-v3: ~1x realtime

Does it use GPU acceleration?

Yes! Voxtype supports optional GPU acceleration:

Vulkan - Works on AMD, NVIDIA, and Intel GPUs (included in packages)
CUDA - NVIDIA GPUs (build from source)
Metal - Apple Silicon (build from source)
HIP/ROCm - AMD GPUs (build from source)

Vulkan (easiest): Packages include a pre-built Vulkan binary. Install the runtime and enable:

# Install Vulkan runtime (Arch: vulkan-icd-loader, Debian: libvulkan1, Fedora: vulkan-loader)
sudo voxtype setup gpu --enable

Other backends: Build from source with cargo build --release --features gpu-cuda (or gpu-metal, gpu-hipblas).

GPU acceleration dramatically improves inference speed, especially for larger models. The large-v3 model can achieve sub-second inference with GPU acceleration.

Is my voice data sent anywhere?

No. All processing happens locally on your machine. No audio or text is sent to any server.

Troubleshooting

It's not detecting my hotkey

Using compositor keybindings (recommended):

Verify your compositor config calls voxtype record start and voxtype record stop
Check that voxtype is running: pgrep voxtype
Test manually: voxtype record start then voxtype record stop

Using built-in evdev hotkey:

Make sure enabled = true in your config's [hotkey] section
Verify you're in the input group: groups | grep input
Log out and back in after adding to the group
Check the key name with evtest
Try running with debug: voxtype -vv

No text is typed

On Wayland:

Check wtype is installed: which wtype
Test wtype directly: wtype "test"

On X11:

Check ydotool is running: systemctl --user status ydotool
Test ydotool directly: ydotool type "test"

Fallback: Try clipboard mode: voxtype --clipboard

Transcription is inaccurate

Use a larger model: --model small.en
Speak more clearly and at consistent volume
Reduce background noise
Use an .en model for English content

It's too slow

Use a smaller model: --model tiny.en
Increase thread count in config
Keep recordings short

See the Troubleshooting Guide for more solutions.

Privacy & Security

Is it always listening?

No. Voxtype only records audio while you hold the hotkey. When you release the key, recording stops immediately.

Where is my audio stored?

Audio is processed in memory and discarded after transcription. Nothing is saved to disk unless you use the transcribe command on a file.

Can it be used by malware to record me?

Voxtype only records while the hotkey is actively held. However, any application with access to your microphone could potentially record audio. Voxtype doesn't add any new attack surface beyond what PipeWire/PulseAudio already provides.

Is the transcription accurate enough for sensitive content?

Whisper is highly accurate but not perfect. For sensitive or important content:

Use a larger model (medium.en or large-v3)
Review the transcription before using it
Consider that Whisper may occasionally "hallucinate" text

Contributing

How can I contribute?

See the Contributing Guide for details. We welcome:

Bug reports
Feature requests
Code contributions
Documentation improvements
Translations

Where do I report bugs?

Open an issue at: https://github.com/peteonrails/voxtype/issues

Include:

Voxtype version
Linux distribution and version
Wayland compositor
Steps to reproduce
Debug output (voxtype -vv)

Can I request a feature?

Yes! Open a feature request issue at: https://github.com/peteonrails/voxtype/issues

Describe:

What you want to accomplish
Why existing features don't meet your needs
How you envision it working

How can I show my appreciation?

I don't accept donations, but if you find Voxtype useful, a star on the GitHub repository would mean a lot and helps others discover the project!

Feedback

We want to hear from you! Voxtype is a young project and your feedback helps make it better.

Something not working? If Voxtype doesn't install cleanly, doesn't work on your system, or is buggy in any way, please open an issue. I actively monitor and respond to issues.

FilesExpand file tree

FAQ.md

Latest commit

History