WARNING: Experimental Feature
Parakeet support is experimental and not yet fully integrated into voxtype's setup system. Configuration requires manual editing of config files. The API and configuration options may change in future releases. Use at your own risk.
Voxtype 0.5.0+ includes experimental support for NVIDIA's Parakeet ASR models as an alternative to Whisper. Parakeet uses ONNX Runtime and offers excellent CPU performance without requiring a GPU.
Parakeet is NVIDIA's FastConformer-based speech recognition model. The TDT (Token-and-Duration Transducer) variant provides:
- Fast CPU inference with AVX-512 optimization
- Proper punctuation and capitalization
- Good accuracy for English dictation
- No GPU required (though CUDA acceleration is available)

Requirements:
- A Parakeet-enabled voxtype binary (see below)
- ~600MB disk space for the model
- CPU with AVX2 or AVX-512 (AVX-512 recommended for best performance)
Parakeet support requires a specially compiled binary. Download from the releases page:
| Binary | Use Case |
|---|---|
| voxtype-*-parakeet-avx2 | Most CPUs (Intel Haswell+, AMD Zen+) |
| voxtype-*-parakeet-avx512 | Modern CPUs with AVX-512 (Intel Ice Lake+, AMD Zen 4+) |
| voxtype-*-parakeet-cuda | NVIDIA GPU acceleration with CPU fallback |
The AVX2 binary works on most modern x86_64 CPUs. Use AVX-512 if your CPU supports it for better performance.
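
One way to choose between the AVX2 and AVX-512 builds is to look at the CPU flags the kernel reports. The sketch below assumes a Linux x86_64 system where `/proc/cpuinfo` has a `flags` line containing `avx2` and, on AVX-512 hardware, `avx512f`. The function name `suggest_parakeet_variant` is made up for this example and is not part of voxtype, and checking only `avx512f` is a simplification.

```shell
# Suggest a Parakeet binary variant from CPU flags.
# Hypothetical helper, not part of voxtype. Takes an optional path to a
# cpuinfo-style file (defaults to /proc/cpuinfo) so it can be tested.
suggest_parakeet_variant() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first "flags" line and pad it with spaces so that
    # whole-word matches work for the first and last flag too.
    flags=" $(grep -m1 '^flags' "$cpuinfo" | cut -d: -f2) "
    case "$flags" in
        *" avx512f "*) echo "avx512" ;;
        *" avx2 "*)    echo "avx2" ;;
        *)             echo "unsupported" ;;
    esac
}
```

Running `suggest_parakeet_variant` with no arguments prints `avx512`, `avx2`, or `unsupported` for the current machine. It says nothing about the CUDA build, which depends on the GPU rather than the CPU.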
Download the Parakeet TDT 0.6B model:
# Create the model directory
mkdir -p ~/.local/share/voxtype/models/parakeet-tdt-0.6b-v2
# Download the model files into it
cd ~/.local/share/voxtype/models/parakeet-tdt-0.6b-v2
curl -L https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/resolve/main/onnx/encoder-model.onnx -o encoder-model.onnx
curl -L https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/resolve/main/onnx/encoder-model.onnx.data -o encoder-model.onnx.data
curl -L https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/resolve/main/onnx/decoder_joint-model.onnx -o decoder_joint-model.onnx
# Or download the full directory structure
# The model should be at: ~/.local/share/voxtype/models/parakeet-tdt-0.6b-v2/

Alternatively, use a v3 model if available:
mkdir -p ~/.local/share/voxtype/models/parakeet-tdt-0.6b-v3
cd ~/.local/share/voxtype/models/parakeet-tdt-0.6b-v3
# Download encoder-model.onnx, encoder-model.onnx.data, decoder_joint-model.onnx

The standard voxtype binary does not include Parakeet support. You must switch to a Parakeet-enabled binary.
Manual switching (until voxtype setup engine is implemented):
# Download the Parakeet binary for your CPU
# Example: AVX-512 capable CPU
curl -L https://github.com/peteonrails/voxtype/releases/download/v0.5.0/voxtype-0.5.0-linux-x86_64-parakeet-avx512 \
-o /tmp/voxtype-parakeet
# Make executable and install
chmod +x /tmp/voxtype-parakeet
sudo mv /tmp/voxtype-parakeet /usr/local/bin/voxtype
# Restart the daemon
systemctl --user restart voxtype
# Verify
voxtype --version

To switch back to Whisper, download and install the standard binary (avx2, avx512, or vulkan).
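
Before pointing the config at a model, it can help to confirm that the model directory holds the three files named in the download step. The following is a minimal sketch; `check_parakeet_model` is a hypothetical helper, not a voxtype command, and it only checks that each file exists and is non-empty, not that its contents are valid.

```shell
# Check that a Parakeet model directory contains the three files named
# in the download step. Hypothetical helper, not part of voxtype.
check_parakeet_model() {
    model_dir="$1"
    missing=0
    for f in encoder-model.onnx encoder-model.onnx.data decoder_joint-model.onnx; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "$model_dir/$f" ]; then
            echo "missing or empty: $model_dir/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_parakeet_model ~/.local/share/voxtype/models/parakeet-tdt-0.6b-v3 && echo "model files present"` prints the confirmation only when all three files are in place.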
Edit ~/.config/voxtype/config.toml:
# Select Parakeet as the transcription engine
engine = "parakeet"
[parakeet]
# Model name (looked up in ~/.local/share/voxtype/models/)
model = "parakeet-tdt-0.6b-v3"
# Or use an absolute path
# model_path = "/path/to/parakeet-tdt-0.6b-v3"

Restart the daemon:
systemctl --user restart voxtype

Verify Parakeet is active:
journalctl --user -u voxtype --since "1 minute ago" | grep -i parakeet
# Should show: "Loading Parakeet Tdt model from..."

Tested on Ryzen 9 9900X3D (AVX-512):
| Audio Length | Transcription Time | Speed vs. Real Time |
|---|---|---|
| 1-2s | 0.06-0.09s | ~20x |
| 3-4s | 0.11-0.13s | ~30x |
| 5s | 0.15s | ~33x |
Model load time: ~1.2 seconds (one-time at daemon startup)
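
The figures in the last column of the table above are audio length divided by transcription time. As a quick way to reproduce them from your own timings, here is a small sketch; `speed_factor` is a hypothetical helper name, and it uses `awk` because POSIX shell arithmetic is integer-only.

```shell
# Speed relative to real time: seconds of audio per second of processing.
# Hypothetical helper for checking the table; awk does the floating-point math.
# Usage: speed_factor AUDIO_SECONDS TRANSCRIPTION_SECONDS
speed_factor() {
    awk -v audio="$1" -v proc="$2" 'BEGIN { printf "%.1f\n", audio / proc }'
}
```

Calling `speed_factor 5 0.15` corresponds to the last row of the table (5 s of audio transcribed in 0.15 s).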
| Engine | Backend | Typical Speed | GPU Required |
|---|---|---|---|
| Whisper small | CPU | ~3x real-time | No |
| Whisper small | Vulkan | ~60x real-time | Yes |
| Parakeet TDT | CPU (AVX-512) | ~30x real-time | No |
| Parakeet TDT | CUDA | ~80x real-time | Yes (NVIDIA) |
In these measurements, Parakeet on CPU is roughly 10x faster than Whisper small on CPU, and within a factor of two of Whisper small on a Vulkan GPU.
Parakeet can hallucinate extra repetitions when you speak repeated words. For example, saying "no no no no no" might transcribe as many more "no"s than you actually said. This is a known issue with many ASR models.
Uncommon names and technical terms may be substituted with phonetically similar common words. For example:
- "Krzyzewski" → "Krasiewski"
- "Nguyen" → "Gwen"
Parakeet TDT models are English-only. For multilingual support, use Whisper.
The Parakeet TDT 0.6B model is ~600MB, compared to Whisper small at ~500MB. Larger Parakeet models are available but not yet tested with voxtype.
To switch back to Whisper, edit your config:
engine = "whisper"
[whisper]
model = "small"

Or simply remove the engine line (Whisper is the default).
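
To see which engine a given config file selects, something like the following sketch can be used. `configured_engine` is a hypothetical helper, not a voxtype command. It is a plain text match rather than a TOML parser: it assumes `engine = "..."` sits on its own uncommented line, as in the examples here, and it falls back to `whisper` when no such line exists, following the default described above.

```shell
# Report which engine a voxtype config selects.
# Hypothetical helper, not part of voxtype. Assumes `engine = "..."` is on
# its own line; commented-out lines (starting with #) are ignored.
configured_engine() {
    config="${1:-$HOME/.config/voxtype/config.toml}"
    engine=$(sed -n 's/^[[:space:]]*engine[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$config" 2>/dev/null | head -n 1)
    # With no engine line (or no config file), report the documented default.
    echo "${engine:-whisper}"
}
```

Running `configured_engine` with no arguments reads `~/.config/voxtype/config.toml`. Remember that the daemon only picks up a change after `systemctl --user restart voxtype`.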
You're using a standard voxtype binary without Parakeet support. Download a parakeet-* binary from the releases page.
Add the [parakeet] section to your config:
[parakeet]
model = "parakeet-tdt-0.6b-v3"

Ensure the model is in the correct location:
ls ~/.local/share/voxtype/models/parakeet-tdt-0.6b-v3/
# Should show: encoder-model.onnx, encoder-model.onnx.data, decoder_joint-model.onnx

Parakeet binaries include ONNX Runtime, which contains AVX-512 optimized code paths. ONNX Runtime performs CPU feature detection at runtime and should only execute instructions your CPU supports.
If you experience a SIGILL (illegal instruction) crash, this is likely a bug in ONNX Runtime's CPU detection rather than a fundamental incompatibility. As a workaround, switch to a Whisper binary:
- voxtype-*-avx2 - Works on Intel Haswell+ and AMD Zen+
- voxtype-*-vulkan - GPU acceleration for AMD/Intel GPUs
Please report the issue at https://github.com/peteonrails/voxtype/issues with:
- Your CPU model (cat /proc/cpuinfo | grep "model name" | head -1)
- Which Parakeet binary you were using
- The full error output
Parakeet support is experimental. Please report issues at: https://github.com/peteonrails/voxtype/issues
Include:
- Your CPU model
- Which binary you're using (avx2/avx512/cuda)
- The Parakeet model version
- Sample audio if possible (for accuracy issues)
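
The details listed above can be gathered into one block of text to paste into an issue. This is only a sketch: `cpu_model` and `parakeet_report` are hypothetical helpers, not voxtype commands, and the binary variant and model version are passed in by hand rather than detected.

```shell
# Print the CPU model name for a bug report.
# Hypothetical helper; takes an optional cpuinfo-style file for testing.
cpu_model() {
    grep -m1 'model name' "${1:-/proc/cpuinfo}" | cut -d: -f2 | sed 's/^[[:space:]]*//'
}

# Print the requested details in one block.
# Usage: parakeet_report BINARY_VARIANT MODEL_VERSION [CPUINFO_FILE]
parakeet_report() {
    echo "CPU: $(cpu_model "${3:-/proc/cpuinfo}")"
    echo "Binary: $1"
    echo "Model: $2"
}
```

For example, `parakeet_report avx512 parakeet-tdt-0.6b-v3` prints three lines that can be copied into the issue alongside the error output.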