Discussion
Hi! This is not an issue report, but a discussion about alternative ASR approaches.
faster-whisper is excellent for Whisper acceleration. For users who need even faster inference, SenseVoice (from FunASR) takes a fundamentally different approach: a non-autoregressive architecture, which can be 5-10x faster than Whisper at comparable accuracy.
Key differences:
| Feature | faster-whisper | FunASR SenseVoice |
| --- | --- | --- |
| Architecture | Autoregressive (Whisper) | Non-autoregressive |
| Speed | ~30x realtime (GPU) | ~170x realtime (GPU) |
| Languages | 99 | 50+ |
| Params | 1.5B (large-v3) | 234M |
| Emotion detection | No | Yes |
| Audio events | No | Yes (laughter, music, etc.) |
When to consider SenseVoice:
- Batch processing large audio collections (5-10x faster)
- Chinese/Japanese/Korean ASR (superior accuracy)
- Need emotion detection in speech
- Resource-constrained environments (6x smaller model)
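
For anyone who wants to sanity-check the "5-10x faster" and "6x smaller" figures, here is a rough sketch of the arithmetic, using only the approximate numbers quoted in this post. The function name `processing_hours` and the 1000-hour archive are made up for illustration, and real throughput will depend on hardware, batch size, and audio characteristics.

```python
def processing_hours(audio_hours: float, speed_multiple: float) -> float:
    """Wall-clock hours needed to transcribe `audio_hours` of audio
    when the model runs at `speed_multiple` times realtime."""
    if speed_multiple <= 0:
        raise ValueError("speed_multiple must be positive")
    return audio_hours / speed_multiple


# Approximate figures quoted in this post (not measured here).
WHISPER_SPEED = 30.0        # ~30x realtime on GPU
SENSEVOICE_SPEED = 170.0    # ~170x realtime on GPU
WHISPER_PARAMS = 1.5e9      # large-v3
SENSEVOICE_PARAMS = 234e6

# Relative speedup and model-size ratio implied by those figures.
speedup = SENSEVOICE_SPEED / WHISPER_SPEED
size_ratio = WHISPER_PARAMS / SENSEVOICE_PARAMS

if __name__ == "__main__":
    archive_hours = 1000.0  # hypothetical batch job
    print(f"speedup: {speedup:.1f}x, size ratio: {size_ratio:.1f}x")
    print(f"faster-whisper: {processing_hours(archive_hours, WHISPER_SPEED):.1f} h")
    print(f"SenseVoice:     {processing_hours(archive_hours, SENSEVOICE_SPEED):.1f} h")
```

With these inputs the speedup lands at the low end of the 5-10x range, so the gap matters most for large batch jobs rather than for individual short clips.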
When faster-whisper is better:
- Need 99 language coverage
- English-primary use cases
- Existing Whisper-based pipelines
FunASR also provides Paraformer for accurate timestamps and Fun-ASR-Nano (LLM-based, 800M params) with hotword support.