Feature Request
Would it be possible to add FunASR's Paraformer model as an alternative ASR backend alongside Whisper?
Why
- Paraformer is non-autoregressive, so it decodes significantly faster than autoregressive Whisper, especially on long audio (reported at 170x realtime on GPU)
- For Chinese/Japanese/Korean audio, Paraformer and SenseVoice consistently outperform Whisper (see SenseVoice benchmark)
- Fun-ASR-Nano (LLM-based, 800M params) supports 31 languages with word-level timestamps
- FunASR models are available as ONNX via Sherpa-ONNX, which faster-whisper could potentially integrate
FunASR Models
| Model | Params | Languages | Speed | Timestamps |
|---|---|---|---|---|
| SenseVoice-Small | 234M | 50+ | 25x realtime (CPU) | ✅ |
| Paraformer-large | 220M | zh/en/ja/ko/yue | 170x realtime (GPU) | ✅ |
| Fun-ASR-Nano | 800M | 31 | GPU via vLLM | ✅ word-level |
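To put the speed column in concrete terms, here is a small back-of-the-envelope calculation. It assumes "Nx realtime" means N seconds of audio are transcribed per second of wall-clock time; the helper name `processing_seconds` is made up for this illustration, and the factors are just the figures quoted in the table, not new measurements.

```python
def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock transcription time from a realtime factor.

    Assumes a factor of N means N seconds of audio per second of compute.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


one_hour = 3600.0  # seconds of audio
quoted_factors = [
    ("SenseVoice-Small (CPU)", 25.0),
    ("Paraformer-large (GPU)", 170.0),
]
for name, factor in quoted_factors:
    seconds = processing_seconds(one_hour, factor)
    print(f"{name}: about {seconds:.1f} s per hour of audio")
```

Under that assumption, an hour of audio would take roughly 144 s at 25x and roughly 21 s at 170x, ignoring model load time and I/O.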
Quick test
pip install funasr
from funasr import AutoModel

# Load SenseVoice-Small by its model ID, then transcribe a local file
model = AutoModel(model="iic/SenseVoiceSmall")
result = model.generate(input="audio.wav")
print(result)
References