Discussion: Non-autoregressive ASR models (SenseVoice) as alternative backend #1449

faster-whisper does a great job of optimizing Whisper inference with CTranslate2. I'd like to open a discussion about non-autoregressive ASR models as a potential alternative approach.

### The speed ceiling of autoregressive models

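Even with CTranslate2 optimizations, Whisper's autoregressive architecture imposes a speed ceiling. The encoder processes the audio in parallel, but the decoder emits text one token at a time, and each token depends on the ones before it, which limits parallelism. faster-whisper reports up to ~4x speedup over openai/whisper at the same accuracy, reaching roughly 50x realtime on GPU (the exact figure depends on hardware, beam size, and compute type).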
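To make the sequential dependency concrete, here is a toy sketch (not Whisper's actual decoder) that counts model invocations. `toy_step` and `toy_predict_all` are hypothetical stand-ins for one decoder step and one parallel prediction pass. Real systems also use KV caching and batching, which reduce the cost of each step but not the number of sequential steps.

```python
from typing import Callable, Dict, List

BOS, EOS = 0, 1  # hypothetical special-token ids for this toy example

def autoregressive_decode(step: Callable[[List[int]], int], max_len: int = 100) -> List[int]:
    """Greedy AR decoding: each step needs every previously emitted token."""
    tokens = [BOS]
    for _ in range(max_len):
        nxt = step(tokens)  # one sequential model call per output token
        if nxt == EOS:
            break
        tokens.append(nxt)
    return tokens[1:]  # drop BOS

def non_autoregressive_decode(predict_all: Callable[[], List[int]]) -> List[int]:
    """NAR decoding: a single call predicts all output positions at once."""
    return predict_all()

# Toy "models" that both produce the same 4-token transcript.
TARGET = [5, 6, 7, 8]
calls: Dict[str, int] = {"ar": 0, "nar": 0}

def toy_step(prefix: List[int]) -> int:
    calls["ar"] += 1
    i = len(prefix) - 1  # position being predicted
    return TARGET[i] if i < len(TARGET) else EOS

def toy_predict_all() -> List[int]:
    calls["nar"] += 1
    return list(TARGET)

ar_out = autoregressive_decode(toy_step)
nar_out = non_autoregressive_decode(toy_predict_all)
print(ar_out, calls["ar"])    # [5, 6, 7, 8] 5  (4 tokens + 1 call that yields EOS)
print(nar_out, calls["nar"])  # [5, 6, 7, 8] 1
```

The number of sequential calls grows with transcript length in the autoregressive case and stays at one in the non-autoregressive case; that is the structural reason for the speed gap, independent of how accurate a given model is.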

### Non-autoregressive alternative: SenseVoice

**SenseVoice** takes a fundamentally different approach: it is non-autoregressive, predicting the whole output sequence in a single parallel pass rather than token by token:
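One common way a non-autoregressive ASR model turns a single parallel pass into text is greedy CTC decoding: take the best label for every audio frame independently, collapse consecutive repeats, and drop the blank symbol. SenseVoice-Small is generally described as CTC-based, but treat that as an assumption here; the blank id `0` and the integer labels below are illustrative, not SenseVoice's actual vocabulary.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed blank label id (illustrative)

def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Greedy CTC decoding over per-frame label scores (frames x vocab)."""
    # 1. Pick the best label for each frame; frames are independent, so this parallelizes.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    # 2. Collapse consecutive duplicates, then 3. remove blanks.
    return [label for label, _ in groupby(best) if label != blank]

def one_hot(i: int, vocab: int = 5) -> List[float]:
    return [1.0 if j == i else 0.0 for j in range(vocab)]

# Frame-wise best labels: 3 3 _ 3 4 4 _  ->  3 3 4 (the blank separates the two 3s)
scores = [one_hot(i) for i in [3, 3, 0, 3, 4, 4, 0]]
print(ctc_greedy_decode(scores))  # [3, 3, 4]
```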

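| Metric | faster-whisper (large-v3) | SenseVoice Small |
|--------|--------------------------|------------------|
| Architecture | Autoregressive (CTranslate2) | Non-autoregressive |
| GPU Speed | ~50x realtime | **170x realtime** |
| CPU Speed | ~10x realtime | **17x realtime** |
| Params | 1.5B | 234M |
| Chinese CER | 8.4% (AISHELL) | **3.2%** |
| English WER | 5.1% (LibriSpeech) | Competitive (no figure given) |
| Punctuation | Yes (part of the decoded text) | Yes (built-in) |
| Languages | 99 | 50+ for the SenseVoice family; the open SenseVoice-Small checkpoint covers zh, en, yue, ja, ko |

Speed and accuracy figures are as reported by the respective projects. Hardware, test splits, and text normalization differ between sources, so treat the comparison as indicative rather than a like-for-like benchmark.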
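The two kinds of numbers in the table can be made precise. "Nx realtime" means N seconds of audio transcribed per second of compute (the inverse of the real-time factor), and CER/WER are edit distance divided by reference length, counted over characters or words. This is a minimal sketch; the timings and sentences below are made-up inputs, and published benchmarks also normalize text (case, punctuation, numerals) before scoring, which this sketch omits.

```python
from typing import Sequence

def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """'Nx realtime': seconds of audio transcribed per second of compute."""
    return audio_seconds / processing_seconds

def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) with a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: used for Chinese, where words are not space-delimited."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

def wer(ref: str, hyp: str) -> float:
    """Word error rate over whitespace-split tokens."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

print(speed_factor(340.0, 2.0))                           # 170.0
print(round(cer("今天天气很好", "今天天汽很好"), 4))         # 0.1667 (1 substitution / 6 chars)
print(round(wer("the cat sat on the mat", "the cat sat on mat"), 4))  # 0.1667 (1 deletion / 6 words)
```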

### Not a replacement, but a complement

I'm not suggesting replacing Whisper; the two architectures have different strengths. But for users who prioritize speed over language coverage (Whisper covers 99 languages, SenseVoice considerably fewer), SenseVoice could be an interesting alternative backend.
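As a sketch of what "speed over language coverage" could mean in practice, the snippet below picks the fastest backend that supports the requested language, so requests outside the faster model's coverage fall through to the broader one. `BackendInfo` and `pick_backend` are hypothetical names, not part of faster-whisper; the language sets are abbreviated placeholders and the speed values are the table's figures, which you would replace with measurements on your own hardware.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

@dataclass(frozen=True)
class BackendInfo:
    """Hypothetical descriptor for an ASR backend."""
    name: str
    languages: FrozenSet[str]  # language codes the backend supports
    speed_factor: float        # measured "x realtime" on the target hardware

def pick_backend(backends: Iterable[BackendInfo], language: str) -> Optional[BackendInfo]:
    """Return the fastest backend that supports `language`, or None if none does."""
    candidates = [b for b in backends if language in b.languages]
    return max(candidates, key=lambda b: b.speed_factor, default=None)

# Placeholder entries: language sets are abbreviated (Whisper's real list is much longer),
# SenseVoice-Small's set is an assumption, and speeds are the table's reported figures.
WHISPER = BackendInfo("faster-whisper", frozenset({"en", "zh", "de", "fr", "ja", "ko", "yue"}), 50.0)
SENSEVOICE = BackendInfo("sensevoice-small", frozenset({"zh", "en", "yue", "ja", "ko"}), 170.0)

for lang in ("zh", "de", "xx"):  # "xx" stands in for an unsupported code
    chosen = pick_backend([WHISPER, SENSEVOICE], lang)
    print(lang, chosen.name if chosen else None)
# zh sensevoice-small
# de faster-whisper
# xx None
```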

### Resources

- SenseVoice: https://github.com/FunAudioLLM/SenseVoice (8.3K+ stars)
- FunASR: https://github.com/modelscope/FunASR (16.6K+ stars)
- ONNX exports of SenseVoice and other ASR models (run with sherpa-onnx; ONNX is a different format from CTranslate2's): https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models

Curious what the community thinks about multi-model support in faster-whisper.
