Discussion: Non-autoregressive ASR models (SenseVoice) as alternative backend #1449

@LauraGPT

faster-whisper does a great job optimizing Whisper inference with CTranslate2. I wanted to bring up a discussion about non-autoregressive ASR models as a potential alternative approach.

The speed ceiling of autoregressive models

Even with CTranslate2 optimizations, Whisper's autoregressive decoder imposes a speed ceiling: each output token depends on the tokens generated before it, so the decoding steps cannot run in parallel. faster-whisper achieves ~4x speedup over vanilla Whisper, reaching roughly 50x realtime on GPU.
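
To illustrate the sequential dependency, here is a minimal sketch of greedy autoregressive decoding. It is a toy, not Whisper's or CTranslate2's actual decoding code: `next_token`, `EOS`, and `toy_next_token` are hypothetical stand-ins for a real decoder's forward pass and vocabulary.

```python
from typing import Callable, List

EOS = 0  # hypothetical end-of-sequence token id


def greedy_autoregressive_decode(
    next_token: Callable[[List[int]], int], max_len: int = 100
) -> List[int]:
    """Decode one token at a time; each step needs every earlier token."""
    tokens: List[int] = []
    for _ in range(max_len):  # max_len is an arbitrary safety cap
        # One decoder forward pass per output token. Step t cannot start
        # until step t-1 has produced its token, so the loop is serial.
        tok = next_token(tokens)
        if tok == EOS:
            break
        tokens.append(tok)
    return tokens


def toy_next_token(prefix: List[int]) -> int:
    """Toy stand-in for a decoder: emits 5, 4, 3, 2, 1 and then EOS."""
    return 5 - len(prefix)


decoded = greedy_autoregressive_decode(toy_next_token)
print(decoded)
```

The number of serial steps grows with the length of the transcript. Kernel-level optimizations make each step cheaper but do not remove the loop itself.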

Non-autoregressive alternative: SenseVoice

SenseVoice takes a fundamentally different approach — it's non-autoregressive, processing the entire sequence in parallel:

| Metric | faster-whisper (large-v3) | SenseVoice Small |
| --- | --- | --- |
| Architecture | Autoregressive (CTranslate2) | Non-autoregressive |
| GPU speed | ~50x realtime | 170x realtime |
| CPU speed | ~10x realtime | 17x realtime |
| Params | 1.5B | 234M |
| Chinese CER | 8.4% (AISHELL) | 3.2% |
| English WER | 5.1% (LibriSpeech) | Competitive |
| Punctuation | No | Built-in |
| Languages | 99 | 50+ |
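
To make the realtime multiples concrete, the sketch below converts them into wall-clock time for one hour of audio. It only does arithmetic on the figures quoted above, which are approximate and depend on hardware and settings; the function name `processing_seconds` is made up for this example.

```python
def processing_seconds(audio_seconds: float, realtime_multiple: float) -> float:
    """Wall-clock time to transcribe audio at a given 'Nx realtime' speed."""
    if realtime_multiple <= 0:
        raise ValueError("realtime_multiple must be positive")
    return audio_seconds / realtime_multiple


ONE_HOUR = 3600.0

# Realtime multiples taken from the comparison above (approximate figures).
speeds = {
    "faster-whisper large-v3 (GPU)": 50,
    "SenseVoice Small (GPU)": 170,
    "faster-whisper large-v3 (CPU)": 10,
    "SenseVoice Small (CPU)": 17,
}

for name, multiple in speeds.items():
    print(f"{name}: {processing_seconds(ONE_HOUR, multiple):.1f} s per hour of audio")
```

On these numbers, an hour of audio takes about 72 seconds versus about 21 seconds on GPU, and about 6 minutes versus about 3.5 minutes on CPU.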

Not a replacement, but a complement

I'm not suggesting replacing Whisper — the two architectures have different strengths. But for users who prioritize speed over language coverage (50+ vs 99 languages), SenseVoice could be an interesting alternative backend.
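
As a starting point for discussion, here is a rough sketch of what a pluggable backend interface could look like. Nothing here is existing faster-whisper or SenseVoice API: `TranscriptionBackend`, `Segment`, `StubBackend`, and `pick_backend` are hypothetical names, and the language sets are placeholders rather than real coverage lists.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol, Sequence, Set


@dataclass
class Segment:
    start: float
    end: float
    text: str


class TranscriptionBackend(Protocol):
    """Hypothetical interface that each model backend would implement."""

    name: str

    def supports(self, language: str) -> bool: ...

    def transcribe(self, audio_path: str, language: str) -> List[Segment]: ...


class StubBackend:
    """Placeholder backend; a real one would wrap an actual model."""

    def __init__(self, name: str, languages: Optional[Set[str]] = None):
        self.name = name
        self._languages = languages  # None means "accept any language"

    def supports(self, language: str) -> bool:
        return self._languages is None or language in self._languages

    def transcribe(self, audio_path: str, language: str) -> List[Segment]:
        # No real inference happens here; this only tags the output.
        return [Segment(0.0, 0.0, f"[{self.name}] {audio_path}")]


def pick_backend(
    backends: Sequence[TranscriptionBackend], language: str
) -> TranscriptionBackend:
    """Return the first backend, in preference order, that supports the language."""
    for backend in backends:
        if backend.supports(language):
            return backend
    raise ValueError(f"no backend supports language {language!r}")


# Preference order: faster model first, broader-coverage model as fallback.
# The language codes below are placeholders, not a real coverage list.
backends = [StubBackend("fast-stub", {"zh", "en"}), StubBackend("broad-stub")]

print(pick_backend(backends, "zh").name)
print(pick_backend(backends, "xx").name)
```

Ordering the list by preference keeps the selection rule simple: speed-focused users get the faster model when it covers their language, and everything else falls back to the broader one.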

Resources

Curious what the community thinks about multi-model support in faster-whisper.
