Feature request: Add SenseVoice/Paraformer as alternative ASR backend #1447

@LauraGPT

Summary

Would it be possible to add SenseVoice and Paraformer as alternative ASR backends alongside the current CTranslate2-based Whisper backend?

Motivation

faster-whisper excels at efficient Whisper inference. However, some use cases would benefit from alternative ASR architectures:

  • SenseVoice (234M params): Non-autoregressive model that runs ~25x faster than Whisper-large with comparable accuracy on 50+ languages. It also provides emotion detection and audio event classification.
  • Paraformer: Non-autoregressive Chinese ASR with state-of-the-art accuracy on AISHELL benchmarks, with built-in VAD and punctuation restoration.
  • Fun-ASR-Nano: LLM-based ASR (SenseVoice encoder + Qwen3-0.6B decoder, 800M params) supporting 31 languages.

All models are available via FunASR (pip install funasr) and as ONNX exports via Sherpa-ONNX for CTranslate2-like optimized inference.

Benchmark comparison

| Model | Type | Speed (GPU) | Languages | Extra features |
|---|---|---|---|---|
| Whisper-large-v3 | Autoregressive | 1x baseline | 99 | Translation |
| SenseVoice-Small | Non-AR | ~25x | 50+ | Emotion, events |
| Paraformer-large | Non-AR | ~170x realtime | Chinese | VAD, punctuation |
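
Note that the speed column mixes two measures: "~25x" is relative to the Whisper baseline, while "~170x realtime" is relative to the audio duration. As a rough illustration of the second one, here is a minimal sketch that assumes "Nx realtime" means audio duration divided by processing time; the function names are hypothetical and not part of any library.

```python
def realtime_speedup(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many times faster than real time a transcription ran."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def processing_time(audio_seconds: float, speedup: float) -> float:
    """Return the expected processing time for a given realtime speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# One hour of audio at ~170x realtime would take roughly 21 seconds.
one_hour = 3600.0
estimated = processing_time(one_hour, 170.0)
print(f"{estimated:.1f} s")
```

Under that assumption, the two figures in the table are not directly comparable unless the realtime factor of the Whisper baseline on the same hardware is also known.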

Quick start

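pip install funasr

from funasr import AutoModel

# Load SenseVoice-Small by its model ID (weights are downloaded on first use).
model = AutoModel(model="iic/SenseVoiceSmall")
# Transcribe a local audio file; generate() returns a list with one result dict per input.
result = model.generate(input="audio.wav")
print(result[0]["text"])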
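
To make the request more concrete, here is one possible shape for a pluggable backend layer. This is only a sketch: `Transcript`, `ASRBackend`, `FunASRBackend`, `register_backend`, and `create_backend` are hypothetical names that do not exist in faster-whisper or FunASR. The only real calls are `funasr.AutoModel(model=...)` and `model.generate(input=...)` from the quick start, and the adapter assumes the result is a list of dicts with a `"text"` key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Protocol


@dataclass
class Transcript:
    """Hypothetical backend-independent result type."""
    text: str
    language: Optional[str] = None


class ASRBackend(Protocol):
    """Hypothetical interface every backend would implement."""

    def transcribe(self, audio_path: str) -> Transcript: ...


class FunASRBackend:
    """Hypothetical adapter around funasr.AutoModel."""

    def __init__(self, model_name: str = "iic/SenseVoiceSmall") -> None:
        # Imported lazily so funasr stays an optional dependency.
        from funasr import AutoModel

        self._model = AutoModel(model=model_name)

    def transcribe(self, audio_path: str) -> Transcript:
        results = self._model.generate(input=audio_path)
        # Assumption: generate() returns a list of dicts with a "text" key.
        return Transcript(text=results[0]["text"])


# Registry mapping a backend name to a zero-argument factory.
_BACKENDS: Dict[str, Callable[[], ASRBackend]] = {"funasr": FunASRBackend}


def register_backend(name: str, factory: Callable[[], ASRBackend]) -> None:
    """Make an additional backend selectable by name."""
    _BACKENDS[name] = factory


def create_backend(name: str) -> ASRBackend:
    """Instantiate a registered backend, or raise ValueError if unknown."""
    try:
        factory = _BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown ASR backend: {name!r}") from None
    return factory()
```

With something like this, selecting SenseVoice would be `create_backend("funasr").transcribe("audio.wav")`, and the existing Whisper path could be registered under its own name without touching callers.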
