Discussion: FunASR/SenseVoice as alternative non-autoregressive ASR #1448

@LauraGPT

Discussion

Hi! This is not an issue report, but a discussion about alternative ASR approaches.

faster-whisper is excellent for Whisper acceleration. For users who need even faster inference, SenseVoice takes a fundamentally different approach: a non-autoregressive architecture, which can be 5-10x faster than Whisper at comparable accuracy.
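To make "autoregressive vs non-autoregressive" concrete, here is a toy sketch (not code from Whisper, faster-whisper, or FunASR). It assumes CTC-style collapsing as a representative non-autoregressive decoding scheme, and every name in it (`autoregressive_decode`, `ctc_greedy_decode`, `BLANK`, `toy_step`) is hypothetical. The point is the number of sequential model calls: the autoregressive loop needs one decoder pass per output token, while the CTC-style path only post-processes per-frame labels that a single encoder pass would produce.

```python
from typing import Callable, List, Optional

BLANK = 0  # hypothetical CTC blank-token id for this toy example


def autoregressive_decode(step: Callable[[List[int]], int], eos: int, max_len: int) -> List[int]:
    """Greedy autoregressive decoding (Whisper-style): every output token
    needs its own sequential decoder call, conditioned on the prefix so far."""
    tokens: List[int] = []
    for _ in range(max_len):
        nxt = step(tokens)  # one decoder forward pass per emitted token
        if nxt == eos:
            break
        tokens.append(nxt)
    return tokens


def ctc_greedy_decode(frame_argmax: List[int]) -> List[int]:
    """Non-autoregressive CTC-style decoding: the per-frame best labels come
    from one encoder pass; here we only merge repeats and drop blanks."""
    out: List[int] = []
    prev: Optional[int] = None
    for label in frame_argmax:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return out


if __name__ == "__main__":
    target = [5, 7, 7, 9]
    eos = -1
    calls = [0]  # mutable counter of toy decoder calls

    def toy_step(prefix: List[int]) -> int:
        # Stand-in for a decoder forward pass that already "knows" the target.
        calls[0] += 1
        return target[len(prefix)] if len(prefix) < len(target) else eos

    print(autoregressive_decode(toy_step, eos, max_len=10), "calls:", calls[0])
    # The blank between the two 7s keeps them as separate tokens.
    frames = [0, 5, 5, 0, 7, 0, 7, 7, 9, 0]
    print(ctc_greedy_decode(frames))
```

Both paths recover `[5, 7, 7, 9]`, but the autoregressive one needs five sequential calls (four tokens plus end-of-sequence), and that count grows with transcript length. This sequential dependency is the main reason autoregressive decoding is harder to speed up.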

Key differences:

| Feature | faster-whisper | FunASR SenseVoice |
|---|---|---|
| Architecture | Autoregressive (Whisper) | Non-autoregressive |
| Speed | ~30x realtime (GPU) | ~170x realtime (GPU) |
| Languages | 99 | 50+ |
| Parameters | 1.5B (large-v3) | 234M |
| Emotion detection | No | Yes |
| Audio events | No | Yes (laughter, music, etc.) |
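The speed and size rows translate into wall-clock time and memory savings through simple arithmetic. The sketch below uses only the rough figures from the comparison (~30x vs ~170x realtime, 1.5B vs 234M parameters). The 1,000-hour workload and the helper `processing_hours` are hypothetical, and these are not benchmark results.

```python
def processing_hours(audio_hours: float, realtime_factor: float) -> float:
    """Wall-clock hours needed to transcribe `audio_hours` of audio when the
    system runs `realtime_factor` times faster than realtime."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_hours / realtime_factor


# Rough GPU figures from the comparison (not measured benchmarks).
FASTER_WHISPER_RTF = 30.0   # ~30x realtime
SENSEVOICE_RTF = 170.0      # ~170x realtime
WHISPER_LARGE_V3_PARAMS = 1.5e9
SENSEVOICE_PARAMS = 234e6

audio_hours = 1000.0  # hypothetical batch: 1,000 hours of recordings
fw = processing_hours(audio_hours, FASTER_WHISPER_RTF)
sv = processing_hours(audio_hours, SENSEVOICE_RTF)
speedup = fw / sv
param_ratio = WHISPER_LARGE_V3_PARAMS / SENSEVOICE_PARAMS

print(f"faster-whisper: {fw:.1f} h")          # 33.3 h
print(f"SenseVoice:     {sv:.1f} h")          # 5.9 h
print(f"speedup:        {speedup:.1f}x")      # 5.7x
print(f"size ratio:     {param_ratio:.1f}x")  # 6.4x
```

At these figures the gap is about 5.7x, at the low end of the quoted 5-10x range. Actual throughput will depend on the GPU, batch size, audio length, and decoding settings.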

When to consider SenseVoice:

  • Batch processing large audio collections (5-10x faster)
  • Chinese/Japanese/Korean ASR (superior accuracy)
  • Need emotion detection in speech
  • Resource-constrained environments (6x smaller model)

When faster-whisper is better:

  • Need coverage of all 99 Whisper languages
  • English-primary use cases
  • Existing Whisper-based pipelines
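The two lists can be read as a rough decision rule. The sketch below encodes that reading. `Requirements`, `suggest_backend`, and the order of the checks are hypothetical and represent one judgment call, not an official recommendation from either project. Because the post says only "50+" languages, the caller must supply the actual SenseVoice language set from the model card.

```python
from dataclasses import dataclass
from typing import AbstractSet

CJK = frozenset({"zh", "ja", "ko"})  # languages singled out for accuracy


@dataclass
class Requirements:
    language: str  # ISO 639-1 code, e.g. "en", "zh"
    needs_emotion_or_events: bool = False
    batch_heavy: bool = False
    resource_constrained: bool = False
    existing_whisper_pipeline: bool = False


def suggest_backend(req: Requirements, sensevoice_languages: AbstractSet[str]) -> str:
    """Hypothetical rule of thumb built from the two lists above."""
    if req.language not in sensevoice_languages:
        return "faster-whisper"  # wider (99-language) coverage
    if req.needs_emotion_or_events:
        return "SenseVoice"  # only SenseVoice offers emotion/audio-event tags
    if req.existing_whisper_pipeline:
        return "faster-whisper"  # avoid migrating a working pipeline
    if req.language in CJK or req.batch_heavy or req.resource_constrained:
        return "SenseVoice"
    return "faster-whisper"  # e.g. English-primary workloads


if __name__ == "__main__":
    supported = {"en", "zh", "ja", "ko"}  # placeholder subset; check the model card
    print(suggest_backend(Requirements("zh"), supported))
    print(suggest_backend(Requirements("en"), supported))
    print(suggest_backend(Requirements("en", needs_emotion_or_events=True), supported))
```

Language coverage is checked first because no other advantage matters if the model cannot transcribe the target language.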

FunASR also provides Paraformer for accurate timestamps and Fun-ASR-Nano (LLM-based, 800M params) with hotword support.
