Feature Request: Add SenseVoice/FunASR as ASR engine option #203

@LauraGPT

Feature Request

SoniTranslate handles video dubbing with speech recognition and translation. SenseVoice/Paraformer from FunASR would be a strong ASR engine option.

Why

  • 170x real-time on GPU — non-autoregressive, much faster than Whisper for long video processing
  • Built-in speaker diarization (cam++) — essential for multi-speaker dubbing
  • Built-in punctuation — auto-adds punctuation for better subtitle quality
  • 50+ languages with automatic language detection
  • Chinese/Japanese/Korean accuracy — outperforms Whisper on CJK benchmarks

Quick Integration

from funasr import AutoModel

model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",
    spk_model="cam++",
)
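result = model.generate(input="video_audio.wav")
# result is a list with one dict per input; the transcript is in result[0]["text"].
# Whether timestamps and speaker labels are included depends on the models loaded.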
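
To plug into a pipeline built around Whisper-style output, the FunASR result would likely need a small adapter. The sketch below is only an illustration and makes two assumptions. First, it assumes the result carries a "sentence_info" list with "text", "start" and "end" in milliseconds, plus an optional integer "spk". Second, it assumes SoniTranslate expects segments with "start"/"end" in seconds and a "SPEAKER_00"-style label. The function name `to_whisper_style_segments` is hypothetical, and both shapes should be checked against the actual FunASR output and SoniTranslate code before relying on it.

```python
from typing import Any, Dict, List


def to_whisper_style_segments(result: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert a FunASR-style result into Whisper-style segments.

    Assumed input shape (not confirmed in this issue): each item may carry a
    "sentence_info" list whose entries have "text", "start" and "end" in
    milliseconds, and optionally an integer "spk" speaker index.
    """
    segments: List[Dict[str, Any]] = []
    for item in result:
        sentences = item.get("sentence_info")
        if not sentences:
            # Fall back to one untimed segment when only plain text is present.
            text = item.get("text", "").strip()
            if text:
                segments.append(
                    {"start": None, "end": None, "text": text, "speaker": None}
                )
            continue
        for sent in sentences:
            spk = sent.get("spk")
            segments.append(
                {
                    # Milliseconds to seconds, as Whisper-style segments use seconds.
                    "start": sent["start"] / 1000.0,
                    "end": sent["end"] / 1000.0,
                    "text": sent["text"].strip(),
                    # Hypothetical label format; adjust to what the dubbing code expects.
                    "speaker": f"SPEAKER_{int(spk):02d}" if spk is not None else None,
                }
            )
    return segments
```

Calling `to_whisper_style_segments(result)` on the output of `model.generate(...)` would then yield a flat list of segments. Items without sentence-level timing are kept as a single untimed segment rather than dropped, so the caller can decide how to handle them.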

Happy to help with integration!
