Feature: Add SenseVoice as an ASR engine for faster subtitle generation #314

@LauraGPT

Summary

Add SenseVoice as an alternative ASR engine for subtitle generation.

Why?

SmartSub generates subtitles from video, and SenseVoice can make this faster and more accurate:

  • 5x faster: Non-autoregressive model that generates all tokens in a single forward pass
  • 50+ languages: A single 234MB model, so there is no need to download a separate model per language
  • Better CJK accuracy: Lower CER than Whisper on Chinese, Japanese, and Korean benchmarks
  • No silence hallucination: Does not generate phantom text during quiet parts of a video
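To illustrate what "a single forward pass" means in practice, here is a minimal sketch assuming a CTC-style output head: the encoder scores every audio frame at once, and decoding is just a per-frame argmax followed by merging repeats and dropping blanks. No token waits on the previous one, unlike an autoregressive decoder's token-by-token loop. The function name `ctc_greedy_decode` and the blank index `0` are hypothetical choices for illustration, not SenseVoice's actual decoding code.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumed index of the CTC blank token (hypothetical)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Turn per-frame scores from one encoder pass into token ids.

    All frames are scored together, so decoding needs no loop over
    previously emitted tokens: pick the best token per frame, collapse
    consecutive repeats, then drop blanks.
    """
    # Best-scoring token for each frame (ties resolve to the lowest index)
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    tokens: List[int] = []
    prev: Optional[int] = None
    for t in best:
        # Emit only when the token changes and is not the blank symbol
        if t != prev and t != blank:
            tokens.append(t)
        prev = t
    return tokens
```

A blank between two identical tokens is what lets CTC emit the same token twice (frame argmaxes `1 1 0 2 2 0 2` decode to `[1, 2, 2]`).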

Integration

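```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# SenseVoiceSmall, with the FSMN VAD model splitting long audio into speech segments
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", device="cuda")
result = model.generate(input="audio.wav", language="auto", use_itn=True)
# result is a list of dicts; "text" carries SenseVoice's language/emotion/event tags,
# which rich_transcription_postprocess strips. Check your funasr version for segment timestamps.
text = rich_transcription_postprocess(result[0]["text"])
```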
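Since SmartSub's end product is a subtitle file, the recognized segments eventually need to be written out as SRT. The sketch below assumes each segment has already been reduced to a `(start_ms, end_ms, text)` tuple, for example with boundaries from the VAD and tag-free text from SenseVoice; how those timings are obtained from funasr is not shown here and depends on the version and configuration. `segments_to_srt` and `format_srt_time` are hypothetical helper names, not part of funasr or SmartSub.

```python
from typing import Iterable, List, Tuple


def format_srt_time(ms: int) -> str:
    """Convert milliseconds to an SRT timestamp 'HH:MM:SS,mmm'."""
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    seconds, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Tuple[int, int, str]]) -> str:
    """Build SRT text from (start_ms, end_ms, text) tuples (hypothetical helper)."""
    blocks: List[str] = []
    for start, end, text in segments:
        text = text.strip()
        if not text:  # skip segments that produced no text
            continue
        index = len(blocks) + 1  # SRT cue numbers are consecutive, starting at 1
        blocks.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues
    return "\n".join(blocks)
```

Empty segments are skipped rather than written as blank cues, and cue numbers stay consecutive even when a segment is dropped.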

For cross-platform use without Python, Sherpa-ONNX provides TypeScript and C++ bindings for SenseVoice.
