Summary
Add SenseVoice as an alternative ASR engine for subtitle generation.
Why?
SmartSub generates subtitles from video, and SenseVoice can make this both faster and more accurate:
- 5x faster: Non-autoregressive model that emits all tokens in a single forward pass (speedup reported relative to Whisper-Small)
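- Multilingual, single model: SenseVoice is reported to cover 50+ languages; the released SenseVoiceSmall model (~234M parameters) handles Mandarin, Cantonese, English, Japanese, and Korean on its own, with no separate per-language model downloads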
- Better CJK accuracy: Lower character error rate (CER) than Whisper on Chinese, Japanese, and Korean benchmarks
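- Less silence hallucination: Much less likely than Whisper to produce phantom text during quiet parts of a video, since decoding is non-autoregressive and the VAD model can drop non-speech segments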
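The speed claim rests on non-autoregressive decoding. A common form of it is CTC greedy search: the encoder scores every audio frame in one pass, and the text is read off those scores without feeding emitted tokens back into the model. The sketch below is a generic illustration on a toy score matrix, not SenseVoice's actual decoder; the function name `ctc_greedy_decode` and the choice of index 0 as the blank token are assumptions.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank_id: int = 0) -> List[int]:
    """Generic CTC greedy search: argmax per frame, merge repeats, drop blanks."""
    # Each frame's best token is picked independently of earlier outputs,
    # so all frames come from one encoder pass with no token-by-token loop.
    best = [max(range(len(row)), key=lambda i: row[i]) for row in frame_scores]
    tokens: List[int] = []
    prev: Optional[int] = None
    for token in best:
        # Keep a token only if it differs from the previous frame and is not blank.
        if token != prev and token != blank_id:
            tokens.append(token)
        prev = token
    return tokens


# Six frames over a toy 4-token vocabulary (token 0 = blank, by assumption).
scores = [
    [0.1, 0.9, 0.0, 0.0],
    [0.2, 0.7, 0.1, 0.0],
    [0.8, 0.1, 0.1, 0.0],
    [0.1, 0.6, 0.2, 0.1],
    [0.0, 0.1, 0.9, 0.0],
    [0.1, 0.2, 0.6, 0.1],
]
print(ctc_greedy_decode(scores))  # [1, 1, 2]
```

The blank in frame 3 is what lets token 1 appear twice in the output; without it, the two runs of 1 would be merged into a single token.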
Integration
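```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess
# SenseVoiceSmall, with the FSMN VAD model splitting long audio into segments
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", device="cuda")
result = model.generate(input="audio.wav", language="auto", use_itn=True)
# result: list of dicts with a "text" field (timestamp output depends on FunASR version/options)
text = rich_transcription_postprocess(result[0]["text"])  # strips language/emotion/event tags
```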
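SmartSub ultimately needs subtitle files, so recognized segments have to be turned into timed cues. A minimal sketch, assuming each segment has already been reduced to a hypothetical `(start_ms, end_ms, text)` tuple (for example, from VAD segment boundaries); the tuple shape and the helper names `ms_to_srt_time` and `segments_to_srt` are illustrative, not FunASR's output format or SmartSub's existing subtitle writer.

```python
from typing import Iterable, Tuple


def ms_to_srt_time(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Tuple[int, int, str]]) -> str:
    """Build SRT text from hypothetical (start_ms, end_ms, text) segments."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: sequence number, time range, text; cues are separated by a blank line.
        cues.append(f"{index}\n{ms_to_srt_time(start)} --> {ms_to_srt_time(end)}\n{text.strip()}\n")
    return "\n".join(cues)


print(segments_to_srt([(0, 2500, "Hello"), (3000, 61_250, "World")]))
```

Keeping timestamps as integer milliseconds until the final formatting step avoids floating-point rounding drift in the `,mmm` field.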
For cross-platform use without Python, Sherpa-ONNX runs SenseVoice models and provides C++ and Node.js (usable from TypeScript) APIs.