Feature Request
SoniTranslate handles video dubbing with speech recognition and translation. SenseVoice/Paraformer from FunASR would be a strong ASR engine option.
Why
- 170x real-time on GPU — non-autoregressive, much faster than Whisper for long video processing
- Built-in speaker diarization (cam++) — essential for multi-speaker dubbing
- Built-in punctuation — auto-adds punctuation for better subtitle quality
- 50+ languages with automatic language detection
- Chinese/Japanese/Korean accuracy — outperforms Whisper on CJK benchmarks
Quick Integration
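from funasr import AutoModel

# Load the ASR model together with voice-activity detection and a speaker model.
model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",
    spk_model="cam++",
)
result = model.generate(input="video_audio.wav")
# result is a list of dicts; the transcript is in result[0]["text"].
# Intended to return text with timestamps and speaker labels; the exact
# fields depend on the model combination and FunASR version.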
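To make the hand-off to the dubbing pipeline concrete, here is a minimal sketch of an adapter that turns FunASR-style output into WhisperX-like segments. It rests on assumptions not confirmed above: that each result carries a `sentence_info` list with `text`, `start`, `end` (milliseconds) and `spk` fields, and that SoniTranslate would accept segments with `start`, `end`, `text` and `speaker` keys. The helper name `to_segments` is hypothetical, and the sample input is hand-written rather than real model output.

```python
from typing import Any


def to_segments(result: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert FunASR-style output into WhisperX-like segments (times in seconds).

    Assumes each item may hold a "sentence_info" list whose entries have
    "text", "start", "end" (milliseconds) and optionally "spk".
    """
    segments = []
    for item in result:
        for sent in item.get("sentence_info", []):
            segments.append({
                "start": sent["start"] / 1000.0,  # ms -> s
                "end": sent["end"] / 1000.0,
                "text": sent["text"].strip(),
                # Speaker label format is an assumption, not a FunASR field.
                "speaker": f"SPEAKER_{int(sent.get('spk', 0)):02d}",
            })
    return segments


# Hand-written sample in the assumed shape; not real model output.
sample = [{
    "key": "video_audio",
    "text": "Hello. Hi there.",
    "sentence_info": [
        {"text": "Hello.", "start": 0, "end": 800, "spk": 0},
        {"text": "Hi there.", "start": 1000, "end": 2200, "spk": 1},
    ],
}]
segments = to_segments(sample)
```

If the real output uses different field names or units, only the body of the inner loop should need to change.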
Happy to help with integration!