Feature Request
Coqui TTS is one of the best open-source TTS libraries! FunASR could be useful for training data preparation, specifically automatic speech-to-text annotation of audio datasets.
Use case: Annotate raw audio with FunASR → get timestamped transcripts → use for TTS fine-tuning.
Why FunASR?
- SenseVoice: 50+ languages, matching Coqui TTS's multilingual ambition
- Paraformer: Character-level timestamps for precise audio-text alignment
- Built-in VAD + punctuation: One-call pipeline
- 170x realtime on GPU: Efficient for large dataset annotation
- Open source: Apache 2.0 license
Example:
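from funasr import AutoModel

# Paraformer ASR combined with voice activity detection and punctuation restoration
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc")
# Transcribe a single audio file
result = model.generate(input="audio.wav")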
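To make the use case above more concrete, here is a rough sketch of how the transcripts might be turned into LJSpeech-style metadata rows (`id|text|normalized text`) for fine-tuning. It assumes each result is a dict with `key` and `text` fields, which should be checked against the installed FunASR version. The helper `to_ljspeech_rows` and the sample data are hypothetical and only illustrate the idea.

```python
from pathlib import Path


def to_ljspeech_rows(results):
    """Convert ASR results into LJSpeech-style 'id|text|normalized text' rows.

    Assumption: each item is a dict with 'key' (clip name) and 'text'
    (transcript). Clips with an empty transcript are skipped.
    """
    rows = []
    for item in results:
        text = item.get("text", "").strip()
        if not text:
            continue  # nothing usable for training
        clip_id = Path(item["key"]).stem  # drop any directory or extension
        text = text.replace("|", " ")  # '|' is the column separator
        rows.append(f"{clip_id}|{text}|{text}")
    return rows


# Hypothetical results standing in for the output of model.generate(...)
sample_results = [
    {"key": "clip_0001", "text": "hello world", "timestamp": [[0, 480], [480, 960]]},
    {"key": "clip_0002", "text": ""},
]
rows = to_ljspeech_rows(sample_results)
print("\n".join(rows))
```

The rows could then be written to a `metadata.csv` next to the audio clips. The `timestamp` field is left unused here; it could be used to trim or split long recordings before training.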