Feature Request: Add FunASR for training data annotation #4422

@LauraGPT

Feature Request

Coqui TTS is one of the best open-source TTS libraries! FunASR could be useful for training data preparation: automatic speech-to-text annotation of audio datasets.

Use case: Annotate raw audio with FunASR → get timestamped transcripts → use for TTS fine-tuning.

Why FunASR?

  • SenseVoice: 50+ languages, matching Coqui TTS's multilingual ambition
  • Paraformer: Character-level timestamps for precise audio-text alignment
  • Built-in VAD + punctuation: One-call pipeline
  • 170x realtime on GPU: Efficient for large dataset annotation
  • Open source: Apache 2.0 license
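To put the throughput figure in perspective, here is a rough back-of-the-envelope estimate. It takes the quoted 170x realtime factor at face value; actual speed will depend on the GPU, batch size, model, and I/O, and the function name is only illustrative.

```python
def annotation_hours(audio_hours: float, realtime_factor: float = 170.0) -> float:
    """Estimate wall-clock hours needed to transcribe `audio_hours` of audio.

    `realtime_factor` is the speed-up over realtime; 170 is the figure
    quoted above and is an assumption, not a measured value.
    """
    return audio_hours / realtime_factor


# A 100-hour dataset at 170x realtime:
print(f"{annotation_hours(100) * 60:.0f} minutes")  # prints "35 minutes"
```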

Example:

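from funasr import AutoModel

# Load the Paraformer ASR model together with VAD and punctuation restoration.
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc")
# Transcribe a single audio file.
result = model.generate(input="audio.wav")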
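As a sketch of how such transcripts might feed TTS fine-tuning, the helper below writes an LJSpeech-style `metadata.csv` (`id|text|normalized_text`), a layout Coqui TTS can read with its `ljspeech` formatter. The function name `write_ljspeech_metadata` and the `transcribe` callable are hypothetical; the callable stands in for whatever ASR call is used, so the example does not depend on FunASR itself.

```python
from pathlib import Path
from typing import Callable, Iterable


def write_ljspeech_metadata(
    wav_paths: Iterable[Path],
    transcribe: Callable[[Path], str],
    out_file: Path,
) -> int:
    """Write one `id|text|text` row per clip and return the number of rows.

    `transcribe` is a placeholder for the ASR step (an assumption here):
    it receives a wav path and returns the transcript as a string.
    """
    rows = 0
    with out_file.open("w", encoding="utf-8", newline="") as f:
        for wav in sorted(wav_paths):
            text = transcribe(wav).strip()
            if not text:
                # Skip clips with an empty transcript (e.g. silence).
                continue
            # "|" is the column separator, so it must not appear in the text.
            text = text.replace("|", " ")
            # No separate text normalization is done; the raw transcript
            # is reused for the normalized column.
            f.write(f"{wav.stem}|{text}|{text}\n")
            rows += 1
    return rows
```

With FunASR, `transcribe` could be something like `lambda p: model.generate(input=str(p))[0]["text"]`, assuming `generate` returns a list of dicts with a `"text"` key; that shape should be checked against the installed FunASR version.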
