Feature Request: Add FunASR/SenseVoice as a self-hosted STT provider #384

@LauraGPT

Dograh's modular STT architecture is a great fit for adding FunASR as a self-hosted STT provider. This would give users a completely self-contained voice AI stack with no external API dependencies.

Why FunASR?

  • OpenAI-compatible API: FunASR ships with funasr-server, which serves /v1/audio/transcriptions, the same endpoint shape as OpenAI's Whisper API, so the integration work should be minimal
  • 170x realtime on GPU, 17x on CPU: ideal for real-time voice agent pipelines
  • 50+ languages with strong CJK support
  • Built-in VAD + punctuation: no need for separate voice activity detection
  • Speaker diarization: cam++ model included for multi-speaker scenarios
  • Streaming support: Paraformer-streaming for low-latency real-time ASR
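
To put the speed figures above in latency terms, here is a small back-of-the-envelope sketch. It assumes "Nx realtime" means N seconds of audio are processed per second of compute, and it ignores model load, network, and VAD overhead. The function name is made up for illustration.

```python
def processing_time_ms(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate compute time in milliseconds for a clip of the given length.

    Assumes `realtime_factor` is "seconds of audio processed per second of
    compute" (e.g. 170 for the GPU figure quoted above).
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor * 1000.0


# A 10-second utterance at the quoted GPU and CPU speeds.
gpu_ms = processing_time_ms(10.0, 170.0)
cpu_ms = processing_time_ms(10.0, 17.0)
print(f"GPU: {gpu_ms:.1f} ms, CPU: {cpu_ms:.1f} ms")
```

Under that assumption, a 10-second utterance would take roughly 59 ms on GPU and 588 ms on CPU, before any pipeline overhead.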

Integration

# Start FunASR server (OpenAI-compatible)
pip install funasr
funasr-server --device cuda
# Serves at localhost:8000/v1/audio/transcriptions

Since Dograh already supports custom STT providers, pointing to the FunASR server endpoint should work with minimal code changes.
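
For reference, here is a minimal sketch of what a client call to that endpoint could look like, using only the Python standard library. It assumes the server accepts the same multipart fields as OpenAI's transcription API (`file` and `model`) and returns JSON with a `text` field. The model name and the helper names are placeholders, not something taken from FunASR or Dograh.

```python
import json
import urllib.request
import uuid

# URL taken from the issue text; the model name is a hypothetical placeholder.
FUNASR_URL = "http://localhost:8000/v1/audio/transcriptions"
MODEL_NAME = "sensevoice"  # assumption: replace with the name your server expects


def build_transcription_request(audio_bytes, filename="audio.wav",
                                url=FUNASR_URL, model=MODEL_NAME):
    """Build a multipart/form-data POST shaped like OpenAI's transcription API."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    body = b"\r\n".join([
        delimiter,
        b'Content-Disposition: form-data; name="model"',
        b"",
        model.encode(),
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        audio_bytes,
        delimiter + b"--",
        b"",
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def transcribe(audio_bytes, **kwargs):
    """Send the audio to the server and return the transcript text."""
    request = build_transcription_request(audio_bytes, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        # Assumes an OpenAI-style response body: {"text": "..."}
        return json.loads(response.read())["text"]
```

With a server running, `transcribe(open("sample.wav", "rb").read())` would return the transcript string. A Dograh provider could wrap a call like this behind its existing STT interface.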

Resources

This would make Dograh one of the few voice AI platforms with a truly zero-external-dependency option for STT.
