Feature Request: Add FunASR/SenseVoice as a self-hosted STT provider #384

Dograh's modular STT architecture is a great fit for adding **FunASR** as a self-hosted STT provider. This would give users a completely self-contained voice AI stack with no external API dependencies.

### Why FunASR?

- **OpenAI-compatible API**: FunASR ships with `funasr-server`, which serves `/v1/audio/transcriptions`, the same route as OpenAI's Whisper transcription API, so integration work should be minimal
- **170x realtime** on GPU, **17x on CPU** — ideal for real-time voice agent pipelines
- **50+ languages** with strong CJK support
- **Built-in VAD + punctuation** — no need for separate voice activity detection
- **Speaker diarization** — cam++ model included for multi-speaker scenarios
- **Streaming support** — Paraformer-streaming for low-latency real-time ASR

### Integration

```bash
# Start FunASR server (OpenAI-compatible)
pip install funasr
funasr-server --device cuda
# Serves at localhost:8000/v1/audio/transcriptions
```

Since Dograh already supports custom STT providers, pointing one at the FunASR server endpoint (`http://localhost:8000/v1/audio/transcriptions` in the example above) should require only minimal code changes.
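
To illustrate what a client call might look like, here is a minimal sketch using only the Python standard library. It assumes the server accepts the same multipart form fields (`file`, `model`) and returns the same `{"text": ...}` JSON as OpenAI's transcription endpoint. The model name `sensevoice` and the helper names are hypothetical placeholders, not taken from FunASR or Dograh.

```python
import json
import urllib.request
import uuid

# Assumed endpoint, taken from the server comment above; adjust host/port as needed.
FUNASR_URL = "http://localhost:8000/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename="audio.wav",
                                model="sensevoice", url=FUNASR_URL):
    """Build a multipart/form-data POST shaped like an OpenAI transcription call.

    `model="sensevoice"` is a placeholder; use whatever name the server expects.
    """
    boundary = uuid.uuid4().hex
    parts = [
        # Plain text form field carrying the model name.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # File field carrying the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + audio_bytes + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode(),
    ]
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=b"".join(parts),
                                  headers=headers, method="POST")


def transcribe(audio_bytes, **kwargs):
    """Send the request and return the "text" field of the JSON response.

    Assumes an OpenAI-style response body of the form {"text": "..."}.
    """
    request = build_transcription_request(audio_bytes, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())["text"]
```

With the server running, `transcribe(open("sample.wav", "rb").read())` would return the transcript string, provided the response format matches the assumption above. A real Dograh provider would wrap a call like this in whatever interface its existing STT providers implement.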

### Resources

- FunASR: https://github.com/modelscope/FunASR (16.6K+ stars)
- SenseVoice: https://github.com/FunAudioLLM/SenseVoice (8.3K+ stars)

This would make Dograh one of the few voice AI platforms with a truly zero-external-dependency option for STT.
