Feature Request: Add SenseVoice/FunASR as STT provider #390

@LauraGPT

Feature Request

Dograh is building a great open-source voice AI platform. This issue suggests adding SenseVoice / FunASR as an additional speech-to-text (STT) provider option.

Why SenseVoice for voice AI?

  • 5x faster than Whisper: the non-autoregressive architecture keeps latency low, which is critical for real-time voice agents
  • 50+ languages in a single 234M-parameter model
  • Emotion detection: identifies speaker emotions (happy, angry, sad), useful for sentiment-aware agents
  • Audio events: detects laughter, applause, music, and background noise
  • OpenAI-compatible API: funasr-server serves /v1/audio/transcriptions, so integration is straightforward
  • Streaming support: WebSocket-based real-time streaming with partial results
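The emotion and audio-event labels come back inline with the transcript. Below is a minimal sketch of splitting them into structured fields for a sentiment-aware agent. It assumes SenseVoice's raw output prefixes each segment with `<|...|>` tags in the order language, emotion, event, ITN flag (e.g. `<|en|><|HAPPY|><|Laughter|><|withitn|>...`), as in the SenseVoice examples. The label sets are an assumption and may not be exhaustive, and `parse_sensevoice` / `SenseVoiceResult` are hypothetical names, not part of FunASR. It handles a single segment; VAD-merged output may concatenate several tagged segments that would need splitting first.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Assumed label sets; verify against the SenseVoice model card before relying on them.
EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL", "FEARFUL", "DISGUSTED", "SURPRISED", "EMO_UNKNOWN"}
EVENTS = {"Speech", "BGM", "Applause", "Laughter", "Cry", "Sneeze", "Breath", "Cough"}
ITN_FLAGS = {"withitn", "woitn"}

# Matches one <|tag|> token at a given position.
TAG = re.compile(r"<\|([^|]+)\|>")


@dataclass
class SenseVoiceResult:
    text: str
    language: Optional[str] = None
    emotion: Optional[str] = None
    events: list = field(default_factory=list)


def parse_sensevoice(raw: str) -> SenseVoiceResult:
    """Strip leading <|tag|> tokens from one segment and classify them (hypothetical helper)."""
    result = SenseVoiceResult(text="")
    pos = 0
    while (m := TAG.match(raw, pos)):
        tag = m.group(1)
        if tag in EMOTIONS:
            result.emotion = tag
        elif tag in EVENTS:
            result.events.append(tag)
        elif tag in ITN_FLAGS:
            pass  # inverse-text-normalization flag; not needed downstream
        elif result.language is None:
            result.language = tag  # first unrecognized tag is treated as the language code
        pos = m.end()
    result.text = raw[pos:].strip()
    return result
```

FunASR's own `rich_transcription_postprocess` (used in the SenseVoice examples) renders these tags as emoji for display; a parser like this keeps them as fields an agent can branch on.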

Self-hosted advantage

FunASR runs entirely locally, which makes it a good fit for self-hosted voice AI:

pip install funasr vllm
funasr-server --device cuda  # OpenAI-compatible /v1/audio/transcriptions
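Because the endpoint follows the OpenAI transcription API, a client only needs to POST a `multipart/form-data` body with `file` and `model` fields. A minimal standard-library sketch that builds such a request follows. The base URL `http://localhost:8000`, the model name `sensevoice`, and the helper names `build_transcription_request` / `transcribe` are assumptions for illustration, not documented `funasr-server` defaults.

```python
import json
import urllib.request
import uuid


def build_transcription_request(
    audio: bytes,
    filename: str = "audio.wav",
    model: str = "sensevoice",  # hypothetical model name; check what the server expects
    base_url: str = "http://localhost:8000",  # hypothetical host/port
) -> urllib.request.Request:
    """Build a multipart POST for an OpenAI-style /v1/audio/transcriptions endpoint."""
    boundary = uuid.uuid4().hex
    parts = [
        # Plain form field carrying the model name.
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'.encode(),
        # File field carrying the raw audio bytes.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode() + audio + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(audio: bytes, **kwargs) -> str:
    """Send the request and return the "text" field of the JSON response."""
    with urllib.request.urlopen(build_transcription_request(audio, **kwargs)) as resp:
        return json.load(resp)["text"]
```

Against a running server, usage would be `transcribe(Path("call.wav").read_bytes())` (with `from pathlib import Path`). The field names follow OpenAI's documented API; whether `funasr-server` requires or ignores `model` is not confirmed here.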

Or integrate directly:

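from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")
# input accepts a file path/URL or a 16 kHz waveform array ("audio.wav" is a placeholder)
result = model.generate(input="audio.wav", language="auto", use_itn=True)
print(rich_transcription_postprocess(result[0]["text"]))  # renders emotion/event tags as emoji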
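Voice agents usually receive audio as raw PCM frames rather than files, and FunASR's examples pass NumPy waveform arrays as `input`, so buffered PCM can be converted first. Below is a minimal sketch assuming 16-bit little-endian mono PCM at 16 kHz (the rate SenseVoice is trained on). `pcm16_to_float` and `frames` are hypothetical helpers, and the 600 ms frame length is an arbitrary illustrative choice.

```python
import sys
from array import array
from collections.abc import Iterator

SAMPLE_RATE = 16_000  # assumed input rate
BYTES_PER_SAMPLE = 2  # 16-bit PCM


def pcm16_to_float(pcm: bytes) -> list:
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    if len(pcm) % BYTES_PER_SAMPLE:
        raise ValueError("PCM buffer length must be a multiple of 2 bytes")
    samples = array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native byte order; the input is little-endian
    return [s / 32768.0 for s in samples]


def frames(pcm: bytes, frame_ms: int = 600) -> Iterator[bytes]:
    """Yield fixed-duration PCM frames; the last frame may be shorter."""
    frame_bytes = SAMPLE_RATE * frame_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

With `model` from the `AutoModel` call above and `buf` holding one buffered utterance, the call would be `model.generate(input=np.asarray(pcm16_to_float(buf), dtype=np.float32))`. Sending whole utterances (e.g. after end-of-speech detection) suits the non-streaming model; `frames` only matters if a streaming endpoint expects fixed-size chunks.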

Resources
