
Feature Request: Add FunASR/SenseVoice as self-hosted STT option #1331

@LauraGPT

Description


Feature Request

Khoj's self-hosted architecture would benefit from FunASR/SenseVoice as a local speech-to-text (STT) option for voice input.

Why

  • Fully self-hosted: no external API calls, which aligns with Khoj's privacy-first approach
  • OpenAI-compatible API: FunASR exposes /v1/audio/transcriptions via funasr-server, so it is easy to integrate
  • 50+ languages with automatic language detection
  • 170x faster than real time on GPU: the non-autoregressive architecture makes inference very fast
  • Emotion and audio-event detection: SenseVoice detects speech emotion and audio events (laughter, music, etc.)

Quick Setup

pip install funasr
funasr-server --device cuda

# Same request format as OpenAI's Whisper transcription API
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@audio.wav" -F "model=iic/SenseVoiceSmall"
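For repeated use, the request above could be wrapped in a small shell function. This is a sketch: the name `transcribe_local` is hypothetical (not part of FunASR), and it assumes funasr-server is listening on `localhost:8000` as in the example. The `--fail` flag makes curl exit non-zero on an HTTP error instead of printing the error body as if it were a transcript.

```shell
# Sketch: reusable wrapper around the Quick Setup request.
# Assumes funasr-server is running on localhost:8000 (as in the example above).
# transcribe_local is a hypothetical helper name, not a FunASR command.
transcribe_local() {
  local file="$1"
  local model="${2:-iic/SenseVoiceSmall}"   # default model taken from the example

  # Fail early with a clear message instead of letting curl report a read error.
  if [ ! -f "$file" ]; then
    echo "transcribe_local: no such file: $file" >&2
    return 1
  fi

  # -sS: hide the progress bar but still print errors
  # --fail: exit non-zero on HTTP 4xx/5xx instead of printing the error body
  curl -sS --fail -X POST http://localhost:8000/v1/audio/transcriptions \
    -F "file=@${file}" -F "model=${model}"
}

# Usage:
#   transcribe_local audio.wav
#   transcribe_local audio.wav iic/SenseVoiceSmall
```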

Integration

Because FunASR provides an OpenAI-compatible API, if Khoj already supports OpenAI's Whisper API for STT, pointing it at a local funasr-server instance should need only minimal changes: just a different base URL.
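To illustrate the base-URL swap, here is a minimal curl sketch in which the same request targets either a local funasr-server or OpenAI, depending only on configuration. `STT_BASE_URL`, `STT_MODEL` and `STT_API_KEY` are hypothetical variable names chosen for illustration, not existing Khoj settings. The sketch assumes the local server needs no credentials, so an `Authorization` header is sent only when a key is set; OpenAI's endpoint does require a bearer token and uses the model name `whisper-1`.

```shell
# Sketch: one request function, two providers, differing only by base URL.
# STT_BASE_URL / STT_MODEL / STT_API_KEY are hypothetical names, not Khoj settings.
stt_transcribe() {
  local file="$1"
  local base="${STT_BASE_URL:-http://localhost:8000/v1}"   # local funasr-server by default
  local model="${STT_MODEL:-iic/SenseVoiceSmall}"

  # Build curl's form arguments in the positional parameters (POSIX, no arrays).
  set -- -F "file=@${file}" -F "model=${model}"

  # Send a bearer token only when one is configured (e.g. for api.openai.com).
  if [ -n "${STT_API_KEY:-}" ]; then
    set -- -H "Authorization: Bearer ${STT_API_KEY}" "$@"
  fi

  # ${base%/} drops a trailing slash so ".../v1/" and ".../v1" behave the same.
  curl -sS --fail -X POST "${base%/}/audio/transcriptions" "$@"
}

# Usage:
#   stt_transcribe audio.wav                      # local FunASR (default)
#   STT_BASE_URL=https://api.openai.com/v1 STT_MODEL=whisper-1 \
#     STT_API_KEY="$OPENAI_API_KEY" stt_transcribe audio.wav   # OpenAI Whisper
```

Only the base URL, model name and credentials change between the two calls; the endpoint path and multipart form fields stay the same, which is what makes the swap cheap if the server is indeed OpenAI-compatible.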

Happy to help with integration!
