Feature Request
Khoj's self-hosted architecture would benefit from FunASR/SenseVoice as a local STT option for voice input.
Why
- Fully self-hosted — no external API calls, aligns with Khoj's privacy-first approach
- OpenAI-compatible API: FunASR provides /v1/audio/transcriptions via funasr-server, easy to integrate
- 50+ languages with automatic language detection
- 170x real-time on GPU — non-autoregressive architecture, very fast
- Emotion + audio events — SenseVoice detects speech emotion and audio events (laughter, music, etc.)
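If Khoj wanted to surface the emotion and event labels, something would have to separate them from the transcript. Here is a rough sketch, assuming the raw SenseVoice output prefixes the text with `<|...|>` style tags (language, emotion, event); whether funasr-server returns those tags or strips them in its API response is something I have not verified. `split_tags` is a hypothetical helper name, not part of FunASR.

```python
import re

# Assumption: raw SenseVoice text carries inline tags such as
# "<|en|><|HAPPY|><|Speech|>" ahead of the transcript.
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


def split_tags(raw_text):
    """Return (tags, transcript) from a tag-prefixed result string."""
    tags = TAG_PATTERN.findall(raw_text)
    transcript = TAG_PATTERN.sub("", raw_text).strip()
    return tags, transcript


if __name__ == "__main__":
    tags, transcript = split_tags("<|en|><|HAPPY|><|Speech|>hello there")
    print(tags, transcript)
```

Text without tags passes through unchanged, so the helper would be harmless if the server already cleans its output.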
Quick Setup
pip install funasr
funasr-server --device cuda
# Same API as OpenAI Whisper
curl -X POST http://localhost:8000/v1/audio/transcriptions \
-F "file=@audio.wav" -F "model=iic/SenseVoiceSmall"
Integration
Since FunASR provides an OpenAI-compatible API, if Khoj already supports OpenAI's Whisper API for STT, pointing to a local funasr-server instance would work with minimal changes — just changing the base URL.
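To illustrate what "just changing the base URL" could look like on the client side, here is a minimal stdlib-only sketch. It assumes the server accepts the same multipart form fields as the curl example above (`file` and `model`) and returns JSON with a `text` field like OpenAI's endpoint does. `build_transcription_request` and `transcribe` are hypothetical helper names, not Khoj or FunASR functions.

```python
import json
import os
import urllib.request
import uuid


def build_transcription_request(base_url, audio_bytes, filename="audio.wav",
                                model="iic/SenseVoiceSmall"):
    """Build a multipart POST for <base_url>/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    # Plain form field carrying the model name.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    # File field carrying the raw audio bytes.
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + audio_bytes + b"\r\n"
    closing = f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",
        data=model_part + file_part + closing,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url, audio_path, model="iic/SenseVoiceSmall"):
    """Send a local audio file and return the transcript text."""
    with open(audio_path, "rb") as f:
        request = build_transcription_request(
            base_url, f.read(), os.path.basename(audio_path), model
        )
    with urllib.request.urlopen(request) as response:
        # Assumption: the response body is JSON with a "text" field.
        return json.load(response)["text"]


if __name__ == "__main__":
    # Only the base URL differs between a hosted API and a local server.
    print(transcribe("http://localhost:8000/v1", "audio.wav"))
```

Keeping the base URL as a parameter means the same code path could serve both a hosted Whisper endpoint and a local funasr-server, with the model name as the only other setting to change.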
Happy to help with integration!