Feature Request: Add FunASR/SenseVoice as self-hosted STT option #1331

## Feature Request

Khoj's self-hosted architecture would benefit from **FunASR/SenseVoice** as a local STT option for voice input.

### Why

- **Fully self-hosted**: no external API calls, which aligns with Khoj's privacy-first approach
- **OpenAI-compatible API**: FunASR provides `/v1/audio/transcriptions` via `funasr-server`, so it is easy to integrate
- **50+ languages** with automatic language detection
- **170x real-time on GPU**: made possible by a non-autoregressive architecture
- **Emotion + audio events**: SenseVoice detects speech emotion and audio events (laughter, music, etc.)

### Quick Setup

```bash
pip install funasr
funasr-server --device cuda

# Same API as OpenAI Whisper
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@audio.wav" -F "model=iic/SenseVoiceSmall"
```

### Integration

Because FunASR provides an OpenAI-compatible API, a Khoj setup that already supports OpenAI's Whisper API for STT should work against a local `funasr-server` instance with minimal changes: ideally, only the base URL needs to change.
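As a rough sketch of what "just changing the base URL" could look like from the client side, the snippet below keeps the request identical and swaps only the endpoint prefix. The `STT_BASE_URL` variable and both helper functions are hypothetical names chosen for illustration, not existing Khoj or FunASR settings, and the example assumes the local server accepts unauthenticated requests as in the Quick Setup above.

```shell
#!/usr/bin/env bash
# Hypothetical setting: base URL of any OpenAI-compatible STT server.
# Defaults to the local funasr-server address used in Quick Setup.
STT_BASE_URL="${STT_BASE_URL:-http://localhost:8000/v1}"

# Build the transcription endpoint from a base URL.
# A trailing slash on the base URL is stripped so the path joins cleanly.
transcription_url() {
  printf '%s/audio/transcriptions\n' "${1%/}"
}

# Send an audio file for transcription (illustrative helper).
# $1 = path to audio file, $2 = model name (defaults to SenseVoiceSmall).
transcribe() {
  curl -sS -X POST "$(transcription_url "$STT_BASE_URL")" \
    -F "file=@$1" \
    -F "model=${2:-iic/SenseVoiceSmall}"
}
```

With these helpers sourced, `transcribe audio.wav` would hit the local server, while setting `STT_BASE_URL` to a different OpenAI-compatible endpoint would redirect the same request without any other change. A hosted endpoint would typically also need an authorization header, which this sketch omits.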

Happy to help with integration!
