Feature Request
ElatoAI does impressive work with real-time voice AI on ESP32. I'd like to suggest SenseVoice as an ASR backend option; it is particularly well suited to low-latency voice applications.
Why SenseVoice for edge voice AI?
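- Non-autoregressive: the transcript is decoded in a single pass rather than token by token, so latency is low and predictable
- Speed: reported by the SenseVoice authors as more than 5x faster than Whisper-Small (and about 15x faster than Whisper-Large), which matters when every millisecond counts for natural conversation
- 234M parameters (SenseVoice-Small): small enough for efficient server-side processing of audio from the ESP32
- Multilingual: the SenseVoice family targets 50+ languages, so a single model can handle multilingual input (the open SenseVoice-Small checkpoint covers Mandarin, Cantonese, English, Japanese, and Korean)
- Emotion recognition: an emotion tag is emitted alongside the transcript, which could enable emotion-aware responses from the AI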
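As a rough illustration of the emotion-aware idea, the sketch below splits a raw SenseVoice transcript into its text, language tag, and emotion tag. It assumes the raw output looks like `<|en|><|HAPPY|><|Speech|><|withitn|>Hello there.` (language, emotion, audio event, ITN flag, then text) and that the emotion labels are the ones listed in `EMOTIONS`; `parse_sensevoice` and `emotion_hint` are hypothetical helpers, not part of FunASR.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout of raw SenseVoice output: several "<|...|>" tags
# (language, emotion, audio event, ITN flag) followed by the transcript.
TAG_RE = re.compile(r"<\|([^|]*)\|>")

# Assumed emotion label set; anything else (e.g. EMO_UNKNOWN) counts as no emotion.
EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL", "FEARFUL", "DISGUSTED", "SURPRISED"}


class Transcript(NamedTuple):
    text: str
    language: Optional[str]
    emotion: Optional[str]
    tags: list[str]


def parse_sensevoice(raw: str) -> Transcript:
    """Split a raw SenseVoice string into plain text and its tags (hypothetical helper)."""
    tags = TAG_RE.findall(raw)
    text = TAG_RE.sub("", raw).strip()
    # The first tag is assumed to be the language code, e.g. "en" or "zh".
    language = tags[0] if tags else None
    emotion = next((t for t in tags if t in EMOTIONS), None)
    return Transcript(text, language, emotion, tags)


def emotion_hint(t: Transcript) -> str:
    """Hypothetical helper: turn a non-neutral emotion into a hint for the LLM prompt."""
    if t.emotion is None or t.emotion == "NEUTRAL":
        return ""
    return f"The user sounds {t.emotion.lower()}."


t = parse_sensevoice("<|en|><|HAPPY|><|Speech|><|withitn|>Hello there.")
print(t.text, t.language, t.emotion)  # Hello there. en HAPPY
print(emotion_hint(t))                # The user sounds happy.
```

The hint string could be prepended to the LLM's system prompt so the reply's tone tracks the speaker's; neutral or unknown emotions add nothing.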
Server-side integration
For an ESP32 → server architecture, SenseVoice can run on the server behind an OpenAI-compatible API:
pip install funasr
funasr-server --device cuda # /v1/audio/transcriptions endpoint
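Whatever receives audio from the ESP32 could then call this like any OpenAI-style `/v1/audio/transcriptions` service: a multipart POST with `file` and `model` fields that returns JSON with a `text` field. This is a stdlib-only sketch; the base URL and model name are hypothetical placeholders, and the exact fields a given server accepts should be checked against its documentation.

```python
import json
import urllib.request
import uuid


def build_transcription_request(
    wav_bytes: bytes,
    base_url: str = "http://localhost:8000",  # hypothetical server address
    model: str = "sensevoice",                # hypothetical model name
) -> urllib.request.Request:
    """Build a multipart/form-data POST in the OpenAI transcription style."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=model_part + file_header + wav_bytes + closing,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(wav_bytes: bytes, **kwargs) -> str:
    """Send the request and return the transcript (assumes a JSON "text" field)."""
    req = build_transcription_request(wav_bytes, **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["text"]
```

Building the request separately from sending it keeps the multipart encoding easy to inspect; swapping in an HTTP client library would work just as well.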
SenseVoice's low latency (a result of non-autoregressive decoding), combined with efficient WebSocket streaming from the device, makes it a strong fit for the real-time voice interaction ElatoAI provides.
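On the WebSocket side, a minimal sketch of buffering one utterance on the server and wrapping it as WAV before transcription. It assumes the device streams raw 16 kHz, 16-bit, mono little-endian PCM; if the firmware sends a compressed codec instead, a decode step would come first. `UtteranceBuffer` is a hypothetical name.

```python
import io
import wave

SAMPLE_RATE = 16_000  # assumed: 16 kHz, the rate SenseVoice typically expects
SAMPLE_WIDTH = 2      # assumed: 16-bit signed little-endian PCM
CHANNELS = 1          # assumed: single microphone


class UtteranceBuffer:
    """Accumulate raw PCM frames from one WebSocket connection."""

    def __init__(self) -> None:
        self._chunks: list[bytes] = []

    def add_frame(self, frame: bytes) -> None:
        self._chunks.append(frame)

    def duration_s(self) -> float:
        n_bytes = sum(len(c) for c in self._chunks)
        return n_bytes / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)

    def to_wav(self) -> bytes:
        """Wrap the buffered PCM in a WAV container and reset the buffer."""
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(CHANNELS)
            wav.setsampwidth(SAMPLE_WIDTH)
            wav.setframerate(SAMPLE_RATE)
            wav.writeframes(b"".join(self._chunks))
        data = buf.getvalue()
        self._chunks.clear()
        return data
```

When the device signals end of speech, `to_wav()` yields bytes ready to post to the transcription endpoint; short utterances keep the single-pass decode fast.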
Edge deployment option
For future on-device use, SenseVoice is available via Sherpa-ONNX, which supports embedded platforms including the ESP32-S3 (though memory is tightly constrained there).
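To put the memory constraint in numbers, a back-of-the-envelope estimate of weight storage alone for a 234M-parameter model at common precisions (activations, runtime overhead, and any pruning or distillation are ignored):

```python
PARAMS = 234_000_000  # parameter count cited above for SenseVoice

# Bytes per weight at common storage precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_footprint_mb(params: int, dtype: str) -> float:
    """Weight storage only, in MB (10**6 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1_000_000


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_footprint_mb(PARAMS, dtype):.0f} MB")
# fp32: 936 MB
# fp16: 468 MB
# int8: 234 MB
```

Even at int8, the weights alone are well over an order of magnitude larger than the few megabytes of PSRAM on common ESP32-S3 modules, which is what the memory caveat amounts to in practice.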