feat(audio): Lemonade-routed STT/TTS endpoints + dual-backend selector #943

kovtcharov wants to merge 2 commits into main
Conversation
Add gaia.audio.lemonade_audio — an HTTP client for Lemonade Server's
OpenAI-compatible /v1/audio/transcriptions and /v1/audio/speech endpoints.
A single Lemonade instance can now serve LLM + STT (Whisper) + TTS
(Kokoro) instead of GAIA loading separate Whisper / Kokoro models in
process — one model cache, one health story.
Wire it into the Agent UI through a new /voice router:
- POST /voice/transcribe: multipart upload → Lemonade → {"text": "..."}
- POST /voice/speech: JSON body → Lemonade → audio bytes
- GET /voice/health: proxy to Lemonade /api/v1/health
- GET /voice/test: self-contained browser harness (MediaRecorder
  + OfflineAudioContext WAV converter, no React rebuild needed)
Body for /voice/speech is validated by a Pydantic SpeechRequest model;
upstream Lemonade failures surface as 502 with the original error detail.
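As a rough sketch of that contract (a plain dataclass standing in for the actual Pydantic model; the voice field, its default, and the error-body shape are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class SpeechRequest:
    # Stand-in for the PR's Pydantic SpeechRequest; fields other than the
    # input text are guesses, not the PR's actual schema.
    input: str
    voice: str = "af_bella"

    def __post_init__(self) -> None:
        if not self.input.strip():
            raise ValueError("input must be non-empty")

def surface_upstream_error(status: int, detail: str) -> tuple[int, dict]:
    # Per the PR description, every upstream Lemonade failure maps to a 502
    # whose body keeps the original error detail intact.
    return 502, {"detail": detail, "upstream_status": status}
```

Validation failures are rejected before any upstream call is made; only genuine Lemonade errors reach the 502 path.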
The existing gaia.audio.whisper_asr / gaia.audio.kokoro_tts modules
remain unchanged — the in-process variants still serve `gaia talk` and
any other caller that does not have a Lemonade server running.
Refs #215, #373.
```html
<script>
  const $ = (id) => document.getElementById(id);
  const setStatus = (id, kind, text) => {
    $(id).innerHTML = `<span class="dot ${kind}"></span>${text}`;
```
Summary

Cleanly factored, well-documented PR that adds an HTTP wrapper around Lemonade's OpenAI-compatible audio endpoints.

Issues

🟡 Important

1. Every other consumer in GAIA expects … This module instead defaults to … Strip the suffix (or reduce the URL to its origin) before composing paths. Cleanest is to reuse …
2. Sync … The router endpoints are … Two options: (a) make …
3. …
4. No tests for the new module or router. CLAUDE.md ("Testing Requirements"): every new feature requires tests. Existing pattern under …
5. Documentation not updated. CLAUDE.md mandates docs for every new feature, and …

🟢 Minor

6. The path version can delegate to the bytes version and lose ~40 lines of duplication (compare to how …).
7. If a test fixture or long-running process mutates …
8. Cache once at module load, or mount via …
9. Upstream status code is squashed in the router (…). Every Lemonade failure becomes a 502 regardless of root cause. A …

Strengths

…

Verdict

Request changes — issues 1–3 are blocking (broken URLs under the documented env var, event-loop stall, missing base dep). Issues 4 & 5 (tests + docs) are CLAUDE.md requirements that should land in this PR rather than as a follow-up. Once those are addressed, the rest are quick polish — this is a well-scoped, well-documented change and I'd expect a fast turn-around.
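Following up on the URL-composition point in issue 1, a minimal sketch of reducing a base URL to its origin before appending endpoint paths (the helper name and signature are hypothetical, not the module's actual API):

```python
from urllib.parse import urlsplit, urlunsplit

def endpoint(base_url: str, path: str) -> str:
    # Reduce the base to scheme://host:port before appending the endpoint
    # path, so a base like "http://localhost:13305/api/v1" cannot double up
    # path segments when composing "/v1/audio/*" URLs.
    parts = urlsplit(base_url)
    origin = urlunsplit((parts.scheme, parts.netloc, "", "", ""))
    return origin + path
```

With this shape, both an origin-only base and a path-suffixed base resolve to the same endpoint URL.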
kovtcharov force-pushed from d1aef35 to 5d22a96
The /voice/* HTTP routes added in the previous commit hard-coded the
Lemonade backend. That works on Linux/Windows AMD Ryzen AI but breaks
local development on macOS, where Lemonade's whispercpp / Kokoro recipes
are not yet supported (per Lemonade v10.2 model registry, only llamacpp
recipes run on macOS).
Add a GAIA_VOICE_BACKEND env var so the routes work on both platforms
until macOS support lands in Lemonade itself:
- GAIA_VOICE_BACKEND=lemonade (default): POSTs to /v1/audio/* on Lemonade
- GAIA_VOICE_BACKEND=in-process: falls through to gaia.audio.whisper_asr +
  gaia.audio.kokoro_tts; works on macOS
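The selector can be as small as the following sketch (the function name and module constants are assumptions; only the env var name, its two values, and the default come from the commit):

```python
import os

_LEMONADE = "lemonade"
_IN_PROCESS = "in-process"

def active_backend() -> str:
    # Resolve the voice backend from GAIA_VOICE_BACKEND, defaulting to
    # lemonade. Unknown values raise instead of silently falling back,
    # matching the commit's no-silent-fallback policy.
    value = os.getenv("GAIA_VOICE_BACKEND", _LEMONADE).strip().lower()
    if value not in (_LEMONADE, _IN_PROCESS):
        raise ValueError(f"unknown GAIA_VOICE_BACKEND: {value!r}")
    return value
```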
Implementation:
- /voice/transcribe dispatches to either transcribe_bytes() or a
temp-WAV → WhisperAsr.transcribe_file() → cleanup
- /voice/speech dispatches to either synthesize_bytes() or
KokoroTTS().generate_speech() → soundfile-encoded WAV (no MP3
encoder bundled with the in-process path)
- /voice/health reports the active backend and its readiness
- Lemonade STT model names (Whisper-Small) are mapped to
openai-whisper-package names (small) when in-process is selected
- No silent fallback between backends — if the selected backend's deps
or service are unavailable, the route returns a clear error (503
for missing deps, 502 for upstream failures)
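The transcribe dispatch and the model-name mapping above can be sketched roughly as follows (the real router calls the backend modules directly rather than taking callables, and only the Whisper-Small → small pair is confirmed by the commit; the general prefix rule is a guess):

```python
import os
import tempfile
from typing import Callable

# Explicit table; only the Whisper-Small entry is confirmed by the commit.
_STT_NAME_MAP = {"Whisper-Small": "small"}

def to_whisper_name(lemonade_model: str) -> str:
    # Assumed general rule: strip the "Whisper-" prefix and lowercase.
    return _STT_NAME_MAP.get(
        lemonade_model, lemonade_model.removeprefix("Whisper-").lower()
    )

def dispatch_transcribe(
    audio: bytes,
    backend: str,
    transcribe_bytes: Callable[[bytes], str],
    transcribe_file: Callable[[str], str],
) -> str:
    if backend == "lemonade":
        # Lemonade path: hand the bytes straight to the HTTP client.
        return transcribe_bytes(audio)
    # In-process path: temp WAV -> file-based WhisperAsr-style API -> cleanup.
    path = None
    try:
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
            f.write(audio)
            path = f.name
        return transcribe_file(path)
    finally:
        if path is not None:
            os.unlink(path)
```

Passing the two backend callables in makes the dispatch shape easy to exercise in isolation, without a Lemonade server or Whisper weights present.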
Voice test page (/voice/test) now displays the active backend in its
header, sourced from /voice/health.
In-process and Lemonade-routed audio coexist until Lemonade adds macOS
support for its audio recipes.
```python
return {
    "backend": _IN_PROCESS,
    "ready": deps_ok,
    "detail": detail,
    "stt_default": "small",
    "tts_default": "af_bella",
}
```
Summary

Solid additive PR that introduces a Lemonade-routed audio path alongside the existing in-process Whisper/Kokoro modules, gated by GAIA_VOICE_BACKEND.

Issues Found

🟡 Important — No tests for the new audio path (…)
Summary
Add a Lemonade-routed audio path to GAIA's UI server, alongside the existing in-process Whisper/Kokoro modules. Both backends ship together — selectable per request via the GAIA_VOICE_BACKEND env var — until Lemonade adds macOS support for its whispercpp and Kokoro recipes.

This unblocks the deployment story (Lemonade serves LLM + STT + TTS on Ryzen AI / Linux + Windows) without breaking local development on macOS, where Lemonade's audio recipes don't run yet.
Refs #215 (switch ASR to Lemonade audio API) and #373 (server-side TTS via Lemonade) — partial fix; closes neither because the in-process path is preserved on purpose.
Two stacked commits
- 5d22a96 feat(audio): add Lemonade-routed STT/TTS endpoints + browser test page
- e40773c feat(audio): GAIA_VOICE_BACKEND env var to pick lemonade vs in-process

What's added
Backend wrappers — additive, no replacements
- src/gaia/audio/lemonade_audio.py (new) — HTTP client primitives: transcribe(), transcribe_bytes(), synthesize(), synthesize_bytes(), lemonade_health(). Targets /v1/audio/transcriptions, /v1/audio/speech, and /api/v1/health (matching the path the existing gaia.llm.lemonade_client.LemonadeClient uses). Per the no-silent-fallback policy, all failures raise LemonadeAudioError with an actionable message.
- gaia.audio.whisper_asr / gaia.audio.kokoro_tts (unchanged) — the in-process Whisper and Kokoro classes are deliberately preserved. Existing callers (gaia talk via gaia.cli, gaia.audio.audio_client, gaia.agents.chat.agent) keep working with no code changes.

HTTP router with dual-backend dispatch
src/gaia/ui/routers/audio.py (new) — FastAPI router exposing four routes under /voice/*: /voice/transcribe (returns {"text", "model", "backend"}), /voice/speech, /voice/health, /voice/test.

Each request dispatches based on GAIA_VOICE_BACKEND (default lemonade):

- lemonade — POSTs to Lemonade's /v1/audio/* endpoints
- in-process — falls through to WhisperAsr.transcribe_file() / KokoroTTS.generate_speech()

Lemonade STT model names (Whisper-Small) are mapped to openai-whisper package names (small) automatically when in-process is selected. No silent fallback between backends — if the selected backend's deps or service are unavailable, the route returns 503 (missing deps) or 502 (upstream failure) with the original error message intact.

Browser test harness
src/gaia/ui/static/voice_test.html (new) — single-file mic + TTS test page. Browser-side MediaRecorder + OfflineAudioContext downsamples to 16 kHz mono WAV before upload (a Lemonade requirement). Reports the active backend in its header, sourced from /voice/health. No JS bundle, no React rebuild.

Wheel packaging
- setup.py — register static/*.html under gaia.ui in package_data so wheels ship voice_test.html.
- MANIFEST.in — backstop recursive-include (matches the existing gaia.apps.webui pattern).

UI server registration
src/gaia/ui/server.py — +1 import, +1 app.include_router(audio_router_mod.router). Sits alongside the other 8 routers; no behavior change for existing routes.

Why both backends ship together
Lemonade v10.2's model registry only includes llamacpp recipes on macOS — whispercpp and Kokoro are Linux/Windows-only. A single-backend approach forces a choice between:

- … (gaia talk can't run)
- …

The dual-backend approach removes the trade-off: developers on macOS export GAIA_VOICE_BACKEND=in-process and get the legacy path; production on Ryzen AI / Linux uses the default lemonade and gets the unified server. When Lemonade adds macOS support for the audio recipes, the in-process modules can be deprecated in a follow-up PR.

Diff stat
No deletions of behavior. No public-API changes to WhisperAsr or KokoroTTS.

Test plan
- python -m py_compile on all 8 changed/new files
- from gaia.ui.routers.audio import router resolves; lists 4 routes
- curl http://localhost:13305/api/v1/health → 200 (matches the path the wrapper uses)
- curl -X POST http://localhost:13305/v1/audio/transcriptions ... returns structured JSON. On macOS the response is a model_not_supported 404 (Lemonade's whispercpp recipe is Linux/Windows-only) — confirms the wrapper's error path passes the upstream message through cleanly.
- gaia.audio.whisper_asr and gaia.audio.kokoro_tts untouched — git diff origin/main -- src/gaia/audio/whisper_asr.py src/gaia/audio/kokoro_tts.py is empty.
- pip install -e . on this branch, lemonade-server serve &, gaia chat --ui &, then open http://localhost:<ui-port>/voice/test — shows "Active backend: lemonade", /voice/health → green
- pip install -e ".[talk]" (pulls openai-whisper + kokoro), GAIA_VOICE_BACKEND=in-process gaia chat --ui &, then open http://localhost:<ui-port>/voice/test — shows "Active backend: in-process", /voice/health reports ready: true (or false with the missing-dep message)
- gaia talk smoke test on macOS — verifies the legacy in-process callers still work after the additive changes
- gaia.agents.chat.agent TTS still produces playable WAV
- wheel built with python -m build actually contains static/voice_test.html (suggest extending util/verify_wheel_dist.py to assert this)
- python util/lint.py --all clean

First-time-use note
The first request to either Lemonade audio endpoint triggers a one-time model download (~30 s for Whisper-Small, similar for kokoro-v1). The wrapper's default timeout is 60 s, so this doesn't race; pre-warming via curl before any timed test is recommended.

Follow-ups (not in this PR)
- Add a /realtime WebSocket endpoint (currently WhisperAsr.start_recording_streaming chunks the recording in-process — same as before this PR).
- Once Lemonade ships macOS audio recipes (llamacpp-based Whisper / Kokoro, or a Metal-native variant), deprecate the in-process modules and the [talk] extras' heavy openai-whisper / kokoro deps in a follow-up.
- Extend util/verify_wheel_dist.py to assert gaia/ui/static/voice_test.html is in the dist.