
feat(audio): Lemonade-routed STT/TTS endpoints + dual-backend selector #943

Open
kovtcharov wants to merge 2 commits into main from feat/lemonade-audio-endpoints

Conversation

Collaborator

@kovtcharov kovtcharov commented May 2, 2026

Summary

Add a Lemonade-routed audio path to GAIA's UI server, alongside the existing in-process Whisper/Kokoro modules. Both backends ship together — selectable per-request via the GAIA_VOICE_BACKEND env var — until Lemonade adds macOS support for its whispercpp and Kokoro recipes.

This unblocks the deployment story (Lemonade serves LLM + STT + TTS on Ryzen AI / Linux + Windows) without breaking local development on macOS, where Lemonade's audio recipes don't run yet.

Refs #215 (switch ASR to Lemonade audio API) and #373 (server-side TTS via Lemonade) — partial fix; closes neither because the in-process path is preserved on purpose.

Two stacked commits

  • 5d22a96 feat(audio): add Lemonade-routed STT/TTS endpoints + browser test page
  • e40773c feat(audio): GAIA_VOICE_BACKEND env var to pick lemonade vs in-process

What's added

Backend wrappers — additive, no replacements

  • src/gaia/audio/lemonade_audio.py (new) — HTTP client primitives: transcribe(), transcribe_bytes(), synthesize(), synthesize_bytes(), lemonade_health(). Targets /v1/audio/transcriptions, /v1/audio/speech, and /api/v1/health (matching the path the existing gaia.llm.lemonade_client.LemonadeClient uses). Per the no-silent-fallback policy, all failures raise LemonadeAudioError with an actionable message.
  • gaia.audio.whisper_asr / gaia.audio.kokoro_tts (unchanged) — the in-process Whisper and Kokoro classes are deliberately preserved. Existing callers (gaia talk via gaia.cli, gaia.audio.audio_client, gaia.agents.chat.agent) keep working with no code changes.

HTTP router with dual-backend dispatch

  • src/gaia/ui/routers/audio.py (new) — FastAPI router exposing four routes under /voice/*:

    Method  Path               Purpose
    POST    /voice/transcribe  multipart upload → STT → {"text", "model", "backend"}
    POST    /voice/speech      Pydantic-validated JSON → TTS → audio bytes
    GET     /voice/health      reports active backend + its readiness
    GET     /voice/test        self-contained HTML harness

    Each request dispatches based on GAIA_VOICE_BACKEND (default lemonade):

    • lemonade — POSTs to Lemonade's /v1/audio/* endpoints
    • in-process — falls through to WhisperAsr.transcribe_file() / KokoroTTS.generate_speech()

    Lemonade STT model names (Whisper-Small) are mapped to openai-whisper-package names (small) automatically when in-process is selected. No silent fallback between backends — if the selected backend's deps or service are unavailable, the route returns 503 (missing deps) or 502 (upstream failure) with the original error message intact.
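The dispatch and model-name mapping described above can be sketched roughly as follows. Function and mapping names here are illustrative, not the PR's actual identifiers, and the real `_backend()` resolver reportedly warns on unrecognized values rather than silently defaulting:

```python
import os

# Illustrative mapping from Lemonade STT model names to openai-whisper sizes
WHISPER_MODEL_TO_PACKAGE = {
    "Whisper-Tiny": "tiny",
    "Whisper-Base": "base",
    "Whisper-Small": "small",
    "Whisper-Large": "large",
}


def resolve_backend() -> str:
    """Read GAIA_VOICE_BACKEND; unrecognized values fall back to the default.

    (The PR's actual resolver additionally logs a warning in that case.)
    """
    value = os.getenv("GAIA_VOICE_BACKEND", "lemonade").strip().lower()
    return value if value in ("lemonade", "in-process") else "lemonade"


def to_whisper_package_name(model: str) -> str:
    """Map e.g. 'Whisper-Small' -> 'small' for the in-process Whisper path."""
    return WHISPER_MODEL_TO_PACKAGE.get(model, model.lower())
```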

Browser test harness

  • src/gaia/ui/static/voice_test.html (new) — single-file mic + TTS test page. Browser-side MediaRecorder + OfflineAudioContext downsamples to 16 kHz mono WAV before upload (Lemonade requirement). Reports the active backend in its header, sourced from /voice/health. No JS bundle, no React rebuild.
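The harness performs the 16 kHz mono WAV conversion in browser JS; as a rough server-side analogue of what it produces, wrapping signed 16-bit mono PCM in a WAV container takes only the stdlib (purely illustrative, not code from this PR):

```python
import io
import struct
import wave


def pcm16_to_wav(samples: list[int], sample_rate: int = 16000) -> bytes:
    """Wrap signed 16-bit mono PCM samples in a minimal WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono, per the Lemonade requirement
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()
```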

Wheel packaging

  • setup.py — register gaia.ui/static/*.html in package_data so wheels ship voice_test.html.
  • MANIFEST.in — backstop recursive-include (matches the existing gaia.apps.webui pattern).

UI server registration

  • src/gaia/ui/server.py — +1 import, +1 app.include_router(audio_router_mod.router). Sits alongside the other 8 routers; no behavior change for existing routes.

Why both backends ship together

Lemonade v10.2's model registry only includes llamacpp recipes on macOS — whispercpp and Kokoro are Linux/Windows-only. A single-backend approach forces a choice between:

  • Replace with Lemonade only → breaks local Mac dev (the test page can't transcribe, gaia talk can't run)
  • Keep in-process only → blocks the deployment story (LLM and audio served by separate model loaders, double the install footprint)

The dual-backend approach removes the trade-off: developers on macOS export GAIA_VOICE_BACKEND=in-process and get the legacy path; production on Ryzen AI / Linux uses the default lemonade and gets the unified server. When Lemonade adds macOS support for the audio recipes, the in-process modules can be deprecated in a follow-up PR.

Diff stat

8 files changed, 909 insertions(+), 17 deletions(-)

No deletions of behavior. No public-API changes to WhisperAsr or KokoroTTS.

Test plan

  • python -m py_compile on all 8 changed/new files
  • from gaia.ui.routers.audio import router resolves; lists 4 routes
  • curl http://localhost:13305/api/v1/health → 200 (matches the path the wrapper uses)
  • curl -X POST http://localhost:13305/v1/audio/transcriptions ... returns structured JSON. On macOS the response is a model_not_supported 404 (Lemonade's whispercpp recipe is Linux/Windows-only) — confirms our wrapper's error path passes the upstream message through cleanly.
  • gaia.audio.whisper_asr and gaia.audio.kokoro_tts untouched — git diff origin/main -- src/gaia/audio/whisper_asr.py src/gaia/audio/kokoro_tts.py is empty.
  • Lemonade backend round-trip on a Linux/Windows AMD Ryzen AI machine. Steps:
    1. pip install -e . on this branch
    2. lemonade-server serve &
    3. gaia chat --ui & then open http://localhost:<ui-port>/voice/test
    4. Header should show Active backend: lemonade
    5. Click ▶ Probe /voice/health → green
    6. Click 🎤 Record, speak ~5s, stop → transcription appears
    7. Type something, click ▶ Synthesize → audio plays
  • In-process backend smoke test on macOS. Steps:
    1. pip install -e ".[talk]" (pulls openai-whisper + kokoro)
    2. GAIA_VOICE_BACKEND=in-process gaia chat --ui &
    3. Open http://localhost:<ui-port>/voice/test
    4. Header should show Active backend: in-process
    5. /voice/health reports ready: true (or false with the missing-dep message)
    6. Mic record → transcription appears (Whisper running locally)
    7. TTS → WAV audio plays (Kokoro running locally; MP3 not supported in this path)
  • gaia talk smoke test on macOS — verifies the legacy in-process callers still work after the additive changes
  • Chat agent voice smoke test — verifies gaia.agents.chat.agent TTS still produces playable WAV
  • Wheel built with python -m build actually contains static/voice_test.html (suggest extending util/verify_wheel_dist.py to assert this)
  • python util/lint.py --all clean

First-time-use note

The first request to either Lemonade audio endpoint triggers a one-time model download (~30 s for Whisper-Small, similar for kokoro-v1). The wrapper's default timeout is 60 s so this doesn't race; pre-warming via curl before any timed test is recommended.
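Pre-warming before a timed run could be scripted with a small retry helper along these lines (purely illustrative; the probe callable and retry policy are not part of the PR):

```python
import time


def prewarm(probe, attempts: int = 3, delay: float = 1.0):
    """Call `probe` until it succeeds, absorbing the one-time model download.

    `probe` is any zero-arg callable that raises on failure — e.g. a tiny
    transcription request against /v1/audio/transcriptions.
    """
    last_err = None
    for i in range(attempts):
        try:
            return probe()
        except Exception as e:  # noqa: BLE001 — pre-warm tolerates any failure
            last_err = e
            if i < attempts - 1:
                time.sleep(delay)
    raise RuntimeError(f"pre-warm failed after {attempts} attempts") from last_err
```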

Follow-ups (not in this PR)

  • True real-time streaming via Lemonade's /realtime WebSocket endpoint (currently WhisperAsr.start_recording_streaming chunks the recording in-process — same as before this PR).
  • Once Lemonade ships macOS-compatible audio recipes (llamacpp-based Whisper / Kokoro, or a Metal-native variant), deprecate the in-process modules and [talk] extras' heavy openai-whisper / kokoro deps in a follow-up.
  • Wheel-content CI check: extend util/verify_wheel_dist.py to assert gaia/ui/static/voice_test.html is in the dist.

Add gaia.audio.lemonade_audio — an HTTP client for Lemonade Server's
OpenAI-compatible /v1/audio/transcriptions and /v1/audio/speech endpoints.
A single Lemonade instance can now serve LLM + STT (Whisper) + TTS
(Kokoro) instead of GAIA loading separate Whisper / Kokoro models in
process — one model cache, one health story.

Wire it into the Agent UI through a new /voice router:
  POST /voice/transcribe   multipart upload → Lemonade → {"text": "..."}
  POST /voice/speech       JSON body → Lemonade → audio bytes
  GET  /voice/health       proxy to Lemonade /api/v1/health
  GET  /voice/test         self-contained browser harness (MediaRecorder
                           + OfflineAudioContext WAV converter, no React
                           rebuild needed)

Body for /voice/speech is validated by a Pydantic SpeechRequest model;
upstream Lemonade failures surface as 502 with the original error detail.

The existing gaia.audio.whisper_asr / gaia.audio.kokoro_tts modules
remain unchanged — the in-process variants still serve `gaia talk` and
any other caller that does not have a Lemonade server running.

Refs #215, #373.
@kovtcharov kovtcharov requested a review from kovtcharov-amd as a code owner May 2, 2026 23:09
@github-actions github-actions Bot added dependencies Dependency updates audio Audio (ASR/TTS) changes labels May 2, 2026
@github-actions
Contributor

github-actions Bot commented May 2, 2026

Summary

Cleanly factored, well-documented PR that adds an HTTP wrapper around Lemonade's OpenAI-compatible /v1/audio/* endpoints plus a /voice/* FastAPI router and a single-file browser harness. Scope is tightly contained, the no-silent-fallback policy is followed faithfully, and the wheel packaging is wired up. The single most important thing to fix before merge is the LEMONADE_BASE_URL convention mismatch — the rest of GAIA uses a base URL that includes /api/v1, this module assumes it doesn't, and a user with the documented env var set will get broken URLs like /api/v1/v1/audio/transcriptions. There are also a couple of secondary concerns (sync HTTP inside async endpoints, httpx not in base deps, missing tests/docs) flagged below.

Issues

🟡 Important

1. LEMONADE_BASE_URL convention mismatch breaks audio when the env var is set (src/gaia/audio/lemonade_audio.py:38)

Every other consumer in GAIA expects LEMONADE_BASE_URL to include /api/v1:

  • src/gaia/llm/lemonade_client.py:54-56 — DEFAULT_LEMONADE_URL = f"http://{DEFAULT_HOST}:{DEFAULT_PORT}/api/{LEMONADE_API_VERSION}"
  • src/gaia/agents/base/agent.py:199, src/gaia/agents/chat/agent.py:156, src/gaia/agents/code/cli.py:110, src/gaia/agents/builder/agent.py:133, src/gaia/agents/registry.py:600, src/gaia/agents/routing/agent.py:59, src/gaia/ui/routers/system.py:52, src/gaia/ui/_chat_helpers.py:1846 — all default to http://localhost:13305/api/v1.

This module instead defaults to http://localhost:13305 (no suffix) and constructs f"{base_url}/v1/audio/..." and f"{base_url}/api/v1/health". If a user has the documented env var set (the GAIA-wide convention), every URL produced here is broken:

  • transcribe → http://localhost:13305/api/v1/v1/audio/transcriptions
  • synthesize_bytes → http://localhost:13305/api/v1/v1/audio/speech
  • lemonade_health → http://localhost:13305/api/v1/api/v1/health

Strip the suffix (or origin-only the URL) before composing paths. Cleanest is to reuse gaia.llm.lemonade_client._get_lemonade_config so there's one source of truth. Inline option:

import os
from urllib.parse import urlparse


def _lemonade_origin() -> str:
    """Return the scheme+host[:port] of LEMONADE_BASE_URL (strips any /api/v1 suffix).

    The rest of GAIA sets ``LEMONADE_BASE_URL=http://host:port/api/v1`` (the
    Lemonade native namespace). The OpenAI-compatible audio endpoints live at
    ``/v1/audio/*`` — *not* under ``/api/v1`` — so we resolve to the origin and
    let the caller append the right path.
    """
    raw = os.getenv("LEMONADE_BASE_URL", "http://localhost:13305")
    p = urlparse(raw)
    if not p.scheme or not p.netloc:
        return "http://localhost:13305"
    return f"{p.scheme}://{p.netloc}"


LEMONADE_URL = _lemonade_origin()

2. Sync httpx inside async def endpoints blocks the event loop (src/gaia/ui/routers/audio.py:39,60,84 via lemonade_audio.py:60,98,160,225)

The router endpoints are async def, but the wrapper calls synchronous httpx.post/httpx.get. STT round-trips take seconds (and the first call can be ~30s when Lemonade auto-downloads a model), and the entire FastAPI worker stalls for that duration — every other route on the UI server backs up behind one transcription. The existing pattern in this codebase is httpx.AsyncClient:

  • src/gaia/ui/routers/system.py:67,244,411,720 — all use async with httpx.AsyncClient(...).

Two options: (a) make lemonade_audio async-first and have the router await it, or (b) keep the sync API for general callers and await asyncio.to_thread(transcribe_bytes, ...) from inside the router. Option (a) matches the rest of the UI; option (b) preserves a sync callable for non-async consumers (the "Beacon downstream consumer" use case mentioned in the module docstring).
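Option (b) is roughly the following sketch, where the synchronous function is a stand-in for the wrapper's real transcribe_bytes (which would perform the blocking httpx.post):

```python
import asyncio


def transcribe_bytes_sync(data: bytes) -> str:
    """Stand-in for the synchronous HTTP wrapper (really calls httpx.post)."""
    return f"transcribed {len(data)} bytes"


async def transcribe_endpoint(data: bytes) -> str:
    # Run the blocking call in a worker thread so the event loop stays free
    return await asyncio.to_thread(transcribe_bytes_sync, data)
```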

3. httpx is not in base install_requires (setup.py:112-123, setup.py:137)

httpx is declared only in the [ui] extra, but gaia.audio.lemonade_audio lives in the base package and imports it unconditionally. A user who installs gaia without [ui] will get ModuleNotFoundError: No module named 'httpx' on first import — including any "downstream consumer like Beacon" the module docstring is written for. Either:

  • Add httpx>=0.27.0 to install_requires, or
  • Replace httpx with requests (already in install_requires) — less consistent with the rest of gaia.ui but keeps base deps small.

4. No tests for the new module or router

CLAUDE.md ("Testing Requirements"): every new feature requires tests. Existing pattern under tests/unit/test_audio_*.py shows where these belong. Suggested minimum:

  • tests/unit/test_lemonade_audio.py — mock httpx and assert URL construction (catches issue 1 above), the {"text": null} → "" normalization (lemonade_audio.py:99-100, lemonade_audio.py:144-145), and that all error paths raise LemonadeAudioError with from e chaining preserved.
  • tests/unit/test_audio_router.py — fastapi.testclient.TestClient; assert 400 on empty upload, 502 on LemonadeAudioError, correct media-type per response_format, and that /voice/test returns 200 with text/html.

5. Documentation not updated

CLAUDE.md mandates docs for every new feature, and docs/sdk/sdks/audio.mdx already exists but isn't touched here. The /voice/* HTTP routes are also a public surface that would benefit from an entry in docs/reference/ (or a section in the audio SDK page) so contributors can discover the test harness URL without reading the router source. The PR's own test plan still has [ ] mypy / ruff pass on the new modules unchecked — please run python util/lint.py --all before merge.

🟢 Minor

6. transcribe() and transcribe_bytes() are near-duplicates (lemonade_audio.py:57-101 vs 103-156)

The path version can delegate to the bytes version and lose ~40 lines of duplication (compare to how synthesize already delegates to synthesize_bytes at lemonade_audio.py:194-204):

def transcribe(
    audio_path: str | Path,
    *,
    model: str = DEFAULT_STT_MODEL,
    language: str | None = "en",
    base_url: str = LEMONADE_URL,
    timeout: float = 60.0,
) -> str:
    """POST a WAV file to Lemonade /v1/audio/transcriptions.

    See :func:`transcribe_bytes` for argument and error semantics.
    """
    audio_path = Path(audio_path)
    if not audio_path.exists():
        raise FileNotFoundError(audio_path)
    return transcribe_bytes(
        audio_path.read_bytes(),
        filename=audio_path.name,
        model=model,
        language=language,
        base_url=base_url,
        timeout=timeout,
    )

7. LEMONADE_URL resolved at import time (lemonade_audio.py:38)

If a test fixture or long-running process mutates LEMONADE_BASE_URL after import, the change isn't picked up. The function-based pattern in gaia.llm.lemonade_client._get_lemonade_config avoids this. Goes hand-in-hand with the suggestion in issue #1.

8. voice/test reads HTML from disk on every request (src/gaia/ui/routers/audio.py:101)

Cache once at module load, or mount via StaticFiles (the pattern used at src/gaia/ui/server.py:407,423):

_TEST_HTML_PATH = Path(__file__).parent.parent / "static" / "voice_test.html"
try:
    _TEST_HTML = _TEST_HTML_PATH.read_text(encoding="utf-8")
except FileNotFoundError:
    _TEST_HTML = None  # surfaced as 500 in the handler below


@router.get("/test", response_class=HTMLResponse)
def voice_test_page():
    """Serve a single-page browser harness for STT + TTS smoke testing.

    Open in a browser at http://localhost:<ui-port>/voice/test
    """
    if _TEST_HTML is None:
        raise HTTPException(
            status_code=500,
            detail=(
                f"voice_test.html missing at {_TEST_HTML_PATH}. "
                "Reinstall gaia or restore src/gaia/ui/static/voice_test.html."
            ),
        )
    return HTMLResponse(_TEST_HTML)

9. Upstream status code is squashed in the router (src/gaia/ui/routers/audio.py:39-49)

Every Lemonade failure becomes a 502 regardless of root cause. A model_not_supported 404 (the macOS case the PR description calls out), an upload-format 400, and a connection timeout all look the same to the client. Not blocking — LemonadeAudioError's message text is preserved in detail — but if you have time, capture the upstream status_code on the exception and pass it through.

Strengths

  • No-silent-fallback discipline is exemplary. LemonadeAudioError messages at lemonade_audio.py:79-83, 92-96, 139-143 cleanly satisfy CLAUDE.md's "what failed / what to do / where to look" rule, and the docstring on the exception explicitly documents why there's no fallback to in-process whisper_asr / kokoro_tts. That rationale is exactly the right thing to write down.
  • PR description is the right shape: leads with why, endpoint table, an explicit "What's not touched" scope-fence, a hardware-tied test plan with honest unchecked boxes, and a first-time-use note about the model auto-download. Easy to review.
  • Browser harness is genuinely useful — single-file MediaRecorder + OfflineAudioContext WAV converter (voice_test.html:226-261) means contributors can validate the audio path without curl --form gymnastics, and there's no React rebuild in the loop. Wheel packaging via MANIFEST.in + setup.py package_data["gaia.ui"] mirrors the existing webui pattern correctly.

Verdict

Request changes — issues 1–3 are blocking (broken URLs under the documented env var, event-loop stall, missing base dep). Issues 4 & 5 (tests + docs) are CLAUDE.md requirements that should land in this PR rather than as a follow-up. Once those are addressed, the rest are quick polish — this is a well-scoped, well-documented change and I'd expect a fast turn-around.

@kovtcharov kovtcharov changed the title feat(audio): Lemonade-routed STT/TTS endpoints + browser test page feat(audio): unify STT/TTS on Lemonade Server (replace in-process Whisper/Kokoro) May 3, 2026
@kovtcharov kovtcharov force-pushed the feat/lemonade-audio-endpoints branch from d1aef35 to 5d22a96 Compare May 3, 2026 01:11
The /voice/* HTTP routes added in the previous commit hard-coded the
Lemonade backend. That works on Linux/Windows AMD Ryzen AI but breaks
local development on macOS, where Lemonade's whispercpp / Kokoro recipes
are not yet supported (per Lemonade v10.2 model registry, only llamacpp
recipes run on macOS).

Add a GAIA_VOICE_BACKEND env var so the routes work on both platforms
until macOS support lands in Lemonade itself:

  GAIA_VOICE_BACKEND=lemonade   (default — POSTs to /v1/audio/* on Lemonade)
  GAIA_VOICE_BACKEND=in-process (falls through to gaia.audio.whisper_asr +
                                 gaia.audio.kokoro_tts; works on macOS)

Implementation:
  - /voice/transcribe dispatches to either transcribe_bytes() or a
    temp-WAV → WhisperAsr.transcribe_file() → cleanup
  - /voice/speech dispatches to either synthesize_bytes() or
    KokoroTTS().generate_speech() → soundfile-encoded WAV (no MP3
    encoder bundled with the in-process path)
  - /voice/health reports the active backend and its readiness
  - Lemonade STT model names (Whisper-Small) are mapped to
    openai-whisper-package names (small) when in-process is selected
  - No silent fallback between backends — if the selected backend's deps
    or service are unavailable, the route returns a clear error (503
    for missing deps, 502 for upstream failures)

Voice test page (/voice/test) now displays the active backend in its
header, sourced from /voice/health.

In-process and Lemonade-routed audio coexist until Lemonade adds macOS
support for its audio recipes.
@kovtcharov kovtcharov changed the title feat(audio): unify STT/TTS on Lemonade Server (replace in-process Whisper/Kokoro) feat(audio): Lemonade-routed STT/TTS endpoints + dual-backend selector May 3, 2026
Comment on lines +259 to +265
return {
"backend": _IN_PROCESS,
"ready": deps_ok,
"detail": detail,
"stt_default": "small",
"tts_default": "af_bella",
}
@github-actions
Contributor

github-actions Bot commented May 3, 2026

Summary

Solid additive PR that introduces a Lemonade-routed audio path alongside the existing in-process Whisper/Kokoro modules, gated by GAIA_VOICE_BACKEND. The diff is genuinely additive (zero deletions, no public-API changes), the no-silent-fallback policy is honored throughout with actionable error messages, and the dual-backend rationale is clearly documented. Main gaps: no tests for the new endpoints and no doc updates (docs/sdk/sdks/audio.mdx, docs/guides/talk.mdx) covering /voice/* or GAIA_VOICE_BACKEND — both required by CLAUDE.md for new features.

Issues Found

🟡 Important — No tests for the new audio path (tests/)

CLAUDE.md requires unit/integration tests for new features. None of lemonade_audio.py, routers/audio.py, or the /voice/* route shapes are covered. At minimum:

  • Unit test for lemonade_audio.transcribe_bytes/synthesize_bytes happy path + LemonadeAudioError translation, mocking httpx.post (similar pattern to tests/test_lemonade_client.py).
  • Router test that 502 is returned when LemonadeAudioError is raised, 400 on empty body, 503 on missing in-process deps, and that /voice/health reports the active backend.
  • from gaia.ui.routers.audio import router smoke check + route count assertion (mirrors the manual check from the PR description's test plan).

Without these the route shapes can silently drift; the PR description's test plan items are unchecked because they require Linux/Windows + macOS hardware, which is exactly why automated coverage matters.

🟡 Important — Docs not updated for new endpoints / env var

CLAUDE.md "Documentation Requirements" mandates docs for every new feature. The PR adds three public HTTP routes and a new env var (GAIA_VOICE_BACKEND) but neither docs/sdk/sdks/audio.mdx nor docs/guides/talk.mdx mention any of them. Reviewers/users have no entry point to discover /voice/test or how to flip the backend on macOS.

Suggest a short section in docs/sdk/sdks/audio.mdx listing the four routes + the env-var contract, plus a one-paragraph "On macOS, set GAIA_VOICE_BACKEND=in-process" callout in docs/guides/talk.mdx.

🟡 Important — Wheel verifier not extended for voice_test.html

The PR's own test plan flags this: util/verify_wheel_dist.py currently only asserts gaia/apps/webui/dist/. The new package_data entry for gaia.ui/static/*.html is a single point of failure (recursive globs in package_data are notoriously fragile across setuptools versions, which is exactly why the existing webui block has the verifier backstop). Without an assertion, a future setuptools change silently strips voice_test.html from the wheel and /voice/test 500s in production.

Add voice_test.html to the verifier in this PR rather than leaving it as a follow-up — the regression risk is the reason the verifier exists.

🟢 Minor — transcribe duplicates transcribe_bytes (src/gaia/audio/lemonade_audio.py:94-155, 158-199)

The two functions share ~30 lines of identical HTTP code. transcribe should delegate to transcribe_bytes:

def transcribe(
    audio_path: str | Path,
    *,
    model: str = DEFAULT_STT_MODEL,
    language: str | None = "en",
    base_url: str = LEMONADE_URL,
    timeout: float = 60.0,
) -> str:
    """POST a WAV file to Lemonade /v1/audio/transcriptions.

    Args:
        audio_path: path to a 16kHz mono WAV file (push-to-talk recordings).
        model: ``Whisper-Tiny`` | ``Whisper-Base`` | ``Whisper-Small`` |
               ``Whisper-Large`` (or any other Whisper variant Lemonade serves).
        language: ISO 639-1 code; defaults to ``"en"``. Pass ``None`` to
                  auto-detect.
        base_url: Lemonade server URL.
        timeout: HTTP timeout in seconds.

    Returns:
        The transcribed text.

    Raises:
        FileNotFoundError: if ``audio_path`` does not exist.
        LemonadeAudioError: server unreachable, non-200 status, or malformed
                            response.
    """
    audio_path = Path(audio_path)
    if not audio_path.exists():
        raise FileNotFoundError(audio_path)
    return transcribe_bytes(
        audio_path.read_bytes(),
        filename=audio_path.name,
        model=model,
        language=language,
        base_url=base_url,
        timeout=timeout,
    )

This already matches the pattern used between synthesize and synthesize_bytes further down the same file — the duplication looks like an oversight.

🟢 Minor — Inconsistent error translation in in-process STT path (src/gaia/ui/routers/audio.py:435-461)

The Lemonade path translates LemonadeAudioError → 502. The in-process path only catches FileNotFoundError (400) and ImportError (503); anything else from WhisperAsr.transcribe_file (audio decode error, runtime model failure) propagates as a bare 500 with no actionable detail. Consider mirroring the TTS path's except Exception → 502 (already used at audio.py:523) for parity:

        try:
            asr = WhisperAsr(model_size=package_name)
            text = asr.transcribe_file(tmp_path)
        except FileNotFoundError as e:
            raise HTTPException(status_code=400, detail=str(e)) from e
        except ImportError as e:
            raise HTTPException(status_code=503, detail=str(e)) from e
        except Exception as e:  # noqa: BLE001 — Whisper raises various errors
            raise HTTPException(status_code=502, detail=f"in-process STT failed: {e}") from e
        finally:

🟢 Minor — _to_whisper_package_name fallback silently lowercases unknown names (src/gaia/ui/routers/audio.py:409-411)

return _WHISPER_MODEL_TO_PACKAGE.get(name, name.lower()) means an unknown model name (e.g. a typo like Whisper-Smal) gets passed straight to openai-whisper as whisper-smal, producing a deep-stack error rather than the kind of actionable message the rest of this module favors. Consider raising HTTPException(400, ...) for unknown names, or at least logging a warning so the caller learns the value didn't map.
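A strict variant of the mapping (mapping and function names illustrative) would fail fast on typos instead of lowercasing blindly:

```python
WHISPER_MODEL_TO_PACKAGE = {
    "Whisper-Tiny": "tiny",
    "Whisper-Base": "base",
    "Whisper-Small": "small",
    "Whisper-Large": "large",
}


def to_whisper_package_name_strict(name: str) -> str:
    """Reject unknown Lemonade model names with an actionable error."""
    try:
        return WHISPER_MODEL_TO_PACKAGE[name]
    except KeyError:
        raise ValueError(
            f"unknown STT model {name!r}; expected one of "
            f"{sorted(WHISPER_MODEL_TO_PACKAGE)}"
        ) from None
```

The router would translate the ValueError into an HTTPException(400, ...), keeping the "what failed / what to do" message intact.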

🟢 Minor — No upload-size guard on /voice/transcribe

audio.read() slurps the entire upload into memory before any backend dispatch. The UI server is local-only by default, but the tunnel router can expose it; a multi-GB upload would happily OOM the process. A quick Content-Length check (or len(audio_bytes) > MAX_UPLOAD_BYTES after read) plus a 413 response would prevent that without affecting normal use.
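A minimal size guard might look like this (the cap value is illustrative, and the router would map the error to a 413 response):

```python
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # illustrative cap, ~25 MiB


def check_upload_size(audio_bytes: bytes) -> bytes:
    """Raise if the upload exceeds the cap; router translates this to 413."""
    if len(audio_bytes) > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"upload of {len(audio_bytes)} bytes exceeds the "
            f"{MAX_UPLOAD_BYTES}-byte cap"
        )
    return audio_bytes
```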

Strengths

  • Docstrings carry their weight. Every module/function explains why it exists alongside what it does — LemonadeAudioError's docstring explicitly cites the no-silent-fallback policy and explains why falling through to whisper_asr would be wrong. Future maintainers won't have to guess.
  • No-silent-fallback policy applied consistently. All four error paths (/voice/transcribe, /voice/speech, /voice/health, plus the lemonade_audio primitives) translate upstream failures into structured 502/503/400 responses with the original cause preserved via raise ... from e. The _backend() resolver logs and warns on unknown values rather than silently picking a default that disagrees with the user's intent.
  • Genuinely additive change set. whisper_asr.py / kokoro_tts.py / existing gaia talk callers untouched; the registration in server.py is one import + one include_router call alongside the eight existing routers. The dual-backend rationale (macOS lacks Lemonade whispercpp/Kokoro recipes today) is concretely documented in the router module-docstring.
  • Wheel packaging matches existing precedent. Both setup.py package_data and MANIFEST.in recursive-include are updated together, mirroring the comment in setup.py:91-95 about belt-and-braces packaging — exactly the right pattern.

Verdict

Approve with suggestions. Nothing blocking; the architecture and the no-silent-fallback hygiene are good. The two 🟡 docs+tests gaps and the wheel-verifier gap are required by CLAUDE.md's "bulletproof commit" standard, so please address them before this is treated as merge-ready. The 🟢 items can be folded into the same revision or a follow-up at the author's discretion.
