
feat(audio): Lemonade-routed STT/TTS endpoints + dual-backend selector #943

Open
kovtcharov wants to merge 2 commits into main from feat/lemonade-audio-endpoints

Conversation

Collaborator

@kovtcharov kovtcharov commented May 2, 2026

Summary

Add a Lemonade-routed audio path to GAIA's UI server, alongside the existing in-process Whisper/Kokoro modules. Both backends ship together — selectable per-request via the GAIA_VOICE_BACKEND env var — until Lemonade adds macOS support for its whispercpp and Kokoro recipes.

This unblocks the deployment story (Lemonade serves LLM + STT + TTS on Ryzen AI / Linux + Windows) without breaking local development on macOS, where Lemonade's audio recipes don't run yet.

Refs #215 (switch ASR to Lemonade audio API) and #373 (server-side TTS via Lemonade) — partial fix; closes neither because the in-process path is preserved on purpose.

Two stacked commits

  • 5d22a96 feat(audio): add Lemonade-routed STT/TTS endpoints + browser test page
  • e40773c feat(audio): GAIA_VOICE_BACKEND env var to pick lemonade vs in-process

What's added

Backend wrappers — additive, no replacements

  • src/gaia/audio/lemonade_audio.py (new) — HTTP client primitives: transcribe(), transcribe_bytes(), synthesize(), synthesize_bytes(), lemonade_health(). Targets /v1/audio/transcriptions, /v1/audio/speech, and /api/v1/health (matching the path the existing gaia.llm.lemonade_client.LemonadeClient uses). Per the no-silent-fallback policy, all failures raise LemonadeAudioError with an actionable message.
  • gaia.audio.whisper_asr / gaia.audio.kokoro_tts (unchanged) — the in-process Whisper and Kokoro classes are deliberately preserved. Existing callers (gaia talk via gaia.cli, gaia.audio.audio_client, gaia.agents.chat.agent) keep working with no code changes.

HTTP router with dual-backend dispatch

  • src/gaia/ui/routers/audio.py (new) — FastAPI router exposing four routes under /voice/*:

    Method  Path               Purpose
    POST    /voice/transcribe  multipart upload → STT → {"text", "model", "backend"}
    POST    /voice/speech      Pydantic-validated JSON → TTS → audio bytes
    GET     /voice/health      reports active backend + its readiness
    GET     /voice/test        self-contained HTML harness

    Each request dispatches based on GAIA_VOICE_BACKEND (default lemonade):

    • lemonade — POSTs to Lemonade's /v1/audio/* endpoints
    • in-process — falls through to WhisperAsr.transcribe_file() / KokoroTTS.generate_speech()

    Lemonade STT model names (Whisper-Small) are mapped to openai-whisper-package names (small) automatically when in-process is selected. No silent fallback between backends — if the selected backend's deps or service are unavailable, the route returns 503 (missing deps) or 502 (upstream failure) with the original error message intact.
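The dispatch and model-name mapping described above can be sketched roughly as follows. Function and mapping names here are illustrative, not the PR's actual identifiers, and the real `_backend()` resolver reportedly warns on unrecognized values rather than silently defaulting:

```python
import os

# Illustrative mapping from Lemonade STT model names to openai-whisper sizes
WHISPER_MODEL_TO_PACKAGE = {
    "Whisper-Tiny": "tiny",
    "Whisper-Base": "base",
    "Whisper-Small": "small",
    "Whisper-Large": "large",
}


def resolve_backend() -> str:
    """Read GAIA_VOICE_BACKEND; unrecognized values fall back to the default.

    (The PR's actual resolver additionally logs a warning in that case.)
    """
    value = os.getenv("GAIA_VOICE_BACKEND", "lemonade").strip().lower()
    return value if value in ("lemonade", "in-process") else "lemonade"


def to_whisper_package_name(model: str) -> str:
    """Map e.g. 'Whisper-Small' -> 'small' for the in-process Whisper path."""
    return WHISPER_MODEL_TO_PACKAGE.get(model, model.lower())
```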

Browser test harness

  • src/gaia/ui/static/voice_test.html (new) — single-file mic + TTS test page. Browser-side MediaRecorder + OfflineAudioContext downsamples to 16 kHz mono WAV before upload (Lemonade requirement). Reports the active backend in its header, sourced from /voice/health. No JS bundle, no React rebuild.
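The harness performs the 16 kHz mono WAV conversion in browser JS; as a rough server-side analogue of what it produces, wrapping signed 16-bit mono PCM in a WAV container takes only the stdlib (purely illustrative, not code from this PR):

```python
import io
import struct
import wave


def pcm16_to_wav(samples: list[int], sample_rate: int = 16000) -> bytes:
    """Wrap signed 16-bit mono PCM samples in a minimal WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono, per the Lemonade requirement
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()
```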

Wheel packaging

  • setup.py — register gaia.ui/static/*.html in package_data so wheels ship voice_test.html.
  • MANIFEST.in — backstop recursive-include (matches the existing gaia.apps.webui pattern).

UI server registration

  • src/gaia/ui/server.py — +1 import, +1 app.include_router(audio_router_mod.router). Sits alongside the other 8 routers; no behavior change for existing routes.

Why both backends ship together

Lemonade v10.2's model registry only includes llamacpp recipes on macOS — whispercpp and Kokoro are Linux/Windows-only. A single-backend approach forces a choice between:

  • Replace with Lemonade only → breaks local Mac dev (the test page can't transcribe, gaia talk can't run)
  • Keep in-process only → blocks the deployment story (LLM and audio served by separate model loaders, double the install footprint)

The dual-backend approach removes the trade-off: developers on macOS export GAIA_VOICE_BACKEND=in-process and get the legacy path; production on Ryzen AI / Linux uses the default lemonade and gets the unified server. When Lemonade adds macOS support for the audio recipes, the in-process modules can be deprecated in a follow-up PR.

Diff stat

8 files changed, 909 insertions(+), 17 deletions(-)

No deletions of behavior. No public-API changes to WhisperAsr or KokoroTTS.

Test plan

  • python -m py_compile on all 8 changed/new files
  • from gaia.ui.routers.audio import router resolves; lists 4 routes
  • curl http://localhost:13305/api/v1/health → 200 (matches the path the wrapper uses)
  • curl -X POST http://localhost:13305/v1/audio/transcriptions ... returns structured JSON. On macOS the response is a model_not_supported 404 (Lemonade's whispercpp recipe is Linux/Windows-only) — confirms our wrapper's error path passes the upstream message through cleanly.
  • gaia.audio.whisper_asr and gaia.audio.kokoro_tts untouched — git diff origin/main -- src/gaia/audio/whisper_asr.py src/gaia/audio/kokoro_tts.py is empty.
  • Lemonade backend round-trip on a Linux/Windows AMD Ryzen AI machine. Steps:
    1. pip install -e . on this branch
    2. lemonade-server serve &
    3. gaia chat --ui & then open http://localhost:<ui-port>/voice/test
    4. Header should show Active backend: lemonade
    5. Click ▶ Probe /voice/health → green
    6. Click 🎤 Record, speak ~5s, stop → transcription appears
    7. Type something, click ▶ Synthesize → audio plays
  • In-process backend smoke test on macOS. Steps:
    1. pip install -e ".[talk]" (pulls openai-whisper + kokoro)
    2. GAIA_VOICE_BACKEND=in-process gaia chat --ui &
    3. Open http://localhost:<ui-port>/voice/test
    4. Header should show Active backend: in-process
    5. /voice/health reports ready: true (or false with the missing-dep message)
    6. Mic record → transcription appears (Whisper running locally)
    7. TTS → WAV audio plays (Kokoro running locally; MP3 not supported in this path)
  • gaia talk smoke test on macOS — verifies the legacy in-process callers still work after the additive changes
  • Chat agent voice smoke test — verifies gaia.agents.chat.agent TTS still produces playable WAV
  • Wheel built with python -m build actually contains static/voice_test.html (suggest extending util/verify_wheel_dist.py to assert this)
  • python util/lint.py --all clean

First-time-use note

The first request to either Lemonade audio endpoint triggers a one-time model download (~30 s for Whisper-Small, similar for kokoro-v1). The wrapper's default timeout is 60 s so this doesn't race; pre-warming via curl before any timed test is recommended.
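Pre-warming before a timed run could be scripted with a small retry helper along these lines (purely illustrative; the probe callable and retry policy are not part of the PR):

```python
import time


def prewarm(probe, attempts: int = 3, delay: float = 1.0):
    """Call `probe` until it succeeds, absorbing the one-time model download.

    `probe` is any zero-arg callable that raises on failure — e.g. a tiny
    transcription request against /v1/audio/transcriptions.
    """
    last_err = None
    for i in range(attempts):
        try:
            return probe()
        except Exception as e:  # noqa: BLE001 — pre-warm tolerates any failure
            last_err = e
            if i < attempts - 1:
                time.sleep(delay)
    raise RuntimeError(f"pre-warm failed after {attempts} attempts") from last_err
```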

Follow-ups (not in this PR)

  • True real-time streaming via Lemonade's /realtime WebSocket endpoint (currently WhisperAsr.start_recording_streaming chunks the recording in-process — same as before this PR).
  • Once Lemonade ships macOS-compatible audio recipes (llamacpp-based Whisper / Kokoro, or a Metal-native variant), deprecate the in-process modules and [talk] extras' heavy openai-whisper / kokoro deps in a follow-up.
  • Wheel-content CI check: extend util/verify_wheel_dist.py to assert gaia/ui/static/voice_test.html is in the dist.

Add gaia.audio.lemonade_audio — an HTTP client for Lemonade Server's
OpenAI-compatible /v1/audio/transcriptions and /v1/audio/speech endpoints.
A single Lemonade instance can now serve LLM + STT (Whisper) + TTS
(Kokoro) instead of GAIA loading separate Whisper / Kokoro models in
process — one model cache, one health story.

Wire it into the Agent UI through a new /voice router:
  POST /voice/transcribe   multipart upload → Lemonade → {"text": "..."}
  POST /voice/speech       JSON body → Lemonade → audio bytes
  GET  /voice/health       proxy to Lemonade /api/v1/health
  GET  /voice/test         self-contained browser harness (MediaRecorder
                           + OfflineAudioContext WAV converter, no React
                           rebuild needed)

Body for /voice/speech is validated by a Pydantic SpeechRequest model;
upstream Lemonade failures surface as 502 with the original error detail.

The existing gaia.audio.whisper_asr / gaia.audio.kokoro_tts modules
remain unchanged — the in-process variants still serve `gaia talk` and
any other caller that does not have a Lemonade server running.

Refs #215, #373.
@kovtcharov kovtcharov requested a review from kovtcharov-amd as a code owner May 2, 2026 23:09
@github-actions github-actions Bot added dependencies Dependency updates audio Audio (ASR/TTS) changes labels May 2, 2026
@github-actions
Contributor

github-actions Bot commented May 2, 2026

Summary

Cleanly factored, well-documented PR that adds an HTTP wrapper around Lemonade's OpenAI-compatible /v1/audio/* endpoints plus a /voice/* FastAPI router and a single-file browser harness. Scope is tightly contained, the no-silent-fallback policy is followed faithfully, and the wheel packaging is wired up. The single most important thing to fix before merge is the LEMONADE_BASE_URL convention mismatch — the rest of GAIA uses a base URL that includes /api/v1, this module assumes it doesn't, and a user with the documented env var set will get broken URLs like /api/v1/v1/audio/transcriptions. There are also a couple of secondary concerns (sync HTTP inside async endpoints, httpx not in base deps, missing tests/docs) flagged below.

Issues

🟡 Important

1. LEMONADE_BASE_URL convention mismatch breaks audio when the env var is set (src/gaia/audio/lemonade_audio.py:38)

Every other consumer in GAIA expects LEMONADE_BASE_URL to include /api/v1:

  • src/gaia/llm/lemonade_client.py:54-56 — DEFAULT_LEMONADE_URL = f"http://{DEFAULT_HOST}:{DEFAULT_PORT}/api/{LEMONADE_API_VERSION}"
  • src/gaia/agents/base/agent.py:199, src/gaia/agents/chat/agent.py:156, src/gaia/agents/code/cli.py:110, src/gaia/agents/builder/agent.py:133, src/gaia/agents/registry.py:600, src/gaia/agents/routing/agent.py:59, src/gaia/ui/routers/system.py:52, src/gaia/ui/_chat_helpers.py:1846 — all default to http://localhost:13305/api/v1.

This module instead defaults to http://localhost:13305 (no suffix) and constructs f"{base_url}/v1/audio/..." and f"{base_url}/api/v1/health". If a user has the documented env var set (the GAIA-wide convention), every URL produced here is broken:

  • transcribe → http://localhost:13305/api/v1/v1/audio/transcriptions
  • synthesize_bytes → http://localhost:13305/api/v1/v1/audio/speech
  • lemonade_health → http://localhost:13305/api/v1/api/v1/health

Strip the suffix (or origin-only the URL) before composing paths. Cleanest is to reuse gaia.llm.lemonade_client._get_lemonade_config so there's one source of truth. Inline option:

import os
from urllib.parse import urlparse


def _lemonade_origin() -> str:
    """Return the scheme+host[:port] of LEMONADE_BASE_URL (strips any /api/v1 suffix).

    The rest of GAIA sets ``LEMONADE_BASE_URL=http://host:port/api/v1`` (the
    Lemonade native namespace). The OpenAI-compatible audio endpoints live at
    ``/v1/audio/*`` — *not* under ``/api/v1`` — so we resolve to the origin and
    let the caller append the right path.
    """
    raw = os.getenv("LEMONADE_BASE_URL", "http://localhost:13305")
    p = urlparse(raw)
    if not p.scheme or not p.netloc:
        return "http://localhost:13305"
    return f"{p.scheme}://{p.netloc}"


LEMONADE_URL = _lemonade_origin()

2. Sync httpx inside async def endpoints blocks the event loop (src/gaia/ui/routers/audio.py:39,60,84 via lemonade_audio.py:60,98,160,225)

The router endpoints are async def, but the wrapper calls synchronous httpx.post/httpx.get. STT round-trips take seconds (and the first call can be ~30s when Lemonade auto-downloads a model), and the entire FastAPI worker stalls for that duration — every other route on the UI server backs up behind one transcription. The existing pattern in this codebase is httpx.AsyncClient:

  • src/gaia/ui/routers/system.py:67,244,411,720 — all use async with httpx.AsyncClient(...).

Two options: (a) make lemonade_audio async-first and have the router await it, or (b) keep the sync API for general callers and await asyncio.to_thread(transcribe_bytes, ...) from inside the router. Option (a) matches the rest of the UI; option (b) preserves a sync callable for non-async consumers (the "Beacon downstream consumer" use case mentioned in the module docstring).
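Option (b) is roughly the following sketch, where the synchronous function is a stand-in for the wrapper's real transcribe_bytes (which would perform the blocking httpx.post):

```python
import asyncio


def transcribe_bytes_sync(data: bytes) -> str:
    """Stand-in for the synchronous HTTP wrapper (really calls httpx.post)."""
    return f"transcribed {len(data)} bytes"


async def transcribe_endpoint(data: bytes) -> str:
    # Run the blocking call in a worker thread so the event loop stays free
    return await asyncio.to_thread(transcribe_bytes_sync, data)
```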

3. httpx is not in base install_requires (setup.py:112-123, setup.py:137)

httpx is declared only in the [ui] extra, but gaia.audio.lemonade_audio lives in the base package and imports it unconditionally. A user who installs gaia without [ui] will get ModuleNotFoundError: No module named 'httpx' on first import — including any "downstream consumer like Beacon" the module docstring is written for. Either:

  • Add httpx>=0.27.0 to install_requires, or
  • Replace httpx with requests (already in install_requires) — less consistent with the rest of gaia.ui but keeps base deps small.

4. No tests for the new module or router

CLAUDE.md ("Testing Requirements"): every new feature requires tests. Existing pattern under tests/unit/test_audio_*.py shows where these belong. Suggested minimum:

  • tests/unit/test_lemonade_audio.py — mock httpx and assert URL construction (catches issue 1 above), the {"text": null} → "" normalization (lemonade_audio.py:99-100, lemonade_audio.py:144-145), and that all error paths raise LemonadeAudioError with from e chaining preserved.
  • tests/unit/test_audio_router.py — fastapi.testclient.TestClient; assert 400 on empty upload, 502 on LemonadeAudioError, correct media-type per response_format, and that /voice/test returns 200 with text/html.

5. Documentation not updated

CLAUDE.md mandates docs for every new feature, and docs/sdk/sdks/audio.mdx already exists but isn't touched here. The /voice/* HTTP routes are also a public surface that would benefit from an entry in docs/reference/ (or a section in the audio SDK page) so contributors can discover the test harness URL without reading the router source. The PR's own test plan still has [ ] mypy / ruff pass on the new modules unchecked — please run python util/lint.py --all before merge.

🟢 Minor

6. transcribe() and transcribe_bytes() are near-duplicates (lemonade_audio.py:57-101 vs 103-156)

The path version can delegate to the bytes version and lose ~40 lines of duplication (compare to how synthesize already delegates to synthesize_bytes at lemonade_audio.py:194-204):

def transcribe(
    audio_path: str | Path,
    *,
    model: str = DEFAULT_STT_MODEL,
    language: str | None = "en",
    base_url: str = LEMONADE_URL,
    timeout: float = 60.0,
) -> str:
    """POST a WAV file to Lemonade /v1/audio/transcriptions.

    See :func:`transcribe_bytes` for argument and error semantics.
    """
    audio_path = Path(audio_path)
    if not audio_path.exists():
        raise FileNotFoundError(audio_path)
    return transcribe_bytes(
        audio_path.read_bytes(),
        filename=audio_path.name,
        model=model,
        language=language,
        base_url=base_url,
        timeout=timeout,
    )

7. LEMONADE_URL resolved at import time (lemonade_audio.py:38)

If a test fixture or long-running process mutates LEMONADE_BASE_URL after import, the change isn't picked up. The function-based pattern in gaia.llm.lemonade_client._get_lemonade_config avoids this. Goes hand-in-hand with the suggestion in issue #1.

8. voice/test reads HTML from disk on every request (src/gaia/ui/routers/audio.py:101)

Cache once at module load, or mount via StaticFiles (the pattern used at src/gaia/ui/server.py:407,423):

_TEST_HTML_PATH = Path(__file__).parent.parent / "static" / "voice_test.html"
try:
    _TEST_HTML = _TEST_HTML_PATH.read_text(encoding="utf-8")
except FileNotFoundError:
    _TEST_HTML = None  # surfaced as 500 in the handler below


@router.get("/test", response_class=HTMLResponse)
def voice_test_page():
    """Serve a single-page browser harness for STT + TTS smoke testing.

    Open in a browser at http://localhost:<ui-port>/voice/test
    """
    if _TEST_HTML is None:
        raise HTTPException(
            status_code=500,
            detail=(
                f"voice_test.html missing at {_TEST_HTML_PATH}. "
                "Reinstall gaia or restore src/gaia/ui/static/voice_test.html."
            ),
        )
    return HTMLResponse(_TEST_HTML)

9. Upstream status code is squashed in the router (src/gaia/ui/routers/audio.py:39-49)

Every Lemonade failure becomes a 502 regardless of root cause. A model_not_supported 404 (the macOS case the PR description calls out), an upload-format 400, and a connection timeout all look the same to the client. Not blocking — LemonadeAudioError's message text is preserved in detail — but if you have time, capture the upstream status_code on the exception and pass it through.

Strengths

  • No-silent-fallback discipline is exemplary. LemonadeAudioError messages at lemonade_audio.py:79-83, 92-96, 139-143 cleanly satisfy CLAUDE.md's "what failed / what to do / where to look" rule, and the docstring on the exception explicitly documents why there's no fallback to in-process whisper_asr / kokoro_tts. That rationale is exactly the right thing to write down.
  • PR description is the right shape: leads with why, endpoint table, an explicit "What's not touched" scope-fence, a hardware-tied test plan with honest unchecked boxes, and a first-time-use note about the model auto-download. Easy to review.
  • Browser harness is genuinely useful — single-file MediaRecorder + OfflineAudioContext WAV converter (voice_test.html:226-261) means contributors can validate the audio path without curl --form gymnastics, and there's no React rebuild in the loop. Wheel packaging via MANIFEST.in + setup.py package_data["gaia.ui"] mirrors the existing webui pattern correctly.

Verdict

Request changes — issues 1–3 are blocking (broken URLs under the documented env var, event-loop stall, missing base dep). Issues 4 & 5 (tests + docs) are CLAUDE.md requirements that should land in this PR rather than as a follow-up. Once those are addressed, the rest are quick polish — this is a well-scoped, well-documented change and I'd expect a fast turn-around.

@kovtcharov kovtcharov changed the title feat(audio): Lemonade-routed STT/TTS endpoints + browser test page feat(audio): unify STT/TTS on Lemonade Server (replace in-process Whisper/Kokoro) May 3, 2026
@kovtcharov kovtcharov force-pushed the feat/lemonade-audio-endpoints branch from d1aef35 to 5d22a96 Compare May 3, 2026 01:11
The /voice/* HTTP routes added in the previous commit hard-coded the
Lemonade backend. That works on Linux/Windows AMD Ryzen AI but breaks
local development on macOS, where Lemonade's whispercpp / Kokoro recipes
are not yet supported (per Lemonade v10.2 model registry, only llamacpp
recipes run on macOS).

Add a GAIA_VOICE_BACKEND env var so the routes work on both platforms
until macOS support lands in Lemonade itself:

  GAIA_VOICE_BACKEND=lemonade   (default — POSTs to /v1/audio/* on Lemonade)
  GAIA_VOICE_BACKEND=in-process (falls through to gaia.audio.whisper_asr +
                                 gaia.audio.kokoro_tts; works on macOS)

Implementation:
  - /voice/transcribe dispatches to either transcribe_bytes() or a
    temp-WAV → WhisperAsr.transcribe_file() → cleanup
  - /voice/speech dispatches to either synthesize_bytes() or
    KokoroTTS().generate_speech() → soundfile-encoded WAV (no MP3
    encoder bundled with the in-process path)
  - /voice/health reports the active backend and its readiness
  - Lemonade STT model names (Whisper-Small) are mapped to
    openai-whisper-package names (small) when in-process is selected
  - No silent fallback between backends — if the selected backend's deps
    or service are unavailable, the route returns a clear error (503
    for missing deps, 502 for upstream failures)

Voice test page (/voice/test) now displays the active backend in its
header, sourced from /voice/health.

In-process and Lemonade-routed audio coexist until Lemonade adds macOS
support for its audio recipes.
@kovtcharov kovtcharov changed the title feat(audio): unify STT/TTS on Lemonade Server (replace in-process Whisper/Kokoro) feat(audio): Lemonade-routed STT/TTS endpoints + dual-backend selector May 3, 2026
Comment on lines +259 to +265
return {
"backend": _IN_PROCESS,
"ready": deps_ok,
"detail": detail,
"stt_default": "small",
"tts_default": "af_bella",
}
@github-actions
Contributor

github-actions Bot commented May 3, 2026

Summary

Solid additive PR that introduces a Lemonade-routed audio path alongside the existing in-process Whisper/Kokoro modules, gated by GAIA_VOICE_BACKEND. The diff is genuinely additive (zero deletions, no public-API changes), the no-silent-fallback policy is honored throughout with actionable error messages, and the dual-backend rationale is clearly documented. Main gaps: no tests for the new endpoints and no doc updates (docs/sdk/sdks/audio.mdx, docs/guides/talk.mdx) covering /voice/* or GAIA_VOICE_BACKEND — both required by CLAUDE.md for new features.

Issues Found

🟡 Important — No tests for the new audio path (tests/)

CLAUDE.md requires unit/integration tests for new features. None of lemonade_audio.py, routers/audio.py, or the /voice/* route shapes are covered. At minimum:

  • Unit test for lemonade_audio.transcribe_bytes/synthesize_bytes happy path + LemonadeAudioError translation, mocking httpx.post (similar pattern to tests/test_lemonade_client.py).
  • Router test that 502 is returned when LemonadeAudioError is raised, 400 on empty body, 503 on missing in-process deps, and that /voice/health reports the active backend.
  • from gaia.ui.routers.audio import router smoke check + route count assertion (mirrors the manual check from the PR description's test plan).

Without these the route shapes can silently drift; the PR description's test plan items are unchecked because they require Linux/Windows + macOS hardware, which is exactly why automated coverage matters.

🟡 Important — Docs not updated for new endpoints / env var

CLAUDE.md "Documentation Requirements" mandates docs for every new feature. The PR adds three public HTTP routes and a new env var (GAIA_VOICE_BACKEND) but neither docs/sdk/sdks/audio.mdx nor docs/guides/talk.mdx mention any of them. Reviewers/users have no entry point to discover /voice/test or how to flip the backend on macOS.

Suggest a short section in docs/sdk/sdks/audio.mdx listing the four routes + the env-var contract, plus a one-paragraph "On macOS, set GAIA_VOICE_BACKEND=in-process" callout in docs/guides/talk.mdx.

🟡 Important — Wheel verifier not extended for voice_test.html

The PR's own test plan flags this: util/verify_wheel_dist.py currently only asserts gaia/apps/webui/dist/. The new package_data entry for gaia.ui/static/*.html is a single point of failure (recursive globs in package_data are notoriously fragile across setuptools versions, which is exactly why the existing webui block has the verifier backstop). Without an assertion, a future setuptools change silently strips voice_test.html from the wheel and /voice/test 500s in production.

Add voice_test.html to the verifier in this PR rather than leaving it as a follow-up — the regression risk is the reason the verifier exists.

🟢 Minor — transcribe duplicates transcribe_bytes (src/gaia/audio/lemonade_audio.py:94-155, 158-199)

The two functions share ~30 lines of identical HTTP code. transcribe should delegate to transcribe_bytes:

def transcribe(
    audio_path: str | Path,
    *,
    model: str = DEFAULT_STT_MODEL,
    language: str | None = "en",
    base_url: str = LEMONADE_URL,
    timeout: float = 60.0,
) -> str:
    """POST a WAV file to Lemonade /v1/audio/transcriptions.

    Args:
        audio_path: path to a 16kHz mono WAV file (push-to-talk recordings).
        model: ``Whisper-Tiny`` | ``Whisper-Base`` | ``Whisper-Small`` |
               ``Whisper-Large`` (or any other Whisper variant Lemonade serves).
        language: ISO 639-1 code; defaults to ``"en"``. Pass ``None`` to
                  auto-detect.
        base_url: Lemonade server URL.
        timeout: HTTP timeout in seconds.

    Returns:
        The transcribed text.

    Raises:
        FileNotFoundError: if ``audio_path`` does not exist.
        LemonadeAudioError: server unreachable, non-200 status, or malformed
                            response.
    """
    audio_path = Path(audio_path)
    if not audio_path.exists():
        raise FileNotFoundError(audio_path)
    return transcribe_bytes(
        audio_path.read_bytes(),
        filename=audio_path.name,
        model=model,
        language=language,
        base_url=base_url,
        timeout=timeout,
    )

This already matches the pattern used between synthesize and synthesize_bytes further down the same file — the duplication looks like an oversight.

🟢 Minor — Inconsistent error translation in in-process STT path (src/gaia/ui/routers/audio.py:435-461)

The Lemonade path translates LemonadeAudioError → 502. The in-process path only catches FileNotFoundError (400) and ImportError (503); anything else from WhisperAsr.transcribe_file (audio decode error, runtime model failure) propagates as a bare 500 with no actionable detail. Consider mirroring the TTS path's except Exception → 502 (already used at audio.py:523) for parity:

        try:
            asr = WhisperAsr(model_size=package_name)
            text = asr.transcribe_file(tmp_path)
        except FileNotFoundError as e:
            raise HTTPException(status_code=400, detail=str(e)) from e
        except ImportError as e:
            raise HTTPException(status_code=503, detail=str(e)) from e
        except Exception as e:  # noqa: BLE001 — Whisper raises various errors
            raise HTTPException(status_code=502, detail=f"in-process STT failed: {e}") from e
        finally:

🟢 Minor — _to_whisper_package_name fallback silently lowercases unknown names (src/gaia/ui/routers/audio.py:409-411)

return _WHISPER_MODEL_TO_PACKAGE.get(name, name.lower()) means an unknown model name (e.g. a typo like Whisper-Smal) gets passed straight to openai-whisper as whisper-smal, producing a deep-stack error rather than the kind of actionable message the rest of this module favors. Consider raising HTTPException(400, ...) for unknown names, or at least logging a warning so the caller learns the value didn't map.
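A strict variant of the mapping (mapping and function names illustrative) would fail fast on typos instead of lowercasing blindly:

```python
WHISPER_MODEL_TO_PACKAGE = {
    "Whisper-Tiny": "tiny",
    "Whisper-Base": "base",
    "Whisper-Small": "small",
    "Whisper-Large": "large",
}


def to_whisper_package_name_strict(name: str) -> str:
    """Reject unknown Lemonade model names with an actionable error."""
    try:
        return WHISPER_MODEL_TO_PACKAGE[name]
    except KeyError:
        raise ValueError(
            f"unknown STT model {name!r}; expected one of "
            f"{sorted(WHISPER_MODEL_TO_PACKAGE)}"
        ) from None
```

The router would translate the ValueError into an HTTPException(400, ...), keeping the "what failed / what to do" message intact.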

🟢 Minor — No upload-size guard on /voice/transcribe

audio.read() slurps the entire upload into memory before any backend dispatch. The UI server is local-only by default, but the tunnel router can expose it; a multi-GB upload would happily OOM the process. A quick Content-Length check (or len(audio_bytes) > MAX_UPLOAD_BYTES after read) plus a 413 response would prevent that without affecting normal use.
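A minimal size guard might look like this (the cap value is illustrative, and the router would map the error to a 413 response):

```python
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # illustrative cap, ~25 MiB


def check_upload_size(audio_bytes: bytes) -> bytes:
    """Raise if the upload exceeds the cap; router translates this to 413."""
    if len(audio_bytes) > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"upload of {len(audio_bytes)} bytes exceeds the "
            f"{MAX_UPLOAD_BYTES}-byte cap"
        )
    return audio_bytes
```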

Strengths

  • Docstrings carry their weight. Every module/function explains why it exists alongside what it does — LemonadeAudioError's docstring explicitly cites the no-silent-fallback policy and explains why falling through to whisper_asr would be wrong. Future maintainers won't have to guess.
  • No-silent-fallback policy applied consistently. All four error paths (/voice/transcribe, /voice/speech, /voice/health, plus the lemonade_audio primitives) translate upstream failures into structured 502/503/400 responses with the original cause preserved via raise ... from e. The _backend() resolver logs and warns on unknown values rather than silently picking a default that disagrees with the user's intent.
  • Genuinely additive change set. whisper_asr.py / kokoro_tts.py / existing gaia talk callers untouched; the registration in server.py is one import + one include_router call alongside the eight existing routers. The dual-backend rationale (macOS lacks Lemonade whispercpp/Kokoro recipes today) is concretely documented in the router module-docstring.
  • Wheel packaging matches existing precedent. Both setup.py package_data and MANIFEST.in recursive-include are updated together, mirroring the comment in setup.py:91-95 about belt-and-braces packaging — exactly the right pattern.

Verdict

Approve with suggestions. Nothing blocking; the architecture and the no-silent-fallback hygiene are good. The two 🟡 docs+tests gaps and the wheel-verifier gap are required by CLAUDE.md's "bulletproof commit" standard, so please address them before this is treated as merge-ready. The 🟢 items can be folded into the same revision or a follow-up at the author's discretion.
