feat(email): engine=llm triage API + remove body clipping (#1452, #1539) by itomek · Pull Request #1590 · amd/gaia

itomek · 2026-06-11T14:25:42Z

The engine=llm path added for #1452 was written against src/gaia/api/email_routes.py, which the hub migration (#1520) deleted — leaving the feature stranded on a conflicting branch. Body text was silently truncated to 4 000 characters before reaching the local model, cutting signal from long emails and threads (#1539 observation).

This PR ports both fixes to the hub's canonical location.

After: POST /v1/email/triage?engine=llm works end-to-end — low-confidence results are escalated to the local Lemonade model (Gemma-4-E4B) with the full email body and a 4 096-token output cap so chain-of-thought reasoning fits. Thread bodies are now joined newest-first. EmailTriageResult.message_id echoes the request ID back to the caller.

Test plan

uv run python -m pytest hub/agents/python/email/tests/ -x — 11 tests pass (including body-not-clipped and newest-first ordering assertions)
Live smoke: curl -s localhost:4200/v1/email/triage?engine=llm -d '...' returns 200 with result.category from LLM (not 502)
curl ... ?engine=openai returns 422

Closes #1452
Closes #1539

, #1539) Before: the email triage API was heuristic-only; the engine=llm path added in PR #1547 patched src/gaia/api/email_routes.py which was deleted when the email agent migrated to the hub (#1520). Body text was silently truncated to 4 000 characters before reaching the local model, hiding signal in long threads. After: the hub api_routes.py gains the engine=llm path (asyncio.to_thread + LLMTriageError/EmailSummarizeError → 502 translation); body clipping is removed from llm_triage.py and summarize_tools.py so the full text reaches the model; thread joining is reversed to newest-first so recent messages get priority. EmailTriageResult gains the message_id echo (#1551 contract change, ported here).

github-actions · 2026-06-11T14:58:46Z

Summary

Clean, well-tested port of the engine=llm triage path to the hub's canonical api_routes.py, and the body-clipping removal is correctly applied to both llm_triage.py and summarize_tools.py. The fail-loud discipline is exactly right — invalid engine raises ValueError, LLM failures surface as HTTP 502, and the cloud base_url guard is preserved — so this matches GAIA's no-silent-fallback rule end to end. The one thing to reconcile before merge: the endpoint docstring claims the default engine=heuristic is "byte-identical to the previous behaviour", but the thread path now joins newest-first, which changes the default heuristic output for threads.

Issues

🟡 Important

"byte-identical" claim is inaccurate for threads (api_routes.py:262, docstring near api_routes.py:688)

The newest-first reversal was applied to the heuristic thread path (_triage_thread), not just the LLM path. Since _summarize leads with the first 1–2 sentences of combined_body and _extract_action_items walks it in order, reversing the messages changes the default engine=heuristic thread summary and action-item ordering. That directly contradicts the endpoint docstring:

The default engine=heuristic is byte-identical to the previous behaviour

test_thread_newest_first even asserts the new ordering on engine="heuristic", so the change is intentional — but the claim then needs to go. Either soften the docstring (single-email default is unchanged; thread default now leads with the newest message) or scope the reversal to _triage_thread_llm only if you want the heuristic path to stay stable. Worth a one-line note in the PR description too, since "no behavioural change on the default path" is part of the sell.

Unbounded prompt growth has no context-window guard (llm_triage.py, summarize_tools.py)

Removing _BODY_CHAR_LIMIT / DEFAULT_BODY_LIMIT_CHARS is the intended #1539 fix, and "context window governs what fits" is a reasonable design. But there's now no upper bound and no loud failure if a long thread overflows Gemma-4-E4B's context — it depends entirely on the backend rejecting or truncating. Given the fail-loud philosophy elsewhere in this PR, consider either documenting the reliance on the backend's context-overflow error or adding a guard that raises a clear LLMTriageError when the assembled prompt exceeds the model's window. Not a blocker, but the old comment ("without unbounded prompt growth on long threads") was deleted without a replacement safeguard.

🟢 Minor

message_id not threaded into the LLM tools (api_routes.py:337, :230-ish)

_build_result_llm has message_id in scope but calls classify_email_llm(...) and summarize_email_llm(...) without it, so their error messages and log.debug lines will report message_id='' — losing correlation on exactly the failures you've built fail-loud errors for. Both functions accept message_id:

            llm_result = classify_email_llm(
                chat,
                subject=subject,
                sender=sender_raw,
                body=body,
                message_id=message_id or "",
            )

(and the same for the summarize_email_llm call).

Contract field description is slightly off (contract.py:250)

The description says the value echoes "SingleEmailInput.message or ThreadInput.thread_id" — for the single case it's the message's message_id, not the message object. Minor wording:

            "Echoes the provider message-id from the request "
            "(SingleEmailInput.message.message_id or ThreadInput.thread_id). "
            "Null when the result was produced from a raw Gmail-API message "
            "(no contract message_id available)."

Test gap: the 502 mapping is uncovered. The new tests cover the happy path, service-level ValueError, body-not-clipped, and newest-first, but nothing exercises the endpoint's except (LLMTriageError, EmailSummarizeError) → 502. A small test with a fake chat that raises would lock in the fail-loud contract at the API boundary.

Strengths

Fail-loud throughout — ValueError on unknown engine, 502 on LLM/summarize failure, ConfigurationError on a cloud base_url. No silent fallback to heuristic anywhere, exactly as CLAUDE.md requires.
asyncio.to_thread keeps the blocking LLM call off the event loop instead of stalling the async endpoint — easy to miss, done right.
Literal["heuristic", "llm"] query param gives automatic 422 rejection of unknown engines at the framework boundary, on top of the service-level guard.
Good, targeted test additions — the body-not-clipped assertions directly pin the feat(email): echo message_id in triage response + read-by-UUID (dedup / no re-triage) #1539 regression.

Verdict

Approve with suggestions. No security or correctness blockers. The only must-fix before merge is reconciling the "byte-identical" docstring (and PR description) with the now-changed default thread ordering — pick one of: soften the claim, or scope the reversal to the LLM path. The rest are minor and safe to apply in the same pass.

github-actions · 2026-06-11T15:09:40Z

🟡 api_routes.py:44 — class docstring now says "the LLM is always used for summaries" but triage_gmail_message (line 208) still routes through the heuristic-only _build_result, not _build_result_llm. The REST contract endpoint (triage_request) correctly uses the LLM path; the Gmail-polling path was missed.

Either the docstring overclaims and should say the LLM path applies to triage_request only, or triage_gmail_message also needs to accept a chat argument and call _build_result_llm. As written, a developer reading the class doc will assume Gmail messages also get LLM summaries — they don't.

…without Lemonade

…starts (amd#1620) **Why this matters** The documented Milestone-40 partner recipe — `pip install 'amd-gaia[api]' gaia-agent-email`, then `gaia api` — crashed at server startup with `ModuleNotFoundError: No module named 'keyring'`, before serving a single request. `gaia api` auto-mounts the email wheel's REST router, and that import chain reaches a module-level `import keyring` in `gaia.connectors.store` (also hit at request time via `connected_mailbox_providers()`) — but `keyring` was declared only in the `[ui]`/`[dev]` extras, never `[api]`. So the one install combo the partner path depends on couldn't even boot; 0.20.1 only masks it because its server doesn't mount the email router yet. After this change `[api]` carries `keyring` (same pin as `[ui]`/`[dev]`), so the documented install starts and serves local LLM triage with zero manual installs. Needs to land before the v0.21.0 tag. A packaging guard (`tests/unit/test_api_extras.py`, mirroring `test_ui_extras.py`) fails if `[api]` ever drops keyring again. **Test plan** - [x] Guard test is red on `main` (keyring absent from `[api]`), green with the fix. - [x] `pytest tests/unit/test_api_extras.py tests/unit/test_packaging.py tests/unit/test_ui_extras.py` → 13 passed; `python util/lint.py --all` → clean. - [x] Real-world, Linux + Radeon dGPU, local Lemonade (Gemma-4-E4B), fresh venvs via the documented `git+` recipe: - **Before (`main`):** `from gaia.api.openai_server import app` → `ModuleNotFoundError: No module named 'keyring'` at `store.py:38` (keyring not installed). - **After (this branch):** keyring 25.7.0 pulled by `[api]` → import clean → uvicorn boots → `POST /v1/email/triage` → **200** with LLM triage (category `actionable`, summary echoing the planted `$4,820 / INV-90871 / Thursday 3pm` facts, action item + draft reply). Out of scope: AC3 (partner guide stating the exact install line) is tracked by amd#1597, and depends on amd#1601 publishing the `gaia-agent-email` wheel. Closes amd#1617. Related: amd#1602, amd#1590, amd#1597, amd#915, amd#1179. --------- Co-authored-by: Tomasz Iniewicz <heaters-nays0p@icloud.com>

itomek mentioned this pull request Jun 11, 2026

feat(email): expose LLM triage/summary via /v1/email/triage #1547

Closed

4 tasks

itomek marked this pull request as ready for review June 11, 2026 14:55

itomek requested a review from kovtcharov-amd as a code owner June 11, 2026 14:55

refactor(email): remove engine param — triage always uses local LLM

09d04c3

kovtcharov previously approved these changes Jun 11, 2026

View reviewed changes

itomek self-assigned this Jun 11, 2026

itomek added this to the v0.21.0 - Email Agent & Platform Foundations milestone Jun 11, 2026

fix(email): inject fake chat in TestEmailTriageEndpoint so CI passes …

65cd806

…without Lemonade

itomek dismissed kovtcharov’s stale review via 65cd806 June 11, 2026 15:31

github-actions Bot added the tests Test changes label Jun 11, 2026

style(email): black format tests/test_api.py

c62c519

kovtcharov-amd approved these changes Jun 11, 2026

View reviewed changes

itomek added this pull request to the merge queue Jun 11, 2026

Merged via the queue into main with commit a55e23b Jun 11, 2026
47 checks passed

itomek deleted the feat/email-hub-llm-triage branch June 11, 2026 16:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(email): engine=llm triage API + remove body clipping (#1452, #1539)#1590

feat(email): engine=llm triage API + remove body clipping (#1452, #1539)#1590
itomek merged 4 commits into
mainfrom
feat/email-hub-llm-triage

itomek commented Jun 11, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 11, 2026

Uh oh!

github-actions Bot commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

itomek commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Uh oh!

github-actions Bot commented Jun 11, 2026

Summary

Issues

🟡 Important

🟢 Minor

Strengths

Verdict

Uh oh!

github-actions Bot commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

itomek commented Jun 11, 2026 •

edited

Loading