Skip to content

feat(email): engine=llm triage API + remove body clipping (#1452, #1539)#1590

Merged
itomek merged 4 commits into
mainfrom
feat/email-hub-llm-triage
Jun 11, 2026
Merged

feat(email): engine=llm triage API + remove body clipping (#1452, #1539)#1590
itomek merged 4 commits into
mainfrom
feat/email-hub-llm-triage

Conversation

@itomek

@itomek itomek commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

The engine=llm path added for #1452 was written against src/gaia/api/email_routes.py, which the hub migration (#1520) deleted — leaving the feature stranded on a conflicting branch. Body text was silently truncated to 4 000 characters before reaching the local model, cutting signal from long emails and threads (#1539 observation).

This PR ports both fixes to the hub's canonical location.

After: POST /v1/email/triage?engine=llm works end-to-end — low-confidence results are escalated to the local Lemonade model (Gemma-4-E4B) with the full email body and a 4 096-token output cap so chain-of-thought reasoning fits. Thread bodies are now joined newest-first. EmailTriageResult.message_id echoes the request ID back to the caller.

Test plan

  • uv run python -m pytest hub/agents/python/email/tests/ -x — 11 tests pass (including body-not-clipped and newest-first ordering assertions)
  • Live smoke: curl -s localhost:4200/v1/email/triage?engine=llm -d '...' returns 200 with result.category from LLM (not 502)
  • curl ... ?engine=openai returns 422

Closes #1452
Closes #1539

, #1539)

Before: the email triage API was heuristic-only; the engine=llm path added
in PR #1547 patched src/gaia/api/email_routes.py which was deleted when the
email agent migrated to the hub (#1520). Body text was silently truncated to
4 000 characters before reaching the local model, hiding signal in long threads.

After: the hub api_routes.py gains the engine=llm path (asyncio.to_thread +
LLMTriageError/EmailSummarizeError → 502 translation); body clipping is removed
from llm_triage.py and summarize_tools.py so the full text reaches the model;
thread joining is reversed to newest-first so recent messages get priority.
EmailTriageResult gains the message_id echo (#1551 contract change, ported here).
@itomek itomek marked this pull request as ready for review June 11, 2026 14:55
@itomek itomek requested a review from kovtcharov-amd as a code owner June 11, 2026 14:55
@github-actions

Copy link
Copy Markdown
Contributor

Summary

Clean, well-tested port of the engine=llm triage path to the hub's canonical api_routes.py, and the body-clipping removal is correctly applied to both llm_triage.py and summarize_tools.py. The fail-loud discipline is exactly right — invalid engine raises ValueError, LLM failures surface as HTTP 502, and the cloud base_url guard is preserved — so this matches GAIA's no-silent-fallback rule end to end. The one thing to reconcile before merge: the endpoint docstring claims the default engine=heuristic is "byte-identical to the previous behaviour", but the thread path now joins newest-first, which changes the default heuristic output for threads.

Issues

🟡 Important

"byte-identical" claim is inaccurate for threads (api_routes.py:262, docstring near api_routes.py:688)

The newest-first reversal was applied to the heuristic thread path (_triage_thread), not just the LLM path. Since _summarize leads with the first 1–2 sentences of combined_body and _extract_action_items walks it in order, reversing the messages changes the default engine=heuristic thread summary and action-item ordering. That directly contradicts the endpoint docstring:

The default engine=heuristic is byte-identical to the previous behaviour

test_thread_newest_first even asserts the new ordering on engine="heuristic", so the change is intentional — but the claim then needs to go. Either soften the docstring (single-email default is unchanged; thread default now leads with the newest message) or scope the reversal to _triage_thread_llm only if you want the heuristic path to stay stable. Worth a one-line note in the PR description too, since "no behavioural change on the default path" is part of the sell.

Unbounded prompt growth has no context-window guard (llm_triage.py, summarize_tools.py)

Removing _BODY_CHAR_LIMIT / DEFAULT_BODY_LIMIT_CHARS is the intended #1539 fix, and "context window governs what fits" is a reasonable design. But there's now no upper bound and no loud failure if a long thread overflows Gemma-4-E4B's context — it depends entirely on the backend rejecting or truncating. Given the fail-loud philosophy elsewhere in this PR, consider either documenting the reliance on the backend's context-overflow error or adding a guard that raises a clear LLMTriageError when the assembled prompt exceeds the model's window. Not a blocker, but the old comment ("without unbounded prompt growth on long threads") was deleted without a replacement safeguard.

🟢 Minor

message_id not threaded into the LLM tools (api_routes.py:337, :230-ish)

_build_result_llm has message_id in scope but calls classify_email_llm(...) and summarize_email_llm(...) without it, so their error messages and log.debug lines will report message_id='' — losing correlation on exactly the failures you've built fail-loud errors for. Both functions accept message_id:

            llm_result = classify_email_llm(
                chat,
                subject=subject,
                sender=sender_raw,
                body=body,
                message_id=message_id or "",
            )

(and the same for the summarize_email_llm call).

Contract field description is slightly off (contract.py:250)

The description says the value echoes "SingleEmailInput.message or ThreadInput.thread_id" — for the single case it's the message's message_id, not the message object. Minor wording:

            "Echoes the provider message-id from the request "
            "(SingleEmailInput.message.message_id or ThreadInput.thread_id). "
            "Null when the result was produced from a raw Gmail-API message "
            "(no contract message_id available)."

Test gap: the 502 mapping is uncovered. The new tests cover the happy path, service-level ValueError, body-not-clipped, and newest-first, but nothing exercises the endpoint's except (LLMTriageError, EmailSummarizeError) → 502. A small test with a fake chat that raises would lock in the fail-loud contract at the API boundary.

Strengths

  • Fail-loud throughoutValueError on unknown engine, 502 on LLM/summarize failure, ConfigurationError on a cloud base_url. No silent fallback to heuristic anywhere, exactly as CLAUDE.md requires.
  • asyncio.to_thread keeps the blocking LLM call off the event loop instead of stalling the async endpoint — easy to miss, done right.
  • Literal["heuristic", "llm"] query param gives automatic 422 rejection of unknown engines at the framework boundary, on top of the service-level guard.
  • Good, targeted test additions — the body-not-clipped assertions directly pin the feat(email): echo message_id in triage response + read-by-UUID (dedup / no re-triage) #1539 regression.

Verdict

Approve with suggestions. No security or correctness blockers. The only must-fix before merge is reconciling the "byte-identical" docstring (and PR description) with the now-changed default thread ordering — pick one of: soften the claim, or scope the reversal to the LLM path. The rest are minor and safe to apply in the same pass.

@github-actions

Copy link
Copy Markdown
Contributor

🟡 api_routes.py:44 — class docstring now says "the LLM is always used for summaries" but triage_gmail_message (line 208) still routes through the heuristic-only _build_result, not _build_result_llm. The REST contract endpoint (triage_request) correctly uses the LLM path; the Gmail-polling path was missed.

Either the docstring overclaims and should say the LLM path applies to triage_request only, or triage_gmail_message also needs to accept a chat argument and call _build_result_llm. As written, a developer reading the class doc will assume Gmail messages also get LLM summaries — they don't.

kovtcharov
kovtcharov previously approved these changes Jun 11, 2026
@github-actions github-actions Bot added the tests Test changes label Jun 11, 2026
@itomek itomek added this pull request to the merge queue Jun 11, 2026
Merged via the queue into main with commit a55e23b Jun 11, 2026
47 checks passed
@itomek itomek deleted the feat/email-hub-llm-triage branch June 11, 2026 16:21
pull Bot pushed a commit to bhardwajRahul/gaia that referenced this pull request Jun 12, 2026
…starts (amd#1620)

**Why this matters**

The documented Milestone-40 partner recipe — `pip install
'amd-gaia[api]' gaia-agent-email`, then `gaia api` — crashed at server
startup with `ModuleNotFoundError: No module named 'keyring'`, before
serving a single request. `gaia api` auto-mounts the email wheel's REST
router, and that import chain reaches a module-level `import keyring` in
`gaia.connectors.store` (also hit at request time via
`connected_mailbox_providers()`) — but `keyring` was declared only in
the `[ui]`/`[dev]` extras, never `[api]`. So the one install combo the
partner path depends on couldn't even boot; 0.20.1 only masks it because
its server doesn't mount the email router yet.

After this change `[api]` carries `keyring` (same pin as
`[ui]`/`[dev]`), so the documented install starts and serves local LLM
triage with zero manual installs. Needs to land before the v0.21.0 tag.
A packaging guard (`tests/unit/test_api_extras.py`, mirroring
`test_ui_extras.py`) fails if `[api]` ever drops keyring again.

**Test plan**
- [x] Guard test is red on `main` (keyring absent from `[api]`), green
with the fix.
- [x] `pytest tests/unit/test_api_extras.py tests/unit/test_packaging.py
tests/unit/test_ui_extras.py` → 13 passed; `python util/lint.py --all` →
clean.
- [x] Real-world, Linux + Radeon dGPU, local Lemonade (Gemma-4-E4B),
fresh venvs via the documented `git+` recipe:
- **Before (`main`):** `from gaia.api.openai_server import app` →
`ModuleNotFoundError: No module named 'keyring'` at `store.py:38`
(keyring not installed).
- **After (this branch):** keyring 25.7.0 pulled by `[api]` → import
clean → uvicorn boots → `POST /v1/email/triage` → **200** with LLM
triage (category `actionable`, summary echoing the planted `$4,820 /
INV-90871 / Thursday 3pm` facts, action item + draft reply).

Out of scope: AC3 (partner guide stating the exact install line) is
tracked by amd#1597, and depends on amd#1601 publishing the `gaia-agent-email`
wheel.

Closes amd#1617.
Related: amd#1602, amd#1590, amd#1597, amd#915, amd#1179.

---------

Co-authored-by: Tomasz Iniewicz <heaters-nays0p@icloud.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tests Test changes

Projects

None yet

3 participants