feat(email): synthetic .mbox dataset for email triage tests#928
feat(email): synthetic .mbox dataset for email triage tests#928theonlychant wants to merge 17 commits intoamd:mainfrom
Conversation
itomek-amd
left a comment
There was a problem hiding this comment.
@theonlychant — thanks for the work, and the cleanup pass that pulled connections/ back out was the right call. Requesting changes for two reasons:
- Code doesn't parse. Two unresolved merge-conflict blocks landed in
src/gaia/agents/chat/agent.py(see inline comments).python -m py_compile src/gaia/agents/chat/agent.pyfails withSyntaxError: invalid decimal literal at line 1784, which is also why theClaude AI Assistant / pr-reviewcheck failed — the bot can't review a file that doesn't compile. - Title and scope don't match. The PR is titled
feat(email): synthetic .mbox dataset for email triage testsand the body saysCloses #848, but the diff bundles ~1100 lines of tool-loader work for #688/#800 and atests/unit/test_pkce.pythat imports a path (src/gaia/connections/pkce.py) the same PR deletes.
Two ways to resolve
Option A (recommended) — refocus this PR on its stated scope. Revert the tool-loader changes (src/gaia/agents/base/tool_loader.py, the tool_loader wiring in base/agent.py, the new ~260 lines in chat/agent.py, and tests/unit/test_tool_loader.py). Delete tests/unit/test_pkce.py. What's left is a clean #848 PR. Open a separate PR for the tool-loader linked to #688/#800.
Option B — keep both, but fix the merge. Resolve the conflict markers in chat/agent.py, delete tests/unit/test_pkce.py, update the title and body to declare both threads (e.g. feat: email .mbox fixtures + tool-loader for #688), and link both issues. I'd push for A — bundled PRs are harder to land and harder to revert if one half regresses.
Before re-requesting review
python -m py_compile src/gaia/agents/chat/agent.pyexits 0pytest tests/unit/test_synthetic_mbox.py -qis green on a clean checkout- (If keeping any tool-loader work)
pytest tests/unit/test_tool_loader.py -qis green
Happy to re-review once either option lands. The mbox fixture itself looks like a useful chunk of work.
|
Alright so @kovtcharov or @kovtcharov-amd and @itomek or @itomek-amd , I've addressed both issues: Removed all tool-loader and pkce files and PR is now scoped to #848 only Ready for re-review when you get a chance, thanks! |
Closes #848
Summary
Adds a synthetic .mbox dataset for testing the email triage agent.
The fixtures provide realistic email threads for unit and integration
testing without requiring live mailbox access.
Why GAIA needs it
The email triage agent currently has no test data to run against,
making it impossible to validate triage logic in CI.
Test plan
tests/unit/test_agents_split.py- all tests passing