
feat(email): synthetic .mbox dataset for email triage tests#928

Open
theonlychant wants to merge 17 commits into amd:main from theonlychant:feat/email-mbox-fixtures

@theonlychant
Contributor

Closes #848

Summary

Adds a synthetic .mbox dataset for testing the email triage agent.
The fixtures provide realistic email threads for unit and integration
testing without requiring live mailbox access.
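For readers unfamiliar with the format, a threaded .mbox fixture can be produced with only the Python standard library. The sketch below is a minimal, hypothetical illustration, not the generator in this PR (generate_mbox.py is not shown here). The helper names `make_message` and `write_synthetic_mbox`, the `example.com` addresses, and the 15-minute spacing between messages are all assumptions. Replies carry `In-Reply-To` and `References` headers so a triage agent can reconstruct threads.

```python
import mailbox
import os
import tempfile
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage
from email.utils import format_datetime, make_msgid


def make_message(sender, recipient, subject, body, sent_at, parent=None):
    """Build one synthetic email (hypothetical helper); mark it as a reply if parent is given."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject if parent is None else f"Re: {subject}"
    msg["Date"] = format_datetime(sent_at)
    msg["Message-ID"] = make_msgid(domain="example.com")
    if parent is not None:
        # Threading headers let a reader link this reply back to its parent.
        msg["In-Reply-To"] = parent["Message-ID"]
        refs = parent.get("References", "")
        msg["References"] = (refs + " " + parent["Message-ID"]).strip()
    # set_content fills in Content-Type (with a charset) and Content-Transfer-Encoding.
    msg.set_content(body)
    return msg


def write_synthetic_mbox(threads, path=None):
    """Write threads to an .mbox file and return (path, message_count).

    threads: list of (subject, [(sender, recipient, body), ...]); each turn
    after the first is written as a reply to the previous one.
    If path is None, the file is created in a fresh temporary directory.
    """
    if path is None:
        path = os.path.join(tempfile.mkdtemp(), "synthetic.mbox")
    start = datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc)  # arbitrary fixed start
    box = mailbox.mbox(path, create=True)
    count = 0
    try:
        for subject, turns in threads:
            parent = None
            for sender, recipient, body in turns:
                sent_at = start + timedelta(minutes=15 * count)
                msg = make_message(sender, recipient, subject, body, sent_at, parent)
                box.add(msg)
                parent = msg
                count += 1
        box.flush()
    finally:
        box.close()
    return path, count
```

Calling `write_synthetic_mbox([("Server down", [("ops@example.com", "team@example.com", "Prod is returning 500s.")])])` would write a single-message thread and return its path with a count of 1. Pinning the start time keeps the fixture's dates deterministic across runs, although `make_msgid` still produces fresh Message-IDs each time.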

Why GAIA needs it

The email triage agent currently has no test data to run against,
making it impossible to validate triage logic in CI.

Test plan

  • tests/unit/test_agents_split.py - all tests passing

@itomek-amd (Collaborator) left a comment

@theonlychant — thanks for the work, and the cleanup pass that pulled connections/ back out was the right call. Requesting changes for two reasons:

  1. Code doesn't parse. Two unresolved merge-conflict blocks landed in src/gaia/agents/chat/agent.py (see inline comments). python -m py_compile src/gaia/agents/chat/agent.py fails with SyntaxError: invalid decimal literal at line 1784, which is also why the Claude AI Assistant / pr-review check failed — the bot can't review a file that doesn't compile.
  2. Title and scope don't match. The PR is titled feat(email): synthetic .mbox dataset for email triage tests and the body says Closes #848, but the diff bundles ~1100 lines of tool-loader work for #688/#800 and a tests/unit/test_pkce.py that imports a path (src/gaia/connections/pkce.py) the same PR deletes.
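Both symptoms in point 1 can be caught before pushing. The sketch below is a hypothetical pre-push helper, not part of GAIA; `find_conflict_markers`, `check_source`, and `check_file` are illustrative names. It scans for the seven-character markers Git leaves around unresolved hunks and then calls the built-in `compile()`. `python -m py_compile` also ends up calling `compile()`; this version simply skips writing a .pyc file.

```python
import re
from pathlib import Path

# Git starts each unresolved-hunk marker at column 0 with seven identical
# characters: <<<<<<< (ours), ||||||| (base, diff3 style), ======= and >>>>>>>.
# A docstring line consisting of exactly "=======" would be a false positive.
CONFLICT_MARKER = re.compile(r"^(?:<{7}|\|{7}|={7}|>{7})(?=\s|$)", re.MULTILINE)


def find_conflict_markers(source: str) -> list[int]:
    """Return the 1-based line numbers that begin with a conflict marker."""
    return [source.count("\n", 0, m.start()) + 1 for m in CONFLICT_MARKER.finditer(source)]


def check_source(source: str, filename: str) -> list[str]:
    """Report conflict markers first, then try to compile the source as a module."""
    problems = [
        f"{filename}:{n}: unresolved merge-conflict marker"
        for n in find_conflict_markers(source)
    ]
    try:
        # Parse and compile in memory only; nothing is written to disk.
        compile(source, filename, "exec")
    except SyntaxError as exc:
        problems.append(f"{filename}:{exc.lineno}: SyntaxError: {exc.msg}")
    return problems


def check_file(path: Path) -> list[str]:
    """Convenience wrapper: read a file as UTF-8 and check it."""
    return check_source(path.read_text(encoding="utf-8"), str(path))
```

Running `check_file(Path("src/gaia/agents/chat/agent.py"))` against the pre-fix file should list each marker line plus the SyntaxError; an empty list corresponds to the "py_compile exits 0" item in the checklist.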

Two ways to resolve

Option A (recommended) — refocus this PR on its stated scope. Revert the tool-loader changes (src/gaia/agents/base/tool_loader.py, the tool_loader wiring in base/agent.py, the new ~260 lines in chat/agent.py, and tests/unit/test_tool_loader.py). Delete tests/unit/test_pkce.py. What's left is a clean #848 PR. Open a separate PR for the tool-loader linked to #688/#800.

Option B — keep both, but fix the merge. Resolve the conflict markers in chat/agent.py, delete tests/unit/test_pkce.py, update the title and body to declare both threads (e.g. feat: email .mbox fixtures + tool-loader for #688), and link both issues. I'd push for A — bundled PRs are harder to land and harder to revert if one half regresses.

Before re-requesting review

  • python -m py_compile src/gaia/agents/chat/agent.py exits 0
  • pytest tests/unit/test_synthetic_mbox.py -q is green on a clean checkout
  • (If keeping any tool-loader work) pytest tests/unit/test_tool_loader.py -q is green
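A fixture test such as tests/unit/test_synthetic_mbox.py plausibly needs to confirm that every reply in the dataset links back into a thread. The sketch below shows one hypothetical way to group messages by thread root from the `Message-ID`, `In-Reply-To`, and `References` headers. `group_threads`, `thread_root`, and `load_threads` are illustrative names, and this is a simplified parent-link walk, not the full JWZ threading algorithm.

```python
import email.message
import mailbox
from collections import defaultdict
from typing import Iterable


def _header(msg: email.message.Message, name: str) -> str:
    """Return a header value as a stripped string ('' when missing)."""
    return str(msg.get(name) or "").strip()


def thread_root(msg_id: str, parents: dict[str, str]) -> str:
    """Walk parent links upward; stop at an unknown parent or on a cycle."""
    seen = set()
    while msg_id in parents and msg_id not in seen:
        seen.add(msg_id)
        msg_id = parents[msg_id]
    return msg_id


def group_threads(
    messages: Iterable[email.message.Message],
) -> dict[str, list[email.message.Message]]:
    """Map each thread's root Message-ID to its messages, in input order."""
    messages = list(messages)
    parents: dict[str, str] = {}
    for msg in messages:
        mid = _header(msg, "Message-ID")
        # Prefer In-Reply-To; otherwise use the last (closest) ID in References.
        parent = _header(msg, "In-Reply-To")
        if not parent:
            refs = _header(msg, "References").split()
            parent = refs[-1] if refs else ""
        if mid and parent:
            parents[mid] = parent
    threads: dict[str, list[email.message.Message]] = defaultdict(list)
    for msg in messages:
        # Messages without a Message-ID all end up under the "" key.
        threads[thread_root(_header(msg, "Message-ID"), parents)].append(msg)
    return dict(threads)


def load_threads(path: str) -> dict[str, list[email.message.Message]]:
    """Read an existing .mbox file and group its messages into threads."""
    box = mailbox.mbox(path, create=False)
    try:
        return group_threads(box)
    finally:
        box.close()
```

`load_threads("synthetic.mbox")` would return a dict keyed by each thread's root Message-ID, which a test could compare against the number of threads the generator was asked to produce.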

Happy to re-review once either option lands. The mbox fixture itself looks like a useful chunk of work.

Inline comment threads:
  • src/gaia/agents/chat/agent.py (3 threads, outdated)
  • tests/unit/test_pkce.py (outdated)
  • src/gaia/agents/base/tool_loader.py
@theonlychant
Contributor Author

theonlychant commented May 1, 2026

Alright, @kovtcharov or @kovtcharov-amd and @itomek or @itomek-amd, I've addressed both issues:

  • Removed all tool-loader and PKCE files; the PR is now scoped to #848 only.
  • Fixed the set_charset payload error in generate_mbox.py; all tests are now passing.

Ready for re-review when you get a chance, thanks!

github-actions bot added the cli (CLI changes) label on May 1, 2026

Labels: agents, cli (CLI changes), tests (Test changes)


Development: successfully merging this pull request may close the issue "feat(email): synthetic .mbox dataset for email triage agent testing".

2 participants