fix(agents): early dedup for repeated mutation tool calls by kovtcharov-amd · Pull Request #1557 · amd/gaia

kovtcharov-amd · 2026-06-09T16:58:30Z

The base agent caught repeated query tool calls at the first repeat but had no equivalent for mutation tools. Before: a small model that re-issued an identical mutation (e.g. mark_read on an already-read id) burned ~4 planning steps before the reactive loop-detector halted it, and could drop the last item in a mutation sequence. After: an identical mutation re-issue is caught at the first repeat and the agent gets a corrective re-plan signal, at parity with query dedup. The cache is keyed on (tool, normalized args) so mutations on different ids are never suppressed, and the signal is injected after execution — no call is silently dropped.

Closes #1317.

Test plan

pytest tests/unit/agents/test_mutation_dedup.py — 6 tests (first-repeat detection, distinct-ids not suppressed, arg-order normalization, batch variants, non-mutation no-op, unhashable-args fallback)
python util/lint.py --all
Required before merge — base agent loop is LLM-affecting: run gaia eval agent on the relevant category and --compare against the committed baseline; confirm no regression (acceptance criterion in fix(agents): early dedup for repeated mutation tool calls #1317; not optional for base-agent changes)

Closes #1317

github-actions · 2026-06-09T17:03:00Z

Review: early dedup for repeated mutation tool calls

Summary

Solid, well-scoped fix that brings mutation tools to parity with the existing query-dedup signal. The core insight — key mutations on (tool, normalized args) rather than the result hash, because re-issuing mark_read on an already-read id returns a different result — is correct and clearly documented. No silent drops: the mutation still executes and a re-plan nudge is injected, consistent with GAIA's fail-loudly philosophy. The code change is small and the test coverage is deterministic and thoughtful.

The single most important thing: this touches the base agent loop, which CLAUDE.md classifies as LLM-affecting. The PR's own test plan lists gaia eval agent --compare as required before merge and it's still unchecked. That eval is the merge gate here — the unit tests pass code paths but won't catch a behavioral regression in the loop.

Issues Found

🟡 Eval run is a hard merge gate, not optional (src/gaia/agents/base/agent.py)
Per CLAUDE.md, changes to the base agent loop require a gaia eval agent run compared against the committed baseline before merge — and you've correctly flagged this in the PR description as required. Just calling it out explicitly: don't merge on green unit tests alone. The injected [SYSTEM] re-plan message changes what the model sees mid-loop, which is exactly the kind of change unit tests can't validate.

🟡 _MUTATION_TOOLS omits two real mutation tools (agent.py:14)
delete_tools.py also registers permanent_delete and restore_message as @tools — both mutate external state, and permanent_delete is the most destructive of the set. They'll fall back to the slow reactive loop-detector that this PR exists to front-run. Either include them or note why they're excluded (e.g. permanent_delete erroring on a second call is "loud enough"):

    "remove_star_batch",
    "trash_message",
    "restore_message",
    "permanent_delete",
)

(Confirmed there is no trash_message_batch tool, so its absence is correct.)

🟢 Verify provider tolerance for the user-message-between-tool-results ordering (agent.py:3557, agent.py:3783)
The corrective message is appended to messages before the current call's tool-result message, so in the native fan-out path you can get assistant(tool_calls) → user(dedup) → tool(result). Mutation tools (batch email) fan out far more than query tools do, so this interleaving will be exercised harder than the query-dedup path that established the pattern. It matches the existing _QUERY_TOOLS placement so it's not a new bug — but it's worth confirming the native tool-calling path tolerates a user turn sandwiched between an assistant's tool_calls and their tool responses. The eval run should surface it if it's a problem.

🟢 Domain tool names hardcoded in the base class (agent.py:14)
_MUTATION_TOOLS couples the base agent to EmailTriageAgent's tool vocabulary, the same way _QUERY_TOOLS couples it to RAG names. It matches precedent so I won't block on it, but an overridable hook (e.g. _get_mutation_tool_names() that agents extend) would keep the base layer domain-agnostic and let other agents opt in. Fine to defer.

Strengths

The "why args, not result" rationale in both the module comment and the docstring is genuinely useful — it documents the non-obvious invariant a future reader would otherwise re-derive (or break).
Test suite is deterministic (no live Lemonade/LLM) and covers the right edges: first-repeat detection, distinct-ids-not-suppressed, arg-order normalization, batch variants, non-mutation no-op (asserting cache == {}), and the unhashable-args str() fallback.
Clean parity with the established query-dedup pattern — re-plan signal instead of a silent drop, cache scoped to the loop run, threshold mirroring >= 2.

Verdict

Approve with suggestions — the code is sound and the tests are strong. No code-level blocker, but treat the gaia eval agent comparison as a hard pre-merge gate (it's already in your test plan, just unchecked), and please decide on permanent_delete/restore_message before merging.

itomek

Additive advisory signal keyed on (tool, args); mutation still always executes. LGTM.

Generated by Claude Code

fix(agents): early dedup for repeated mutation tool calls

15e0b0e

Closes #1317

github-actions Bot added tests Test changes agents labels Jun 9, 2026

kovtcharov-amd added 2 commits June 10, 2026 13:56

Merge branch 'main' into autofix/issue-1317

16ecb63

Merge branch 'main' into autofix/issue-1317

1ed5295

itomek approved these changes Jun 11, 2026

View reviewed changes

Merge branch 'main' into autofix/issue-1317

0e126f6

itomek added this pull request to the merge queue Jun 11, 2026

Merged via the queue into main with commit b0c03c5 Jun 11, 2026
37 checks passed

itomek deleted the autofix/issue-1317 branch June 11, 2026 13:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agents): early dedup for repeated mutation tool calls#1557

fix(agents): early dedup for repeated mutation tool calls#1557
itomek merged 4 commits into
mainfrom
autofix/issue-1317

kovtcharov-amd commented Jun 9, 2026

Uh oh!

github-actions Bot commented Jun 9, 2026

Uh oh!

itomek left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kovtcharov-amd commented Jun 9, 2026

Test plan

Uh oh!

github-actions Bot commented Jun 9, 2026

Review: early dedup for repeated mutation tool calls

Summary

Issues Found

Strengths

Verdict

Uh oh!

itomek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants