What happened
On fix iteration 3 of PR #2909, the fix agent was asked to get CI passing and raise codecov coverage. The agent added 4 tests and brought queryMintHealth to 100% coverage, but reported that it 'disagreed' with a CI lint failure because it could not reproduce the failure locally. The CI failure was a real go vet error: test code still used the old signature of bundleFunctionSource after the refactor changed it from 1 argument to 3. Dismissing the failure caused a 3-day delay: the human had to come back on July 5 and trigger a 4th /fs-fix to resolve the issue. The fix was a 1-line change per call site.
Related open issues: #1884 ('Fix agent should check CI status and address failing checks') covers the broader category of CI awareness, but does not address the specific anti-pattern of the agent investigating a failure, failing to reproduce it, and then dismissing it as spurious.
What could go better
The fix agent's failure mode was not 'ignoring CI' (which #1884 covers) but 'investigating CI, failing to reproduce, and then concluding the failure is wrong.' The local/CI divergence was most likely caused by the agent's local environment not having the same merged state as CI, or by the agent running only a subset of tests. When a CI failure cannot be reproduced locally, the correct behavior is to trust CI over local results, read the exact CI error output, and fix based on that output. The agent should never conclude that a CI failure is wrong without evidence (e.g., demonstrating that the same test fails on the main branch). Confidence: high. This is a clear anti-pattern with a concrete 3-day cost on this PR.
Proposed change
Add guidance to AGENTS.md (in the 'Go code' section, near the unit tests/coverage guidance) or to the fix agent's skill/definition:
CI failures take precedence over local results. If CI reports a failure you cannot reproduce locally, trust CI. Read the full CI log output and fix based on the error message — do not dismiss failures as spurious. Common causes of local/CI divergence: stale dependencies, unmerged changes from main, different Go toolchain versions, running only a subset of tests. Only report a CI failure as a pre-existing flake if you can demonstrate the same failure exists on the main branch.
If this is better addressed upstream in the fix agent definition (fullsend-ai/fullsend agent configs), add the guidance there instead. The key principle is: never dismiss a CI failure without positive evidence it's unrelated to the PR's changes.
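The precedence rule in the proposed guidance can be sketched as a small decision function. This is only an illustration of the rule, not part of the fix agent: the type and constant names (CheckResult, FixFromCILog, ReportPreExisting) are hypothetical. The point it makes is that failing to reproduce locally never changes the outcome; only a demonstrated failure on main does.

```go
package main

import "fmt"

// Action is what the fix agent should do about a failing CI check.
// The names here are hypothetical and used only for illustration.
type Action string

const (
	FixFromCILog      Action = "fix-from-ci-log"
	ReportPreExisting Action = "report-pre-existing"
)

// CheckResult is a hypothetical summary of one failing CI check.
type CheckResult struct {
	ReproducedLocally bool // whether the agent saw the same failure locally
	FailsOnMain       bool // true only if the same failure was demonstrated on main
}

// decide applies the rule "CI takes precedence over local results".
// ReproducedLocally is deliberately ignored: not reproducing a failure
// is not evidence that the failure is wrong.
func decide(r CheckResult) Action {
	if r.FailsOnMain {
		return ReportPreExisting
	}
	return FixFromCILog
}

func main() {
	// The situation described for PR #2909: CI fails, local run passes,
	// and there is no evidence that main has the same failure.
	fmt.Println(decide(CheckResult{ReproducedLocally: false, FailsOnMain: false}))
}
```

Keeping ReproducedLocally in the struct but out of the decision makes the intent explicit: the field is information for the report, not an input to the verdict.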
Validation criteria
Over the next 10 fix agent iterations where CI fails, the agent should never dismiss a failure without providing evidence it's pre-existing. Track by searching fix agent PR comments for dismissal language ('disagree', 'cannot reproduce', 'spurious', 'flaky') and verifying each is accompanied by evidence (e.g., link to same failure on main). Success: zero unsubstantiated dismissals in the next 10 relevant fix iterations.
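The tracking step could be partly automated with a scan like the one below. It is a minimal sketch under stated assumptions: the comment text is already fetched as plain strings, and the evidence markers (a GitHub Actions run URL fragment and the phrase "fails on main") are hypothetical heuristics, not an agreed format. A flagged comment still needs a human to confirm whether the linked evidence really shows the same failure on main.

```go
package main

import (
	"fmt"
	"strings"
)

// dismissalTerms is the dismissal language listed in the validation criteria.
var dismissalTerms = []string{"disagree", "cannot reproduce", "spurious", "flaky"}

// evidenceMarkers are hypothetical heuristics for "evidence was provided".
// They are assumptions for this sketch, not a defined convention.
var evidenceMarkers = []string{"/actions/runs/", "fails on main"}

// containsAny reports whether text contains any needle, ignoring case.
// Needles are expected to be lowercase.
func containsAny(text string, needles []string) bool {
	lower := strings.ToLower(text)
	for _, n := range needles {
		if strings.Contains(lower, n) {
			return true
		}
	}
	return false
}

// isUnsubstantiatedDismissal flags a comment that uses dismissal
// language without any recognizable evidence marker.
func isUnsubstantiatedDismissal(comment string) bool {
	return containsAny(comment, dismissalTerms) && !containsAny(comment, evidenceMarkers)
}

// countUnsubstantiated counts flagged comments; the success criterion is zero.
func countUnsubstantiated(comments []string) int {
	count := 0
	for _, c := range comments {
		if isUnsubstantiatedDismissal(c) {
			count++
		}
	}
	return count
}

func main() {
	// Made-up example comments, not taken from any real PR.
	comments := []string{
		"I disagreed with the lint failure; it passes locally.",
		"This looks flaky: the same check fails on main, see https://github.com/example/repo/actions/runs/1",
		"Fixed the go vet error reported by CI.",
	}
	fmt.Println("unsubstantiated dismissals:", countUnsubstantiated(comments))
}
```

Substring matching means 'disagree' also catches 'disagreed' and 'disagrees', which keeps the term list short at the cost of occasional false positives.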
Generated by retro agent from #2909