agents.md: ban adding test skips without explicit user request (erigontech#21232)

mh0lt · web-flow · commit 23f2119585e0 · 2026-05-17T11:34:41.000Z
## Summary Adds a project-level policy in [agents.md](agents.md) for test skips. The rule applies to every contributor and to every automated agent (LLM coding assistants etc.) working in the repo. ## Three-part policy 1. **Why skips are dangerous** — concrete case: erigontech#21153 removed a `t.Skip` documenting a known parallel-exec SD/CREATE2 bug; the underlying bug was never actually fixed, so removing the skip suddenly red'd CI across downstream PRs. 2. **Two valid skip reasons (humans only, with tracking issue)**: - External test suites we import where we can't pass all the tests yet - Flakes with very low tolerance (general rule: reproduce locally and fix; only skip after serious investigation, with the investigation linked in the tracking issue) 3. **Strict ban for automated agents**: agents must never add a skip, period. Not as a tactical unblock, not behind a flag, not as an `AskUserQuestion` option. Default trajectory for a failing test is investigate → reproduce → fix → verify. If genuinely can't fix in-session, escalate with investigation findings — never silently mute. ## Why Skips convert a loud "this is broken" CI signal into silence, then back into surprise when the skip is later removed. The end state we want: humans add skips only for the two narrow reasons with tracking issues; automated agents always go through investigate → reproduce → fix, never → skip. ## Test plan - Docs-only change to `agents.md`; no code paths affected. - No CI gating to add — this is a behavioural directive for both human contributors reading the repo's contributor docs and for automated agents that read `agents.md`/`CLAUDE.md` at session start.
diff --git a/agents.md b/agents.md
@@ -42,6 +42,39 @@ Before committing, always verify changes with: `make lint && make erigon integra
 ./build/bin/erigon --datadir=dev --chain=dev --beacon.api=beacon,validator,node,config  # PoS dev mode
 ```
 
+## Test skips
+
+These rules apply project-wide — to every contributor and to every automated agent (LLM coding assistants, CI bots, etc.) working in this repository.
+
+### Why skips are dangerous
+
+A failing test is a real failure that must be diagnosed and fixed. Skipping it hides the failure and pushes the cost onto whoever later removes the skip — at which point the underlying bug is still there and now also surprises them. Skipping converts a loud "this is broken" signal into silence, then back into surprise. Concrete case: `#21153` removed a `t.Skip` for `TestGeneratedTraceApiCollision` that had documented a known parallel-exec SD/CREATE2-reincarnation bug; the underlying bug was never actually fixed (the comment said "fixed on `exec3/remove-rwtx-threading` branch" — that branch's fix never merged), so removing the skip suddenly red'd CI across downstream PRs (notably #21017).
+
+### Two valid reasons a skip may exist
+
+Both apply to human contributors. Both require an explicit, linked tracking issue. Neither permits an automated agent to add the skip on its own.
+
+1. **External test suites we import where we can't pass all the tests** — typically because we haven't done the corresponding development yet (e.g., an upstream Ethereum spec test for a feature we haven't implemented). The skip documents the gap rather than hiding a regression.
+
+2. **Flaky tests** — partially valid, with very low (not zero) tolerance. The general rule for flakes is **reproduce locally and fix**. Only after a serious attempt at local repro and root-cause analysis (not "I ran it three times and it passed") should a skip be considered. The tracking issue must include the local-repro investigation attached, and the test owner accepts responsibility for un-skipping once the flake is fixed.
+
+In both cases the skip carries an inline comment with the linked tracking issue, and the issue gets closed by removing the skip — not by closing the issue with the skip still in place.
+
+### Rule for automated agents (LLM assistants etc.)
+
+**Automated agents must never add a skip. Period.** Not even with a "the user can review it" framing. Not as an option in `AskUserQuestion` menus. Not as a "tactical unblock" suggestion in text answers. Not behind any conditional or env-var gate.
+
+When an agent encounters a failing test:
+- Investigate the failure: read logs, reproduce locally, narrow to a minimal repro
+- Fix the underlying bug
+- If the agent genuinely can't fix it in-session, the correct outcomes are: (a) escalate to the user with the investigation findings, or (b) report it as a tracked blocker — never (c) skip it
+
+If a flaky test is blocking the agent's own CI iteration, the agent reproduces locally and either fixes the flake or hands off to the user with the repro recipe. Adding a skip "just to get CI green" is exactly the pattern that produced #21153's surprise.
+
+Applies to all forms of test muting: `t.Skip`, `t.SkipNow`, `t.Skipf`, `SkipLoad`, `bt.SkipLoad`, build-tag exclusions, conditional bypasses behind `dbg.*` env flags, removing tests from a runner matrix without a tracking issue. **All off-limits for automated agents.**
+
+If a user explicitly directs an agent to add a skip in the current turn (overriding this rule for a specific case), the agent should still flag the trade-off and ensure a tracking issue exists.
+
 ## Conventions
 
 Commit messages: prefix with package(s) modified, e.g., `eth, rpc: make trace configs optional`