execution: explicitly set parallel/sequential blocktests/enginextests shards in ci (#21223)

taratorio · web-flow · commit adb4e9686dd9 · 2026-05-18T02:17:37.000Z
# execution: explicitly set parallel/sequential blocktests/enginextests shards in ci

## Summary

Makes the `eest-spec-*` blocktest / enginextest CI shards declare their
`ERIGON_EXEC3_PARALLEL` mode explicitly via shard name (`-sequential` /
`-parallel` suffixes) and pinned env var, instead of relying on the
runtime default in `dbg.Exec3Parallel`. The default can now flip without
forcing a shard rename or budget rebase, and every shard's behaviour is
reproducible from the manifest alone.

### Manifest naming (`tools/eest-spec-shards.json`)

Renamed every implicit-sequential shard to an explicit `-sequential`
variant; their `-parallel` siblings already existed (or were added where
missing):

| Old name | New name |
|---|---|
| `blocktests-stable` | `blocktests-stable-sequential` |
| `enginextests-stable` | `enginextests-stable-sequential` (+ new `enginextests-stable-parallel`) |
| `enginextests-benchmark-{1m,5m,10m,30m,60m,100m,150m}` | `…-sequential` |
| `blocktests-stable-race-{pre-cancun,cancun,prague,osaka}` | `…-sequential` |

Unchanged shards (single-mode by design):
- `statetests-stable`, `statetests-devnet`
- `blocktests-devnet`, `blocktests-devnet-race-amsterdam` (always
parallel for the in-development hardfork)

### Runner script (`tools/run-eest-spec-test.sh`)

- Always exports `ERIGON_EXEC3_PARALLEL=true|false` based on the
manifest's `exec3-parallel` field (previously only set it when `true`).
The shard's mode is now pinned regardless of the runtime default.
- Strips both `-parallel` and `-sequential` suffixes when computing
`shard_route`, so the new names reuse the existing case-arms (fixture
path / `--run` regex) without duplicated routing logic.
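
As a minimal sketch of the two runner changes above (not the script itself): the helper names `shard_route_for` and `pin_exec3_mode` are hypothetical, and treating an empty or missing `exec3-parallel` value as `false` is an assumption about how the manifest field is read.

```bash
#!/usr/bin/env bash

# Hypothetical helper: strip a trailing "-parallel" or "-sequential" so both
# variants route to the same case-arm as their un-suffixed parent.
shard_route_for() {
	local route="${1%-parallel}"
	route="${route%-sequential}"
	printf '%s\n' "$route"
}

# Hypothetical helper: always export the mode, true or false.
# Assumption: anything other than the literal "true" means sequential.
pin_exec3_mode() {
	if [[ "${1:-}" == "true" ]]; then
		export ERIGON_EXEC3_PARALLEL=true
	else
		export ERIGON_EXEC3_PARALLEL=false
	fi
}

shard_route_for blocktests-stable-sequential        # prints blocktests-stable
shard_route_for enginextests-benchmark-1m-parallel  # prints enginextests-benchmark-1m
shard_route_for blocktests-devnet                   # prints blocktests-devnet (no suffix to strip)

pin_exec3_mode ""
echo "[env] ERIGON_EXEC3_PARALLEL $ERIGON_EXEC3_PARALLEL"
```

Because `${var%suffix}` only removes a suffix that is actually present, chaining the two expansions is safe for shards that carry neither suffix.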

### Updated `max-allowed-failures`

Re-ran every shard with a non-zero budget against current HEAD and
tightened budgets to the exact observed counts (no buffer):

| Shard | Tests | Old | New |
|---|---:|---:|---:|
| blocktests-stable-parallel | 69,256 | 20 | **2** |
| blocktests-devnet | 82,896 | 25 | **5** |
| enginextests-stable-parallel | 63,920 | 19 | **11** |
| blocktests-stable-race-pre-cancun-parallel | 8,947 | 259 | **2** |
| blocktests-stable-race-cancun-parallel | 17,783 | 140 | **0** |
| blocktests-stable-race-prague-parallel | 20,945 | 136 | **0** |
| blocktests-stable-race-osaka-parallel | 21,564 | 136 | **0** |
| blocktests-devnet-race-amsterdam | 21,328 | 60 | **3** |

The three stable-race `cancun/prague/osaka-parallel` shards now
strict-gate at 0 failures.
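
For readers unfamiliar with how a budget gates a shard, the check can be sketched roughly as below. This is an illustration under assumptions, not the runner's code: `gate_shard` is a hypothetical name, the results file is assumed to be a JSON array of objects with a boolean `pass` field (as the runner's `jq` filter suggests), and the wording of the empty-run message is invented. It requires `jq`.

```bash
#!/usr/bin/env bash

# Hypothetical helper: gate_shard RESULT_FILE MAX_ALLOWED_FAILURES
# Returns non-zero when no tests ran or when failures exceed the budget.
gate_shard() {
	local result_file="$1" max="$2" total failed
	total=$(jq 'length' "$result_file")
	failed=$(jq '[.[] | select(.pass == false)] | length' "$result_file")
	if (( total == 0 )); then
		echo "ERROR: no tests ran" >&2   # assumed message text
		return 1
	fi
	if (( failed > max )); then
		echo "ERROR: $failed failures exceed max-allowed $max" >&2
		return 1
	fi
	echo "OK: $failed of $total tests failed (max-allowed $max)"
}

# Example: gate_shard results.json 2
```

With a budget equal to the observed count, a single additional failing test is enough to fail the shard, which is the trade-off of tightening with no buffer.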

### Skill docs

Updated four `.claude/skills/*` files to reflect the new make-target
names and added a "stale-binary pitfall" note explaining that `make
eest-spec-<shard>` always rebuilds `evm` / `evm.race` (via the Makefile
prereq + Go's content-addressed build cache), but invoking `bash
tools/run-eest-spec-test.sh <shard>` directly **bypasses** that rebuild,
an easy footgun when comparing failure counts across commits.

- `.claude/skills/erigon-test-all/SKILL.md`
- `.claude/skills/erigon-implement-eip/SKILL.md`
- `.claude/skills/erigon-test-race/SKILL.md` (also corrected an outdated
claim that EEST race coverage needed manual `GOFLAGS='-race'`)
- `.claude/skills/erigon-ci/SKILL.md`

## Test plan

- [ ] CI `EEST spec tests` workflow green on this branch (matrix is
loaded from the renamed manifest)
- [ ] All renamed make targets exist: `for s in $(jq -r '.[].shard' tools/eest-spec-shards.json); do make -n eest-spec-$s >/dev/null || echo "MISSING: $s"; done`
- [ ] JSON parses + every shard routes to an existing case-arm (manual
verification via `shard_route` simulation)
- [ ] Manual local run of one `-sequential` and one `-parallel` shard
confirms `ERIGON_EXEC3_PARALLEL` is set as expected (script log shows
`[env] ERIGON_EXEC3_PARALLEL true|false`)
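
The `shard_route` simulation mentioned in the test plan could look roughly like the sketch below. The `has_arm` patterns are an assumption inferred from the shard list in the runner script's header comment; the real case-arms live in `tools/run-eest-spec-test.sh`, and both function names are hypothetical.

```bash
#!/usr/bin/env bash

# Assumed case-arm patterns (pre-suffix shard names); adjust to match the
# actual arms in tools/run-eest-spec-test.sh.
has_arm() {
	case "$1" in
		statetests-stable|statetests-devnet) return 0 ;;
		blocktests-stable|blocktests-devnet) return 0 ;;
		enginextests-stable) return 0 ;;
		enginextests-benchmark-*m) return 0 ;;
		blocktests-stable-race-*) return 0 ;;
		blocktests-devnet-race-amsterdam) return 0 ;;
		*) return 1 ;;
	esac
}

# Strip the mode suffix from each shard name and report any that would not
# match an arm. Returns 1 if at least one shard is unrouted.
check_routes() {
	local shard route missing=0
	for shard in "$@"; do
		route="${shard%-parallel}"
		route="${route%-sequential}"
		if ! has_arm "$route"; then
			echo "UNROUTED: $shard -> $route"
			missing=1
		fi
	done
	return "$missing"
}

# Against the real manifest (requires jq):
#   check_routes $(jq -r '.[].shard' tools/eest-spec-shards.json)
check_routes blocktests-stable-sequential enginextests-benchmark-10m-parallel \
	blocktests-stable-race-osaka-sequential blocktests-devnet-race-amsterdam
```

The sample call prints nothing and returns 0 because every listed shard maps to one of the assumed arms.
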
diff --git a/.claude/skills/erigon-ci/SKILL.md b/.claude/skills/erigon-ci/SKILL.md
@@ -22,14 +22,16 @@ Each test group has its own dedicated skill for drill-down on failures. Use thos
 | unit | `erigon-test-unit` | `make test-short` | ~5 min | Pre-push gate |
 | all | `erigon-test-all` | `GOGC=80 make test-all` | ~30 min | Before PR review |
 | race | `erigon-test-race` | `make test-all-race` | ~60 min | Concurrency changes |
-| eest-spec | *(inline)* | `make eest-spec-<suite>-<fixtures>` | varies | EEST state/blockchain/engine-x changes |
+| eest-spec | *(inline)* | `make eest-spec-<suite>-<fixtures>[-{sequential,parallel}]` | varies | EEST state/blockchain/engine-x changes (most shards split into `-sequential` / `-parallel` pairs that pin `ERIGON_EXEC3_PARALLEL`; see `tools/eest-spec-shards.json`) |
 | caplin spec | *(inline)* | `cd cl/spectest && make tests && make mainnet` | ~15 min | CL/consensus changes |
 | hive | `erigon-test-hive` | `make test-hive` | ~20 min | EL/CL interop changes |
 | rpc | `erigon-test-rpc` | *(requires synced DB)* | ~10 min | RPC API changes |
 | assertoor | *(remote only)* | dispatch only | — | Kurtosis network test |
 
 `make test-all` no longer covers the EEST spec tests or the `cl/spectest` consensus spec test — run those via their dedicated targets above when relevant.
 
+**Pitfall:** prefer `make eest-spec-<shard>` over `bash tools/run-eest-spec-test.sh <shard>`. The make target lists `evm` / `evm.race` as a prereq so stale binaries get rebuilt; the script invoked directly does not, and a stale binary against current fixtures will inflate failures or hide regressions.
+
 ### Lint (run first — non-deterministic, may need multiple runs)
 ```bash
 make lint
diff --git a/.claude/skills/erigon-implement-eip/SKILL.md b/.claude/skills/erigon-implement-eip/SKILL.md
@@ -116,14 +116,17 @@ Run local tests using the `/erigon-test-all` skill. Analyse and fix any failures
 The most important tests when implementing a new EIP for the EL are the EEST spec test shards, exercised by the `cmd/evm` runners (`statetest`, `blocktest`, `enginextest`) via the Makefile targets:
 
 - `make eest-spec-statetests-stable` / `…-devnet` — state-tests against the stable/devnet EEST fixtures
-- `make eest-spec-blocktests-stable` / `…-devnet` — blockchain-tests against the stable/devnet EEST fixtures. The devnet shard always runs under `ERIGON_EXEC3_PARALLEL=true` (the in-development hardfork requires it); the stable shard runs serial exec3.
-- `make eest-spec-blocktests-stable-parallel` — same as `…-stable` but with `ERIGON_EXEC3_PARALLEL=true`; useful for catching parallel-only regressions on stable fixtures.
-- `make eest-spec-enginextests-stable` — engine-x tests against the stable EEST fixtures (no devnet variant: the devnet tarball doesn't yet ship `blockchain_tests_engine_x/`).
-- `make eest-spec-enginextests-benchmark-1m` / `-5m` / `-10m` / `-30m` / `-60m` / `-100m` / `-150m` — engine-x tests against the per-gas-target benchmark fixtures, with `--time` per-test stats.
-- `make eest-spec-blocktests-stable-race-{pre-cancun,cancun,prague,osaka}` and `make eest-spec-blocktests-devnet-race-amsterdam` — race-detector variants split by fork. Each stable-race sub-shard also has a `-parallel` sibling (e.g. `…-race-cancun-parallel`) that exercises parallel exec3 under the race detector. The `blocktests-devnet-race-amsterdam` shard is always parallel (matches the non-race devnet behaviour).
+- `make eest-spec-blocktests-stable-sequential` / `…-devnet` — blockchain-tests against the stable/devnet EEST fixtures. The devnet shard always runs under `ERIGON_EXEC3_PARALLEL=true` (the in-development hardfork requires it); the `…-sequential` shard pins `ERIGON_EXEC3_PARALLEL=false`.
+- `make eest-spec-blocktests-stable-parallel` — same fixtures as `…-stable-sequential` but with `ERIGON_EXEC3_PARALLEL=true`; useful for catching parallel-only regressions on stable fixtures.
+- `make eest-spec-enginextests-stable-sequential` — engine-x tests against the stable EEST fixtures with `ERIGON_EXEC3_PARALLEL=false`. No devnet variant: the devnet tarball doesn't yet ship `blockchain_tests_engine_x/`.
+- `make eest-spec-enginextests-stable-parallel` — same fixtures as `…-stable-sequential` but with `ERIGON_EXEC3_PARALLEL=true`; useful for catching parallel-only regressions on engine-x stable fixtures.
+- `make eest-spec-enginextests-benchmark-{1m,5m,10m,30m,60m,100m,150m}-{sequential,parallel}` — engine-x tests against the per-gas-target benchmark fixtures, with `--time` per-test stats. Each gas target has a `-sequential` (`ERIGON_EXEC3_PARALLEL=false`) and `-parallel` (`ERIGON_EXEC3_PARALLEL=true`) variant.
+- `make eest-spec-blocktests-stable-race-{pre-cancun,cancun,prague,osaka}-{sequential,parallel}` and `make eest-spec-blocktests-devnet-race-amsterdam` — race-detector variants split by fork. Each stable-race sub-shard has a `-sequential` / `-parallel` pair; the `-parallel` siblings exercise parallel exec3 under the race detector. The `blocktests-devnet-race-amsterdam` shard is always parallel (matches the non-race devnet behaviour).
 
 The shard list / failure budgets / `exec3-parallel` flags are defined in `tools/eest-spec-shards.json` (single source of truth shared with the CI workflow and the local runner script). See `EEST_SPEC_SHARDS` / `EEST_SPEC_RACE_SHARDS` in the root `Makefile` for the partition into non-race vs race targets.
 
+**Pitfall: stale `evm` / `evm.race` binary.** When iterating on an EIP implementation, always invoke shards via `make eest-spec-<shard>` rather than `bash tools/run-eest-spec-test.sh <shard>` — the make target lists `evm` (or `evm.race`) as a prereq and `go build` is cache-aware, so a fresh binary is built before each run. The script invoked directly **bypasses** that rebuild, so the runners exercise whatever `build/bin/evm{,.race}` happens to be on disk against current fixtures — silently inflating failures (e.g. devnet shards "regressing" by thousands of tests) or hiding regressions when comparing budgets before/after a change.
+
 For an EIP on a hardfork under development, the **`-devnet`** shards are the primary signal; the `-stable` shards (and `-benchmark-*` shards, where applicable) are regression checks against prior hardforks.
 
 ### Where the test fixtures come from
diff --git a/.claude/skills/erigon-test-all/SKILL.md b/.claude/skills/erigon-test-all/SKILL.md
@@ -15,23 +15,30 @@ To exercise the EEST suites locally, see `erigon-eest-spec` (or run a specific s
 
 ```bash
 make eest-spec-statetests-stable             # state tests vs eest_stable fixtures
-make eest-spec-blocktests-stable             # blockchain tests vs eest_stable fixtures (serial exec3)
+make eest-spec-blocktests-stable-sequential  # blockchain tests vs eest_stable fixtures (ERIGON_EXEC3_PARALLEL=false)
 make eest-spec-blocktests-stable-parallel    # same, but with ERIGON_EXEC3_PARALLEL=true
-make eest-spec-enginextests-stable           # engine-x tests vs eest_stable fixtures
+make eest-spec-enginextests-stable-sequential # engine-x tests vs eest_stable (ERIGON_EXEC3_PARALLEL=false)
+make eest-spec-enginextests-stable-parallel  # same, but with ERIGON_EXEC3_PARALLEL=true
 make eest-spec-statetests-devnet             # …vs eest_devnet fixtures
 make eest-spec-blocktests-devnet             # devnet blocktests (always parallel exec3)
-make eest-spec-enginextests-benchmark-1m     # engine-x benchmark fixtures @ 1M gas target
+make eest-spec-enginextests-benchmark-1m-sequential
+                                             # engine-x benchmark fixtures @ 1M gas target
                                              # (with per-test --time stats);
-                                             # -5m/-10m/-30m/-60m/-100m/-150m variants too
-make eest-spec-blocktests-stable-race-cancun # race-detector variant, sharded per fork:
+                                             # -5m/-10m/-30m/-60m/-100m/-150m variants too,
+                                             # each with a "-sequential" / "-parallel" pair
+make eest-spec-blocktests-stable-race-cancun-sequential
+                                             # race-detector variant, sharded per fork:
                                              # -pre-cancun/-cancun/-prague/-osaka, plus
-                                             # eest-spec-blocktests-devnet-race-amsterdam
-                                             # each also has a "-parallel" sibling
-                                             # (e.g. ...-race-cancun-parallel)
+                                             # eest-spec-blocktests-devnet-race-amsterdam.
+                                             # Each stable-race sub-shard has a
+                                             # "-sequential" / "-parallel" pair
+                                             # (e.g. ...-race-cancun-{sequential,parallel})
 ```
 
 The shard list / failure budgets / `exec3-parallel` flags live in `tools/eest-spec-shards.json` (single source of truth for both this workflow and `tools/run-eest-spec-test.sh`). See `EEST_SPEC_SHARDS` / `EEST_SPEC_RACE_SHARDS` in the root `Makefile` for the partition into race vs non-race targets.
 
+**Pitfall: stale `evm` / `evm.race` binary.** Always invoke shards via `make eest-spec-<shard>` — the Makefile lists `evm` (or `evm.race`) as a prereq and `go build` is cache-aware, so a stale binary gets rebuilt automatically. Calling `bash tools/run-eest-spec-test.sh <shard>` directly **bypasses** the rebuild and silently exercises whatever `build/bin/evm{,.race}` happens to be on disk against current fixtures, inflating failures or hiding regressions. After pulling code, switching branches, or any time you suspect the binary is older than HEAD: `rm -f build/bin/evm build/bin/evm.race && make evm evm.race` before re-running.
+
 Two side prerequisites still apply for tests `make test-all` does run:
 
 ```bash
diff --git a/.claude/skills/erigon-test-race/SKILL.md b/.claude/skills/erigon-test-race/SKILL.md
@@ -11,7 +11,9 @@ Runs the full test suite with Go's `-race` flag. Catches concurrency bugs that n
 
 `make test-all-race` no longer downloads any fixture tarballs. EEST spec tests (state/blockchain/engine-x) moved out of `go test ./...` and into the dedicated `eest-spec-*` Makefile targets driven by the **EEST spec tests** workflow (`test-eest-spec.yml`); the consensus spec test (`cl/spectest`) is skipped here via `ERIGON_SKIP_CL_SPECTEST=true` (set automatically by the Makefile) and runs only in `test-integration-caplin.yml`.
 
-If you want race coverage on the EEST or consensus spec suites specifically, run them via their dedicated targets (those don't apply `-race` automatically — pass `GOFLAGS='-race'` or invoke `go test -race` against the relevant package directly).
+If you want race coverage on the EEST blocktests, use the dedicated race shards — `make eest-spec-blocktests-stable-race-{pre-cancun,cancun,prague,osaka}-{sequential,parallel}` and `make eest-spec-blocktests-devnet-race-amsterdam`. These build a race-instrumented `evm.race` binary automatically (see `EEST_SPEC_RACE_SHARDS` in the root `Makefile`); the `-sequential` / `-parallel` split pins `ERIGON_EXEC3_PARALLEL` so race coverage hits both modes. For the consensus spec suite or other Go packages, pass `GOFLAGS='-race'` or invoke `go test -race` against the relevant package directly.
+
+**Pitfall: stale `evm.race` binary.** `make eest-spec-<race-shard>` lists `evm.race` as a prereq and `go build` is cache-aware, so a stale binary gets rebuilt. Calling `bash tools/run-eest-spec-test.sh <shard>` directly with `EVM_BIN=build/bin/evm.race` **bypasses** the rebuild and silently runs an old race-instrumented binary against current fixtures — race reports against code that no longer exists, missed races against code that does. After pulling or switching branches: `rm -f build/bin/evm.race && make evm.race` before re-running.
 
 Two side prerequisites still apply for tests `make test-all-race` does run:
 
diff --git a/tools/eest-spec-shards.json b/tools/eest-spec-shards.json
@@ -10,29 +10,35 @@
     "max-allowed-failures": 0
   },
   {
-    "shard": "blocktests-stable",
+    "shard": "blocktests-stable-sequential",
     "workers": 12,
     "max-allowed-failures": 0
   },
   {
     "shard": "blocktests-stable-parallel",
     "workers": 12,
-    "max-allowed-failures": 20,
+    "max-allowed-failures": 2,
     "exec3-parallel": true
   },
   {
     "shard": "blocktests-devnet",
     "workers": 12,
-    "max-allowed-failures": 25,
+    "max-allowed-failures": 5,
     "exec3-parallel": true
   },
   {
-    "shard": "enginextests-stable",
+    "shard": "enginextests-stable-sequential",
     "workers": 8,
     "max-allowed-failures": 0
   },
   {
-    "shard": "enginextests-benchmark-1m",
+    "shard": "enginextests-stable-parallel",
+    "workers": 8,
+    "max-allowed-failures": 11,
+    "exec3-parallel": true
+  },
+  {
+    "shard": "enginextests-benchmark-1m-sequential",
     "workers": 1,
     "max-allowed-failures": 0
   },
@@ -43,7 +49,7 @@
     "exec3-parallel": true
   },
   {
-    "shard": "enginextests-benchmark-5m",
+    "shard": "enginextests-benchmark-5m-sequential",
     "workers": 1,
     "max-allowed-failures": 0
   },
@@ -54,7 +60,7 @@
     "exec3-parallel": true
   },
   {
-    "shard": "enginextests-benchmark-10m",
+    "shard": "enginextests-benchmark-10m-sequential",
     "workers": 1,
     "max-allowed-failures": 0
   },
@@ -65,7 +71,7 @@
     "exec3-parallel": true
   },
   {
-    "shard": "enginextests-benchmark-30m",
+    "shard": "enginextests-benchmark-30m-sequential",
     "workers": 1,
     "max-allowed-failures": 0
   },
@@ -76,7 +82,7 @@
     "exec3-parallel": true
   },
   {
-    "shard": "enginextests-benchmark-60m",
+    "shard": "enginextests-benchmark-60m-sequential",
     "workers": 1,
     "max-allowed-failures": 0
   },
@@ -87,7 +93,7 @@
     "exec3-parallel": true
   },
   {
-    "shard": "enginextests-benchmark-100m",
+    "shard": "enginextests-benchmark-100m-sequential",
     "workers": 1,
     "max-allowed-failures": 0
   },
@@ -98,7 +104,7 @@
     "exec3-parallel": true
   },
   {
-    "shard": "enginextests-benchmark-150m",
+    "shard": "enginextests-benchmark-150m-sequential",
     "workers": 1,
     "max-allowed-failures": 0
   },
@@ -109,53 +115,53 @@
     "exec3-parallel": true
   },
   {
-    "shard": "blocktests-stable-race-pre-cancun",
+    "shard": "blocktests-stable-race-pre-cancun-sequential",
     "workers": 12,
     "max-allowed-failures": 0
   },
   {
-    "shard": "blocktests-stable-race-cancun",
+    "shard": "blocktests-stable-race-cancun-sequential",
     "workers": 12,
     "max-allowed-failures": 0
   },
   {
-    "shard": "blocktests-stable-race-prague",
+    "shard": "blocktests-stable-race-prague-sequential",
     "workers": 12,
     "max-allowed-failures": 0
   },
   {
-    "shard": "blocktests-stable-race-osaka",
+    "shard": "blocktests-stable-race-osaka-sequential",
     "workers": 12,
     "max-allowed-failures": 0
   },
   {
     "shard": "blocktests-stable-race-pre-cancun-parallel",
     "workers": 12,
-    "max-allowed-failures": 259,
+    "max-allowed-failures": 2,
     "exec3-parallel": true
   },
   {
     "shard": "blocktests-stable-race-cancun-parallel",
     "workers": 12,
-    "max-allowed-failures": 140,
+    "max-allowed-failures": 0,
     "exec3-parallel": true
   },
   {
     "shard": "blocktests-stable-race-prague-parallel",
     "workers": 12,
-    "max-allowed-failures": 136,
+    "max-allowed-failures": 0,
     "exec3-parallel": true
   },
   {
     "shard": "blocktests-stable-race-osaka-parallel",
     "workers": 12,
-    "max-allowed-failures": 136,
+    "max-allowed-failures": 0,
     "exec3-parallel": true
   },
   {
     "shard": "blocktests-devnet-race-amsterdam",
     "workers": 12,
-    "max-allowed-failures": 60,
+    "max-allowed-failures": 3,
     "exec3-parallel": true
   }
 ]
diff --git a/tools/run-eest-spec-test.sh b/tools/run-eest-spec-test.sh
@@ -7,18 +7,18 @@
 #
 #   statetests-stable                          state tests vs. eest_stable
 #   statetests-devnet                          state tests vs. eest_devnet
-#   blocktests-stable                          blockchain tests vs. eest_stable
+#   blocktests-stable-sequential               blockchain tests vs. eest_stable
 #   blocktests-devnet                          blockchain tests vs. eest_devnet
-#   enginextests-stable                        engine-x tests vs. eest_stable
-#   enginextests-benchmark-{1m,5m,10m,30m,60m,100m,150m}
+#   enginextests-stable-sequential             engine-x tests vs. eest_stable
+#   enginextests-benchmark-{1m,5m,10m,30m,60m,100m,150m}-sequential
 #                                              engine-x benchmark fixtures per
 #                                              gas-target subdir; each value
 #                                              maps to one for_osaka_at_<NNNN>M/
 #                                              directory under the engine_x
 #                                              benchmark fixtures
-#   blocktests-stable-race-{pre-cancun,cancun,prague,osaka}
+#   blocktests-stable-race-{pre-cancun,cancun,prague,osaka}-sequential
 #                                              race-detector variant of
-#                                              blocktests-stable, split by
+#                                              blocktests-stable-sequential, split by
 #                                              fork via the --run regex so
 #                                              each sub-shard fits under ~30
 #                                              min. Caller (Makefile / CI) must
@@ -28,8 +28,13 @@
 #   blocktests-devnet-race-amsterdam           race-detector variant filtered
 #                                              to the Amsterdam fork only.
 #   *-parallel                                  any of the above with "-parallel"
-#                                              appended sets ERIGON_EXEC3_PARALLEL=true
-#                                              from the manifest entry.
+#                                              appended runs with
+#                                              ERIGON_EXEC3_PARALLEL=true. Every
+#                                              other shard runs with
+#                                              ERIGON_EXEC3_PARALLEL=false so
+#                                              the runtime default in
+#                                              dbg.Exec3Parallel can flip without
+#                                              redefining the shards.
 #
 # Each shard maps to one cmd/evm subcommand running with --jsonout. Pass/fail
 # is decided here (not by the binary, which always exits 0): the shard fails
@@ -71,14 +76,17 @@ if [[ -z "$budget_row" ]]; then
 	exit 2
 fi
 IFS=$'\t' read -r default_workers default_max exec3_parallel <<<"$budget_row"
-if [[ "$exec3_parallel" == "true" ]]; then
-	export ERIGON_EXEC3_PARALLEL=true
-fi
+# Always set ERIGON_EXEC3_PARALLEL explicitly (true or false) so the shard's
+# behaviour is pinned to the manifest, independent of whatever dbg.Exec3Parallel
+# defaults to at runtime. If the default flips, the shards still run the mode
+# they were defined for.
+export ERIGON_EXEC3_PARALLEL="$exec3_parallel"
 
-# Strip "-parallel" suffix for case-arm routing — the parallel variant has the
-# same fixture path / regex as the non-parallel parent shard; only the
+# Strip "-parallel" / "-sequential" suffix for case-arm routing — both variants
+# share the same fixture path / regex as the parent shard; only the
 # ERIGON_EXEC3_PARALLEL env var differs.
 shard_route="${shard%-parallel}"
+shard_route="${shard_route%-sequential}"
 
 # Per-shard structural config (cmd / fixture path / extra CLI flags). Match
 # against shard_route so "-parallel" variants reuse the same arm as their
@@ -177,7 +185,10 @@ if (( total == 0 )); then
 fi
 if (( failed > max )); then
 	echo "ERROR: $failed failures exceed max-allowed $max" >&2
-	jq -r '.[] | select(.pass == false) | "  FAIL " + .name' "$result_file" >&2
+	# Emit the cmd/evm `error` field per failing test, not just the name,
+	# so transient CI flakes are diagnosable from the job log alone (the
+	# raw JSON output is dropped after this script exits).
+	jq -r '.[] | select(.pass == false) | "  FAIL " + .name + "\n        error: " + (.error // "<no error message>")' "$result_file" >&2
 	exit 1
 fi
 exit 0