Do not merge, non-draft for workflow testing: ci: fix test caching, use consistent Go version#19223
Do not merge, non-draft for workflow testing: ci: fix test caching, use consistent Go version#19223
Conversation
82d5583 to
fc7f324
Compare
abf2038 to
0d74491
Compare
841b4cf to
769399b
Compare
- Create setup-erigon composite action consolidating Go setup, cleanup-space, mtime normalization, and dependency installation - Add restore/save-build-cache and save-mod-cache actions with content-hashed keys - Split cache into restore/save steps so partial results are saved on failure - Normalize directory mtimes with fixed timestamp for Go test cache stability - Shallow submodule clones and selective LFS pulls in test-all-erigon - Optimize race test checkout, use per-group build cache tags - Extract test output filter into tools/filter-test-output - Streamline lint targets (fast-only + full for local, full-only for CI) - Export CGO_CXXFLAGS from Makefile, remove redundant per-command forwarding - Break out reproducible build and SonarCloud into own workflows - Combine test jobs into single multi-OS matrix - Derive Go version from go.mod, include in cache keys - Drop GOEXPERIMENT=synctest (standard in go1.25) - Remove test-win-downloader workflow - Add workflow_dispatch to workflows missing it - Document workflow triggering, required checks, and merge queues in .github/README.md - Refactor CLAUDE.md into modular rules files - execution/tests: use short temp dir names to avoid macOS path limit
Consolidate duplicate assertoor_regular_test and assertoor_pectra_test jobs into a single matrix job, reducing duplication.
…git history Move checkout, submodule clone, and LFS pull into the setup-erigon composite action with configurable inputs. This removes duplication across test-all-erigon, test-all-erigon-race, sonar, lint, and consensus spec workflows. Replace the fixed-epoch mtime normalization with git-restore-mtime for the main repo (using full history) and commit-timestamp touch for submodules (depth 1). This ensures Go's test cache detects fixture file changes via mtime, rather than masking them with a static timestamp. Directory mtimes are still normalized to a fixed epoch since git-restore-mtime only updates dirs for file adds/deletes, not modifications.
Replace github.event.before (null SHA on new branches, empty on PR opened/reopened) with git rev-parse HEAD~1 from local history. Replace content hash in save key with run_id. Add tier for current commit from a previous run to support re-runs.
github.workflow resolves to the workflow's name: field which is fragile if renamed. Extract the filename from GITHUB_WORKFLOW_REF instead, giving stable cache keys like gocache-test-all-erigon-... instead of gocache-All tests-...
Add a cleanup-erigon composite action that parses run.log for Go test cache hit rates and writes a summary table to GITHUB_STEP_SUMMARY. Tee filtered test output to run.log in all test workflows.
Resolved merge conflict between our matrix consolidation of assertoor_test/assertoor_pectra_test into a single job and main's addition of third-party container caching and BuildKit layer caching (PR #19493). Keep our matrix strategy; adopt main's caching steps.
All groups now rely on the job-level timeout-minutes rather than per-group go test -timeout values.
write-test-groups reads all packages from stdin (go list ./...) and partitions them into named groups using an ordered filter pipeline. Each group's spec is written to tools/test-groups/<name>; if a spec's full package set is unclaimed at that point the raw spec is written, otherwise the resolved package paths are. tools/test-groups.json is also written with the ordered list of group names. --json flag emits only the group name list to stdout with no Go tooling required, used by the load-matrix workflow job. Makefile: write-test-groups recipe; test-group reads package list from the generated file via cat. test-all-erigon-race: load-matrix job runs write-test-groups --json and feeds the result into the matrix, replacing the hardcoded group list. tools/test-groups/ and tools/test-groups.json are gitignored.
There was a problem hiding this comment.
Pull request overview
Refactors CI workflows to improve Go test/build caching and reduce redundant setup across jobs, while reorganizing workflows and test grouping utilities.
Changes:
- Added composite actions for standardized Go setup and explicit restore/save caching of Go build + module caches.
- Split and reorganized CI workflows (tests, race, sonar, reproducible build), adding matrix-based grouping and output filtering.
- Refactored execution test utilities by moving matcher logic into
execution/tests/testutiland updating tests accordingly.
Reviewed changes
Copilot reviewed 50 out of 51 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/write-test-groups | New script to partition Go packages into named CI test groups and write group files/JSON. |
| tools/filter-test-output | New bash filter to reduce noisy go test output while preserving failure exit codes. |
| sonar-project.properties | Excludes .github/** from Sonar analysis. |
| node/interfaces | Updates submodule pointer. |
| execution/tests/transaction_test.go | Switches to testutil.TestMatcher API. |
| execution/tests/testutil/matcher.go | Adds shared TestMatcher implementation and JSON test file walking/loading. |
| execution/tests/state_test.go | Migrates to testutil.TestMatcher and changes temp dir handling. |
| execution/tests/rlp_test.go | Switches to testutil.TestMatcher API. |
| execution/tests/invalid_receipt_hash_high_mgas_test.go | Switches to testutil.TestMatcher API. |
| execution/tests/init_test.go | Removes old matcher/JSON helpers; uses testutil.TestMatcher. |
| execution/tests/eest_prague_calldata/block_test.go | Adds isolated EEST Prague calldata blockchain test package. |
| execution/tests/eest_osaka_clz/block_test.go | Adds isolated EEST Osaka CLZ blockchain test package. |
| execution/tests/eest_frontier_scenarios/block_test.go | Adds isolated EEST Frontier scenarios blockchain test package. |
| execution/tests/eest_cancun_blobs/block_test.go | Adds isolated EEST Cancun blobs blockchain test package. |
| execution/tests/difficulty_test.go | Switches to testutil.TestMatcher API. |
| execution/tests/block_test.go | Switches to testutil.TestMatcher and adjusts skip/slow handling. |
| execution/tests/benchmark_engine_x_test.go | Switches to testutil.TestMatcher and updates parallel control. |
| cl/spectest/Makefile | Adjusts go test invocation flags for consensus spec tests. |
| agents.md | Adds workflow guidance note. |
| TODO | Adds CI/workflow-related TODOs. |
| Makefile | Exports CGO_CXXFLAGS, refactors test filtering, adds test-group helpers, and updates lint targets. |
| .gitignore | Ignores generated test group files. |
| .github/workflows/test-win-downloader.yml | Removes unused workflow. |
| .github/workflows/test-kurtosis-assertoor.yml | Collapses regular/pectra jobs into a matrix; adds skip label logic. |
| .github/workflows/test-integration-caplin.yml | Uses setup/cleanup composite actions and filters output. |
| .github/workflows/test-hive.yml | Adds skip label gate for “uncaching” workflows. |
| .github/workflows/test-all-erigon.yml | Reworks test workflow to use setup/cleanup composite actions and matrix. |
| .github/workflows/test-all-erigon-race.yml | Builds dynamic test-group matrix using write-test-groups; uses composite caching. |
| .github/workflows/sonar.yml | New dedicated Sonar workflow using composite setup/cleanup. |
| .github/workflows/reusable-release-build-debian-pkg.yml | Makes apt output quieter. |
| .github/workflows/reproducible-build.yml | New reproducible build workflow across OS matrix. |
| .github/workflows/qa-rpc-performance-tests.yml | Uses partial clone filter (blob:none). |
| .github/workflows/qa-rpc-performance-comparison-tests.yml | Uses partial clone filter (blob:none). |
| .github/workflows/manifest.yml | Derives Go version from go.mod. |
| .github/workflows/lint.yml | Reworks triggers, caching, and uses composite setup/cleanup. |
| .github/workflows/docker-tags.yml | Uses partial clone filter (blob:none). |
| .github/workflows/ci.yml | Removes monolithic legacy CI workflow. |
| .github/test-caching | Adds helper script to rerun workflows to warm caches on PRs. |
| .github/actions/setup-erigon/action.yml | New composite action: checkout strategy, mtime normalization, Go setup, mod/build cache restore. |
| .github/actions/save-mod-cache/action.yml | New composite action to hash and save Go module download cache content-addressably. |
| .github/actions/save-build-cache/action.yml | New composite action to save Go build cache with workflow/job/scoped keys. |
| .github/actions/restore-build-cache/action.yml | New composite action to restore Go build cache with tiered fallbacks. |
| .github/actions/cleanup-erigon/action.yml | New composite action to save caches and warn about uncached tests. |
| .github/README.md | Adds documentation for required checks, merge queues, triggers, and caching strategy. |
| .github/PROPOSAL.md | Adds CI/workflow proposal notes. |
| .gitattributes | Forces LF for go.sum to stabilize hashFiles('go.sum') cache keys across OSes. |
| .claude/settings.json | Expands allowed tools/actions for Claude config. |
| .claude/rules/lint.md | Adds lint rules doc. |
| .claude/rules/conventions.md | Adds commit message conventions doc. |
| .claude/rules/build-and-test.md | Adds build/test instructions doc. |
| .claude/rules/architecture.md | Adds architecture overview doc. |
Comments suppressed due to low confidence (6)
execution/tests/testutil/matcher.go:1
readJSONFilepasses&valuetojson.Unmarshal, but callers already pass a pointer (e.g.m.Addr().Interface()). Unmarshaling into*interface{}will populate the interface variable rather than the pointed-to map, leaving the map empty. Passvaluedirectly tojson.Unmarshalso it fills the provided pointer target.
execution/tests/testutil/matcher.go:1CheckFailureWithNameignores itsnameparameter and matches patterns againstt.Name()instead. This breaks callers that want to evaluate expected failures for a different logical test name (e.g., aggregated names or file/key names). Usenamefor the match instead oft.Name().
execution/tests/testutil/matcher.go:1- If
os.Stat(dir)returns an error other thanIsNotExist,dirinfowill be nil anddirinfo.IsDir()will panic. Handleerr != nilfirst (skip only on not-exist; otherwise fail the test), and only callIsDir()whenerr == nil.
tools/filter-test-output:1 - The script only checks
PIPESTATUS[1](the second grep) and ignores failures from the first grep; it also documents SIGPIPE (141) but currently exits 141, which can fail callers that intentionally truncate output. Consider: (1) validating both grep exit codes and returning 2 if either grep errors, and (2) treating 141 like non-fatal for a filter (similar to exit 1 handling).
cl/spectest/Makefile:1 - These targets previously passed
-tags=spectestbut no longer do. If any spec tests are gated behind//go:build spectest(or similar), this will silently skip compiling/running them. Either restore-tags=spectesthere or ensure the tag is provided viaGOFLAGS/environment consistently.
agents.md:1 - Use the standard hyphenation 'cross-platform' in documentation.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if: ${{ inputs.submodules == 'true' }} | ||
| shell: bash | ||
| run: | | ||
| hash=$(git ls-tree -r HEAD | awk '$2 == "commit" {print $3}' | sort | sha256sum | cut -d' ' -f1) |
There was a problem hiding this comment.
sha256sum is not available by default on macOS runners (they typically provide shasum -a 256). Since this action is used by macOS workflows, this step can fail. Use a portable hashing fallback (e.g., prefer sha256sum else shasum -a 256) consistently anywhere hashing is done in composite actions.
| hash=$(git ls-tree -r HEAD | awk '$2 == "commit" {print $3}' | sort | sha256sum | cut -d' ' -f1) | |
| if command -v sha256sum >/dev/null 2>&1; then | |
| hash_cmd="sha256sum" | |
| else | |
| hash_cmd="shasum -a 256" | |
| fi | |
| hash=$(git ls-tree -r HEAD | awk '$2 == "commit" {print $3}' | sort | ${hash_cmd} | cut -d' ' -f1) |
| if: ${{ always() }} | ||
| shell: bash | ||
| run: | | ||
| hash=$(cd '${{ inputs.modcache-path }}/cache/download' && find . -type d -name '*@*' | sort | sha256sum | cut -d' ' -f1) |
There was a problem hiding this comment.
Same portability issue as elsewhere: sha256sum is not guaranteed on macOS/Windows bash environments. Since this cache action may run on macOS jobs, replace sha256sum usage with a portable hash helper (fallback to shasum -a 256).
| hash=$(cd '${{ inputs.modcache-path }}/cache/download' && find . -type d -name '*@*' | sort | sha256sum | cut -d' ' -f1) | |
| portable_sha256() { | |
| if command -v sha256sum >/dev/null 2>&1; then | |
| sha256sum "$@" | |
| elif command -v shasum >/dev/null 2>&1; then | |
| shasum -a 256 "$@" | |
| else | |
| echo "Error: no SHA-256 checksum utility found (sha256sum or shasum)" >&2 | |
| return 1 | |
| fi | |
| } | |
| hash=$(cd '${{ inputs.modcache-path }}/cache/download' && find . -type d -name '*@*' | sort | portable_sha256 | cut -d' ' -f1) |
| - name: Install git-restore-mtime | ||
| shell: bash | ||
| run: | | ||
| curl -sL https://raw.githubusercontent.com/MestreLion/git-tools/main/git-restore-mtime -o "$RUNNER_TEMP/git-restore-mtime" |
There was a problem hiding this comment.
This downloads and executes a script directly from raw.githubusercontent.com without pinning to a commit SHA or verifying a checksum. To reduce supply-chain risk, pin the URL to an immutable commit (or vendor the script into the repo) and/or verify an expected checksum before executing.
| curl -sL https://raw.githubusercontent.com/MestreLion/git-tools/main/git-restore-mtime -o "$RUNNER_TEMP/git-restore-mtime" | |
| # Pin git-restore-mtime to a specific commit to avoid running mutable code from "main" | |
| GIT_RESTORE_MTIME_URL="https://raw.githubusercontent.com/MestreLion/git-tools/8dfb1b78c57d20593d3536bc3a1e9bdfb1ef82b4/git-restore-mtime" | |
| curl -sL "$GIT_RESTORE_MTIME_URL" -o "$RUNNER_TEMP/git-restore-mtime" |
| - fixes teh issue of rerunning tests after PR is merged | ||
| - runs tests as tho they were on the merge target, and become the new commit |
There was a problem hiding this comment.
This doc has a few typos ('teh', 'tho', 'accomodate') that reduce readability/searchability. Please correct them.
| - caching for fixtures is per-package, so breaking up big packages for different fixtures fixes this | ||
|
|
||
| timeouts: | ||
| - aim for individual job limits of 30 minutes (cold). real target is 10 mins average. this is not to catch overuse of runners, but to ensure tests evolve to be parallelizable as we go (PRs that make them bigger should improve the workflow to accomodate if required). |
There was a problem hiding this comment.
This doc has a few typos ('teh', 'tho', 'accomodate') that reduce readability/searchability. Please correct them.
Colon-space in 'matched key: ...' inside a plain scalar run: value is invalid YAML. Switch to a block scalar.
Replace the complex branch/SHA/parent-commit restore-key logic with a simple run_id|run_attempt suffix. Each save key is now inherently unique, making the restore strategy straightforward (same run previous attempt, then any prior run with matching workflow/job/extra-key/goversion/os/arch). The primary key is computed once in restore-build-cache and exported as ERIGON_CI_GOCACHE_KEY, eliminating key format duplication between restore and save. save-build-cache now takes a single `key` input instead of reconstructing the key from component parts. Also renames ERIGON_CI_GOCACHE -> ERIGON_CI_GOCACHE_PATH for clarity.
Caching now works properly, so the dorny/paths-filter gate that skipped tests for dashboard-only changes is no longer needed. Also removes two dead env var exports (ERIGON_CI_GOVERSION, ERIGON_CI_WORKFLOW) left over from the previous cache key simplification.
The rules files were duplicates of content already in agents.md. Remove them to keep a single source of truth.
- gocache → gocache-path - workflow → workflow-id - extra-key → matrix-key - cache-matched-key → restored-key - cache-key → primary-key
Key format: gomodcache|workflow-id|job|matrix-key|runner.os|go.sum-hash|run_id|run_attempt - Add restore-mod-cache action (parallel to restore-build-cache) - Simplify save-mod-cache to match save-build-cache (key-based, no content hashing) - setup-erigon: remove modcache input and go mod download step; restore in setup, save in cleanup - cleanup-erigon: always save modcache via save-mod-cache using ERIGON_CI_MODCACHE_KEY - lint: remove modcache input (no longer exists)
- Remove tools/test-groups.json from .gitignore and commit it - Add check-test-groups Makefile target; wire into lint and lintci - load-matrix job now cats the committed file instead of invoking the tool
Track the per-group package files in tools/test-groups/ alongside the already-committed tools/test-groups.json. Consolidate mod_tidy_check.sh and check-test-groups into a single check-generated target that runs both go mod tidy and write-test-groups, then verifies no diff.
|
There's about 3 separate PRs here, but I have most CI jobs down to about 3-4 mins. It will be broken up and submitted in separate PRs. |
## Summary Resubmission of #19109 (which was reverted). - Bump minimum Go version from 1.24 to 1.25 across go.mod, CI workflows, Dockerfiles, Makefile, and documentation - Remove dead compatibility code for Go versions prior to 1.25: synctest go1.24 shim, node/debug trace fallback (go<1.5), blake2b SSE4-only fallback (go<1.7) - Strip now-redundant `//go:build` version constraints - Derive hive CI builder Go version from go.mod ## Dependencies Depends on #19223. That PR makes CI workflows derive their Go version from `go.mod` rather than hardcoding it. Without it, merging this PR would leave several workflows pinned to `go-version: '1.24'` or `'1.25'`, which would conflict with the `go.mod` bump to 1.25 here and cause CI failures. --------- Co-authored-by: Alexey Sharov <AskAlexSharov@gmail.com>
Summary
Overhaul CI workflows to improve caching, reduce redundancy, and speed up test runs.
New composite actions
Caching improvements
Workflow restructuring
ci.ymlinto separate workflows:test-all-erigon.yml,reproducible-build.yml,sonar.yml.workflow_dispatchto workflows that were missing it.test-win-downloaderworkflow.Test output filtering
tools/filter-test-outputwith proper exit code handling.Other
go.modinstead of hardcoding.GOEXPERIMENT=synctest(standard in Go 1.25).CGO_CXXFLAGSfrom Makefile instead of per-command forwarding.lint(fast + full, for local) andlintci(full only, for CI)..github/README.md.skip-uncachingPR label to skip Docker-based test workflows (Hive, Kurtosis) that don't use setup-erigon caching.