Skip to content

Do not merge, non-draft for workflow testing: ci: fix test caching, use consistent Go version#19223

Open
anacrolix wants to merge 66 commits intomainfrom
anacrolix/all-tests-caching
Open

Do not merge, non-draft for workflow testing: ci: fix test caching, use consistent Go version#19223
anacrolix wants to merge 66 commits intomainfrom
anacrolix/all-tests-caching

Conversation

@anacrolix
Copy link
Contributor

@anacrolix anacrolix commented Feb 16, 2026

Summary

Overhaul CI workflows to improve caching, reduce redundancy, and speed up test runs.

New composite actions

  • setup-erigon: Consolidates Go setup, cleanup-space, mtime normalization, Windows temp redirect, and dependency installation (build-essential, choco make) into a single reusable action.
  • restore-build-cache / save-build-cache: Content-hashed Go build cache with per-workflow build tags. Saves on failure so partial results aren't lost.
  • save-mod-cache: Separate content-addressed module cache entry.

Caching improvements

  • Normalize all file/directory mtimes to a fixed timestamp so Go's test cache keys are stable across CI runs.
  • Split cache into restore/save steps (instead of save-always) so partial caches are saved even when tests fail.
  • Include Go version and content hashes in cache keys for proper invalidation.
  • Per-test-group build cache tags for race tests.
  • Shallow submodule clones and selective LFS pulls (only files actually needed by tests).

Workflow restructuring

  • Break monolithic ci.yml into separate workflows: test-all-erigon.yml, reproducible-build.yml, sonar.yml.
  • Combine Linux/macOS/Windows test jobs into single multi-OS matrix.
  • Add workflow_dispatch to workflows that were missing it.
  • Remove unused test-win-downloader workflow.

Test output filtering

  • Extract grep filter pipeline from Makefile into tools/filter-test-output with proper exit code handling.
  • Apply filter to race test and caplin test workflows.

Other

  • Derive Go version from go.mod instead of hardcoding.
  • Drop GOEXPERIMENT=synctest (standard in Go 1.25).
  • Export CGO_CXXFLAGS from Makefile instead of per-command forwarding.
  • Split lint (fast + full, for local) and lintci (full only, for CI).
  • Document workflow triggering, required checks, merge queues, and debugging in .github/README.md.
  • Add skip-uncaching PR label to skip Docker-based test workflows (Hive, Kurtosis) that don't use setup-erigon caching.

@anacrolix anacrolix force-pushed the anacrolix/all-tests-caching branch 2 times, most recently from 82d5583 to fc7f324 Compare February 17, 2026 00:22
@anacrolix anacrolix changed the title ci: derive Go version from go.mod, extract SonarCloud workflow ci: fix test caching, use consistent Go version Feb 17, 2026
@anacrolix anacrolix changed the title ci: fix test caching, use consistent Go version Do not merge, non-draft for workflow testing: ci: fix test caching, use consistent Go version Feb 18, 2026
@anacrolix anacrolix marked this pull request as ready for review February 18, 2026 04:14
@anacrolix anacrolix requested a review from domiwei as a code owner February 18, 2026 07:16
@anacrolix anacrolix force-pushed the anacrolix/all-tests-caching branch from abf2038 to 0d74491 Compare February 18, 2026 09:00
@anacrolix anacrolix force-pushed the anacrolix/all-tests-caching branch 5 times, most recently from 841b4cf to 769399b Compare February 22, 2026 03:15
- Create setup-erigon composite action consolidating Go setup, cleanup-space, mtime normalization, and dependency installation
- Add restore/save-build-cache and save-mod-cache actions with content-hashed keys
- Split cache into restore/save steps so partial results are saved on failure
- Normalize directory mtimes with fixed timestamp for Go test cache stability
- Shallow submodule clones and selective LFS pulls in test-all-erigon
- Optimize race test checkout, use per-group build cache tags
- Extract test output filter into tools/filter-test-output
- Streamline lint targets (fast-only + full for local, full-only for CI)
- Export CGO_CXXFLAGS from Makefile, remove redundant per-command forwarding
- Break out reproducible build and SonarCloud into own workflows
- Combine test jobs into single multi-OS matrix
- Derive Go version from go.mod, include in cache keys
- Drop GOEXPERIMENT=synctest (standard in go1.25)
- Remove test-win-downloader workflow
- Add workflow_dispatch to workflows missing it
- Document workflow triggering, required checks, and merge queues in .github/README.md
- Refactor CLAUDE.md into modular rules files
- execution/tests: use short temp dir names to avoid macOS path limit
Consolidate duplicate assertoor_regular_test and assertoor_pectra_test
jobs into a single matrix job, reducing duplication.
…git history

Move checkout, submodule clone, and LFS pull into the setup-erigon
composite action with configurable inputs. This removes duplication
across test-all-erigon, test-all-erigon-race, sonar, lint, and
consensus spec workflows.

Replace the fixed-epoch mtime normalization with git-restore-mtime for
the main repo (using full history) and commit-timestamp touch for
submodules (depth 1). This ensures Go's test cache detects fixture file
changes via mtime, rather than masking them with a static timestamp.
Directory mtimes are still normalized to a fixed epoch since
git-restore-mtime only updates dirs for file adds/deletes, not
modifications.
Replace github.event.before (null SHA on new branches, empty on PR
opened/reopened) with git rev-parse HEAD~1 from local history. Replace
content hash in save key with run_id. Add tier for current commit from
a previous run to support re-runs.
github.workflow resolves to the workflow's name: field which is fragile
if renamed. Extract the filename from GITHUB_WORKFLOW_REF instead, giving
stable cache keys like gocache-test-all-erigon-... instead of
gocache-All tests-...
Add a cleanup-erigon composite action that parses run.log for Go test
cache hit rates and writes a summary table to GITHUB_STEP_SUMMARY. Tee
filtered test output to run.log in all test workflows.
Resolved merge conflict between our matrix consolidation of
assertoor_test/assertoor_pectra_test into a single job and main's
addition of third-party container caching and BuildKit layer caching
(PR #19493). Keep our matrix strategy; adopt main's caching steps.
All groups now rely on the job-level timeout-minutes rather than
per-group go test -timeout values.
write-test-groups reads all packages from stdin (go list ./...) and
partitions them into named groups using an ordered filter pipeline.
Each group's spec is written to tools/test-groups/<name>; if a spec's
full package set is unclaimed at that point the raw spec is written,
otherwise the resolved package paths are. tools/test-groups.json is
also written with the ordered list of group names.

--json flag emits only the group name list to stdout with no Go tooling
required, used by the load-matrix workflow job.

Makefile: write-test-groups recipe; test-group reads package list from
the generated file via cat.

test-all-erigon-race: load-matrix job runs write-test-groups --json and
feeds the result into the matrix, replacing the hardcoded group list.

tools/test-groups/ and tools/test-groups.json are gitignored.
@anacrolix anacrolix requested review from Copilot and removed request for AskAlexSharov, Giulio2002, domiwei, mh0lt, mriccobene and yperbasis February 26, 2026 12:28
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors CI workflows to improve Go test/build caching and reduce redundant setup across jobs, while reorganizing workflows and test grouping utilities.

Changes:

  • Added composite actions for standardized Go setup and explicit restore/save caching of Go build + module caches.
  • Split and reorganized CI workflows (tests, race, sonar, reproducible build), adding matrix-based grouping and output filtering.
  • Refactored execution test utilities by moving matcher logic into execution/tests/testutil and updating tests accordingly.

Reviewed changes

Copilot reviewed 50 out of 51 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tools/write-test-groups New script to partition Go packages into named CI test groups and write group files/JSON.
tools/filter-test-output New bash filter to reduce noisy go test output while preserving failure exit codes.
sonar-project.properties Excludes .github/** from Sonar analysis.
node/interfaces Updates submodule pointer.
execution/tests/transaction_test.go Switches to testutil.TestMatcher API.
execution/tests/testutil/matcher.go Adds shared TestMatcher implementation and JSON test file walking/loading.
execution/tests/state_test.go Migrates to testutil.TestMatcher and changes temp dir handling.
execution/tests/rlp_test.go Switches to testutil.TestMatcher API.
execution/tests/invalid_receipt_hash_high_mgas_test.go Switches to testutil.TestMatcher API.
execution/tests/init_test.go Removes old matcher/JSON helpers; uses testutil.TestMatcher.
execution/tests/eest_prague_calldata/block_test.go Adds isolated EEST Prague calldata blockchain test package.
execution/tests/eest_osaka_clz/block_test.go Adds isolated EEST Osaka CLZ blockchain test package.
execution/tests/eest_frontier_scenarios/block_test.go Adds isolated EEST Frontier scenarios blockchain test package.
execution/tests/eest_cancun_blobs/block_test.go Adds isolated EEST Cancun blobs blockchain test package.
execution/tests/difficulty_test.go Switches to testutil.TestMatcher API.
execution/tests/block_test.go Switches to testutil.TestMatcher and adjusts skip/slow handling.
execution/tests/benchmark_engine_x_test.go Switches to testutil.TestMatcher and updates parallel control.
cl/spectest/Makefile Adjusts go test invocation flags for consensus spec tests.
agents.md Adds workflow guidance note.
TODO Adds CI/workflow-related TODOs.
Makefile Exports CGO_CXXFLAGS, refactors test filtering, adds test-group helpers, and updates lint targets.
.gitignore Ignores generated test group files.
.github/workflows/test-win-downloader.yml Removes unused workflow.
.github/workflows/test-kurtosis-assertoor.yml Collapses regular/pectra jobs into a matrix; adds skip label logic.
.github/workflows/test-integration-caplin.yml Uses setup/cleanup composite actions and filters output.
.github/workflows/test-hive.yml Adds skip label gate for “uncaching” workflows.
.github/workflows/test-all-erigon.yml Reworks test workflow to use setup/cleanup composite actions and matrix.
.github/workflows/test-all-erigon-race.yml Builds dynamic test-group matrix using write-test-groups; uses composite caching.
.github/workflows/sonar.yml New dedicated Sonar workflow using composite setup/cleanup.
.github/workflows/reusable-release-build-debian-pkg.yml Makes apt output quieter.
.github/workflows/reproducible-build.yml New reproducible build workflow across OS matrix.
.github/workflows/qa-rpc-performance-tests.yml Uses partial clone filter (blob:none).
.github/workflows/qa-rpc-performance-comparison-tests.yml Uses partial clone filter (blob:none).
.github/workflows/manifest.yml Derives Go version from go.mod.
.github/workflows/lint.yml Reworks triggers, caching, and uses composite setup/cleanup.
.github/workflows/docker-tags.yml Uses partial clone filter (blob:none).
.github/workflows/ci.yml Removes monolithic legacy CI workflow.
.github/test-caching Adds helper script to rerun workflows to warm caches on PRs.
.github/actions/setup-erigon/action.yml New composite action: checkout strategy, mtime normalization, Go setup, mod/build cache restore.
.github/actions/save-mod-cache/action.yml New composite action to hash and save Go module download cache content-addressably.
.github/actions/save-build-cache/action.yml New composite action to save Go build cache with workflow/job/scoped keys.
.github/actions/restore-build-cache/action.yml New composite action to restore Go build cache with tiered fallbacks.
.github/actions/cleanup-erigon/action.yml New composite action to save caches and warn about uncached tests.
.github/README.md Adds documentation for required checks, merge queues, triggers, and caching strategy.
.github/PROPOSAL.md Adds CI/workflow proposal notes.
.gitattributes Forces LF for go.sum to stabilize hashFiles('go.sum') cache keys across OSes.
.claude/settings.json Expands allowed tools/actions for Claude config.
.claude/rules/lint.md Adds lint rules doc.
.claude/rules/conventions.md Adds commit message conventions doc.
.claude/rules/build-and-test.md Adds build/test instructions doc.
.claude/rules/architecture.md Adds architecture overview doc.
Comments suppressed due to low confidence (6)

execution/tests/testutil/matcher.go:1

  • readJSONFile passes &value to json.Unmarshal, but callers already pass a pointer (e.g. m.Addr().Interface()). Unmarshaling into *interface{} will populate the interface variable rather than the pointed-to map, leaving the map empty. Pass value directly to json.Unmarshal so it fills the provided pointer target.
    execution/tests/testutil/matcher.go:1
  • CheckFailureWithName ignores its name parameter and matches patterns against t.Name() instead. This breaks callers that want to evaluate expected failures for a different logical test name (e.g., aggregated names or file/key names). Use name for the match instead of t.Name().
    execution/tests/testutil/matcher.go:1
  • If os.Stat(dir) returns an error other than IsNotExist, dirinfo will be nil and dirinfo.IsDir() will panic. Handle err != nil first (skip only on not-exist; otherwise fail the test), and only call IsDir() when err == nil.
    tools/filter-test-output:1
  • The script only checks PIPESTATUS[1] (the second grep) and ignores failures from the first grep; it also documents SIGPIPE (141) but currently exits 141, which can fail callers that intentionally truncate output. Consider: (1) validating both grep exit codes and returning 2 if either grep errors, and (2) treating 141 like non-fatal for a filter (similar to exit 1 handling).
    cl/spectest/Makefile:1
  • These targets previously passed -tags=spectest but no longer do. If any spec tests are gated behind //go:build spectest (or similar), this will silently skip compiling/running them. Either restore -tags=spectest here or ensure the tag is provided via GOFLAGS/environment consistently.
    agents.md:1
  • Use the standard hyphenation 'cross-platform' in documentation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if: ${{ inputs.submodules == 'true' }}
shell: bash
run: |
hash=$(git ls-tree -r HEAD | awk '$2 == "commit" {print $3}' | sort | sha256sum | cut -d' ' -f1)
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sha256sum is not available by default on macOS runners (they typically provide shasum -a 256). Since this action is used by macOS workflows, this step can fail. Use a portable hashing fallback (e.g., prefer sha256sum else shasum -a 256) consistently anywhere hashing is done in composite actions.

Suggested change
hash=$(git ls-tree -r HEAD | awk '$2 == "commit" {print $3}' | sort | sha256sum | cut -d' ' -f1)
if command -v sha256sum >/dev/null 2>&1; then
hash_cmd="sha256sum"
else
hash_cmd="shasum -a 256"
fi
hash=$(git ls-tree -r HEAD | awk '$2 == "commit" {print $3}' | sort | ${hash_cmd} | cut -d' ' -f1)

Copilot uses AI. Check for mistakes.
if: ${{ always() }}
shell: bash
run: |
hash=$(cd '${{ inputs.modcache-path }}/cache/download' && find . -type d -name '*@*' | sort | sha256sum | cut -d' ' -f1)
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same portability issue as elsewhere: sha256sum is not guaranteed on macOS/Windows bash environments. Since this cache action may run on macOS jobs, replace sha256sum usage with a portable hash helper (fallback to shasum -a 256).

Suggested change
hash=$(cd '${{ inputs.modcache-path }}/cache/download' && find . -type d -name '*@*' | sort | sha256sum | cut -d' ' -f1)
portable_sha256() {
if command -v sha256sum >/dev/null 2>&1; then
sha256sum "$@"
elif command -v shasum >/dev/null 2>&1; then
shasum -a 256 "$@"
else
echo "Error: no SHA-256 checksum utility found (sha256sum or shasum)" >&2
return 1
fi
}
hash=$(cd '${{ inputs.modcache-path }}/cache/download' && find . -type d -name '*@*' | sort | portable_sha256 | cut -d' ' -f1)

Copilot uses AI. Check for mistakes.
- name: Install git-restore-mtime
shell: bash
run: |
curl -sL https://raw.githubusercontent.com/MestreLion/git-tools/main/git-restore-mtime -o "$RUNNER_TEMP/git-restore-mtime"
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This downloads and executes a script directly from raw.githubusercontent.com without pinning to a commit SHA or verifying a checksum. To reduce supply-chain risk, pin the URL to an immutable commit (or vendor the script into the repo) and/or verify an expected checksum before executing.

Suggested change
curl -sL https://raw.githubusercontent.com/MestreLion/git-tools/main/git-restore-mtime -o "$RUNNER_TEMP/git-restore-mtime"
# Pin git-restore-mtime to a specific commit to avoid running mutable code from "main"
GIT_RESTORE_MTIME_URL="https://raw.githubusercontent.com/MestreLion/git-tools/8dfb1b78c57d20593d3536bc3a1e9bdfb1ef82b4/git-restore-mtime"
curl -sL "$GIT_RESTORE_MTIME_URL" -o "$RUNNER_TEMP/git-restore-mtime"

Copilot uses AI. Check for mistakes.
Comment on lines +2 to +3
- fixes teh issue of rerunning tests after PR is merged
- runs tests as tho they were on the merge target, and become the new commit
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc has a few typos ('teh', 'tho', 'accomodate') that reduce readability/searchability. Please correct them.

Copilot uses AI. Check for mistakes.
- caching for fixtures is per-package, so breaking up big packages for different fixtures fixes this

timeouts:
- aim for individual job limits of 30 minutes (cold). real target is 10 mins average. this is not to catch overuse of runners, but to ensure tests evolve to be parallelizable as we go (PRs that make them bigger should improve the workflow to accomodate if required).
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc has a few typos ('teh', 'tho', 'accomodate') that reduce readability/searchability. Please correct them.

Copilot uses AI. Check for mistakes.
Colon-space in 'matched key: ...' inside a plain scalar run: value is
invalid YAML. Switch to a block scalar.
Replace the complex branch/SHA/parent-commit restore-key logic with a
simple run_id|run_attempt suffix. Each save key is now inherently unique,
making the restore strategy straightforward (same run previous attempt,
then any prior run with matching workflow/job/extra-key/goversion/os/arch).

The primary key is computed once in restore-build-cache and exported as
ERIGON_CI_GOCACHE_KEY, eliminating key format duplication between restore
and save. save-build-cache now takes a single `key` input instead of
reconstructing the key from component parts.

Also renames ERIGON_CI_GOCACHE -> ERIGON_CI_GOCACHE_PATH for clarity.
Caching now works properly, so the dorny/paths-filter gate that skipped
tests for dashboard-only changes is no longer needed. Also removes two
dead env var exports (ERIGON_CI_GOVERSION, ERIGON_CI_WORKFLOW) left
over from the previous cache key simplification.
The rules files were duplicates of content already in agents.md.
Remove them to keep a single source of truth.
- gocache → gocache-path
- workflow → workflow-id
- extra-key → matrix-key
- cache-matched-key → restored-key
- cache-key → primary-key
Key format: gomodcache|workflow-id|job|matrix-key|runner.os|go.sum-hash|run_id|run_attempt

- Add restore-mod-cache action (parallel to restore-build-cache)
- Simplify save-mod-cache to match save-build-cache (key-based, no content hashing)
- setup-erigon: remove modcache input and go mod download step; restore in setup, save in cleanup
- cleanup-erigon: always save modcache via save-mod-cache using ERIGON_CI_MODCACHE_KEY
- lint: remove modcache input (no longer exists)
- Remove tools/test-groups.json from .gitignore and commit it
- Add check-test-groups Makefile target; wire into lint and lintci
- load-matrix job now cats the committed file instead of invoking the tool
Track the per-group package files in tools/test-groups/ alongside the
already-committed tools/test-groups.json. Consolidate mod_tidy_check.sh
and check-test-groups into a single check-generated target that runs
both go mod tidy and write-test-groups, then verifies no diff.
@anacrolix
Copy link
Contributor Author

There's about 3 separate PRs here, but I have most CI jobs down to about 3-4 mins. It will be broken up and submitted in separate PRs.

sudeepdino008 pushed a commit that referenced this pull request Mar 4, 2026
## Summary
Resubmission of #19109 (which was reverted).

- Bump minimum Go version from 1.24 to 1.25 across go.mod, CI workflows,
Dockerfiles, Makefile, and documentation
- Remove dead compatibility code for Go versions prior to 1.25: synctest
go1.24 shim, node/debug trace fallback (go<1.5), blake2b SSE4-only
fallback (go<1.7)
- Strip now-redundant `//go:build` version constraints
- Derive hive CI builder Go version from go.mod

## Dependencies
Depends on #19223. That PR makes CI workflows derive their Go version
from `go.mod` rather than hardcoding it. Without it, merging this PR
would leave several workflows pinned to `go-version: '1.24'` or
`'1.25'`, which would conflict with the `go.mod` bump to 1.25 here and
cause CI failures.

---------

Co-authored-by: Alexey Sharov <AskAlexSharov@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-uncaching Skip Docker-based test workflows (Hive, Kurtosis) on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve time of CI jobs and make more checks mandatory for PR merge

2 participants