CI: centralise RPC_VERSION in file and fix rpc-tests cache stale files by lupin012 · Pull Request #20824 · erigontech/erigon

lupin012 · 2026-04-26T14:27:31Z

Problem 1 — RPC_VERSION hardcoded in multiple places

The rpc-tests version was duplicated across multiple workflow YAMLs and shell scripts, so a version bump meant touching many files, with a real risk of inconsistency (e.g. qa-rpc-integration-tests-remote.yml had v1.121.0 in the cache key while the script used v2.8.1).

Fix: introduce rpc_version.env, a single file containing RPC_VERSION=v2.8.1. Wrapper scripts source it automatically, so they always use the correct version whether run manually or from CI. Workflows read it in a dedicated step and write it to $GITHUB_ENV for use in the cache key. Updating the version is now a one-line change in one file.
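A minimal sketch of how a wrapper (or a workflow step) might consume rpc_version.env, assuming the file holds a single `RPC_VERSION=vX.Y.Z` line and sits in a directory passed by the caller. The function name `load_rpc_version` is hypothetical, not taken from the PR. When `GITHUB_ENV` is set, as it is inside GitHub Actions, the version is also appended to that file, which is the documented way to expose a variable to later steps such as the cache key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load RPC_VERSION from "<dir>/rpc_version.env".
# Assumes the file contains a single line such as RPC_VERSION=v2.8.1.
load_rpc_version() {
  local env_file="$1/rpc_version.env"
  if [ ! -f "$env_file" ]; then
    echo "Error: $env_file not found" >&2
    return 1
  fi
  # Sourcing sets RPC_VERSION in the current shell.
  . "$env_file"
  if [ -z "${RPC_VERSION:-}" ]; then
    echo "Error: RPC_VERSION is empty in $env_file" >&2
    return 1
  fi
  # Inside GitHub Actions, NAME=value lines appended to $GITHUB_ENV become
  # environment variables for all later steps (e.g. a cache key step).
  if [ -n "${GITHUB_ENV:-}" ]; then
    echo "RPC_VERSION=$RPC_VERSION" >> "$GITHUB_ENV"
  fi
}

# Example use at the top of a wrapper script:
#   load_rpc_version "$(dirname "$0")" || exit 1
```

Sourcing a plain `NAME=value` file keeps it readable by both bash and a workflow step, which is what lets local runs and CI share one source of truth.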

Problem 2 — Stale test files polluting the cache

Self-hosted runners keep the rpc-tests directory on disk between runs. When actions/cache restores a tagged version (e.g. v2.8.1) on top of a directory that previously held a different branch, tar overwrites matching files but leaves behind extra untracked files from the old branch. Git correctly reports the tag (v2.8.1), but the stale test files are still on disk and are picked up by the test runner, causing spurious diff-mismatch failures on tests that don't belong to that version.

Fix: run git clean -fd -e .venv -e build on a cache hit. This removes stale untracked test fixtures while preserving .venv/ and build/bin/rpc_int, the expensive parts of the cache. The -x flag is intentionally omitted to avoid wiping ignored files that the rest of the script relies on.
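The cache-hit branch described above could look roughly like the sketch below. It assumes the checkout directory and the expected tag are passed in, and it leaves the re-clone to the caller. `reuse_or_reset_rpc_tests` is a hypothetical name; the tag check mirrors the `git describe --tags --exact-match` test used in run_rpc_tests.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the cache-hit branch: reuse the restored checkout only
# if it is exactly at the expected tag, then drop stale untracked files while
# keeping the expensive cached directories (.venv/ and build/).
# Returns 0 if the cache was reused, 1 if the directory was removed and the
# caller should re-clone at "$version" (cloning is left out of this sketch).
reuse_or_reset_rpc_tests() {
  local dir="$1" version="$2"
  if [ -d "$dir/.git" ] &&
     [ "$(git -C "$dir" describe --tags --exact-match 2>/dev/null)" = "$version" ]; then
    echo "Using cached rpc-tests at $version"
    # -f: force, -d: also remove untracked directories.
    # -e adds extra ignore patterns; without -x, ignored paths are kept.
    git -C "$dir" clean -fd -e .venv -e build >/dev/null
    return 0
  fi
  rm -rf "$dir"
  return 1
}
```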

…version Replace the hardcoded version string in the rpc-tests cache keys with hashFiles() so the cache is automatically invalidated whenever the test script changes, eliminating the need to update the version in two places. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…e artifacts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

yperbasis

Major

Inconsistent rollout — two workflows still hardcode the version. The same pattern still exists, untouched, in:

.github/workflows/qa-rpc-integration-tests-clients.yml:56 → key: rpc-tests-…-v2.4.0
.github/workflows/qa-sync-from-scratch.yml:146 → key: rpc-tests-…-v2.4.0

The latter has already drifted: it pins v2.4.0 but invokes run_rpc_tests_local.sh → run_rpc_tests_ethereum.sh, which uses v2.8.1. That's a pre-existing bug, but since the PR's goal is to eliminate exactly this class of drift, it would be worth applying the same treatment here (and to qa-rpc-integration-tests-clients.yml, hashing both run_rpc_tests_geth.sh and run_rpc_tests_nethermind.sh).
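One way to catch this class of drift mechanically is a small lint over the workflow files. The sketch below is hypothetical: the function name and the assumption that pinned keys look like `rpc-tests-…-vX.Y.Z` are illustrative. It prints every pin that differs from the expected version and returns non-zero if any are found.

```shell
#!/usr/bin/env bash
# Hypothetical lint: flag rpc-tests cache keys pinned to a version other than
# the expected one. Assumes pins look like "rpc-tests-...-vX.Y.Z".
# Usage: check_rpc_version_drift EXPECTED FILE...
check_rpc_version_drift() {
  local expected="$1" status=0 file pins pin
  shift
  for file in "$@"; do
    # First grep isolates the key, second keeps only the trailing version.
    pins=$(grep -oE 'rpc-tests-.*-v[0-9]+\.[0-9]+\.[0-9]+' "$file" |
           grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+$')
    for pin in $pins; do
      if [ "$pin" != "$expected" ]; then
        echo "$file: found $pin, expected $expected"
        status=1
      fi
    done
  done
  return "$status"
}

# Example:
#   check_rpc_version_drift v2.8.1 .github/workflows/*.yml
```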

Minor
2. Over-invalidation: the cache will churn on unrelated edits. Each run_rpc_tests_*.sh contains both the version pin and the DISABLED_TEST_LIST. Adding or removing a single disabled test (a frequent change; see e.g. the Ethereum script's many # Temporary disable entries) will now invalidate the cache and force a full re-clone + pip install + make rpc_int. The cached artifact (rpc-tests checkout, .venv, rpc_int binary) doesn't depend on the disabled list at all.

Cleaner alternative: factor the version into its own file, e.g. .github/workflows/scripts/rpc_tests_version_ethereum.txt, source it in the script, and use hashFiles(...version_ethereum.txt) in the workflow. Then only true version bumps invalidate the cache.

New rm -rf step duplicates logic already in run_rpc_tests.sh. run_rpc_tests.sh:65-70 already does:

```bash
if [ -d "$WORKSPACE/rpc-tests/.git" ] && [ "$(git -C "$WORKSPACE/rpc-tests" describe --tags --exact-match 2>/dev/null)" = "$RPC_VERSION" ]; then
  echo "Using cached rpc-tests at $RPC_VERSION"
else
  rm -rf "$WORKSPACE/rpc-tests" >/dev/null 2>&1
  git -c advice.detachedHead=false clone --depth 1 --branch "$RPC_VERSION" …
fi
```

So a stale leftover at the wrong version would already be wiped and re-cloned by the script. Not harmful to also do it at the workflow layer, but the responsibility is now duplicated. Pick one home for it.
Hash misses changes to the shared run_rpc_tests.sh. All three updated wrappers ultimately call run_rpc_tests.sh, which controls the venv layout, the VENV_MARKER format, and the make rpc_int build. If that file is edited (e.g. the venv-marker pattern changes), the cached .venv could become incompatible without invalidating the key. Recommend including both files in the hash:

```yaml
key: rpc-tests-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('.github/workflows/scripts/run_rpc_tests_ethereum.sh', '.github/workflows/scripts/run_rpc_tests.sh') }}
```

(and the same for the gnosis / remote keys).
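To illustrate why hashing both files closes the gap, here is a shell sketch of a key that changes whenever the content of any listed file changes. It is not GitHub's `hashFiles()` implementation and will not produce the same digest; `cache_key_for` is a hypothetical helper, and it assumes GNU coreutils `sha256sum` is available.

```shell
#!/usr/bin/env bash
# Illustrative only: build a cache key that changes whenever the content of
# ANY listed file changes. Same idea as hashFiles('a', 'b'), different digest.
# Usage: cache_key_for PREFIX FILE...
cache_key_for() {
  local prefix="$1" f digest
  shift
  for f in "$@"; do
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
  done
  # Hash each file's content, then hash the ordered list of those hashes.
  # awk drops the file names, so only content (and order) affects the key.
  digest=$(sha256sum "$@" | awk '{print $1}' | sha256sum | awk '{print $1}')
  echo "$prefix-$digest"
}

# Example:
#   cache_key_for "rpc-tests-Linux-X64" \
#     .github/workflows/scripts/run_rpc_tests_ethereum.sh \
#     .github/workflows/scripts/run_rpc_tests.sh
```

An edit to run_rpc_tests.sh alone (such as a new venv-marker pattern) yields a new key, which is exactly the invalidation the version-only key misses.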

Copilot

Pull request overview

This PR updates the GitHub Actions cache keys for rpc-tests to be derived from the hash of the corresponding test-runner script, so cache invalidation happens automatically when the script changes (removing the need to manually bump hardcoded versions).

Changes:

Replace hardcoded rpc-tests cache key version strings with ${{ hashFiles(...) }} based keys.
Add a cache-step id and clear ${{ runner.workspace }}/rpc-tests on cache misses to avoid stale content on self-hosted runners.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `.github/workflows/qa-rpc-integration-tests.yml` | Cache key now derives from `run_rpc_tests_ethereum.sh` hash; clears stale cache dir on miss. |
| `.github/workflows/qa-rpc-integration-tests-remote.yml` | Cache key now derives from `run_rpc_tests_remote_ethereum.sh` hash; clears stale cache dir on miss. |
| `.github/workflows/qa-rpc-integration-tests-gnosis.yml` | Cache key now derives from `run_rpc_tests_gnosis.sh` hash; clears stale cache dir on miss. |


…on pins Define RPC_VERSION once per workflow in the env section so version bumps require a single-line change. Cache keys derive from env.RPC_VERSION for human-readable, predictable names. Wrapper scripts pass $RPC_VERSION to run_rpc_tests.sh; validation is centralised there with a clear error message. Remove the duplicate stale-cache cleanup step from workflows since run_rpc_tests.sh already handles it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lupin012 · 2026-04-26T16:24:47Z

@yperbasis I have changed the approach using the RPC_VERSION env variable. Please check if it is ok.

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 13 comments.


Copilot · 2026-04-26T17:01:27Z

```diff
      RPC_PAST_TEST_DIR: /opt/rpc-past-tests
      CHAIN: mainnet
+      RPC_VERSION: v2.8.1
```


This workflow introduces RPC_VERSION: v2.8.1 as a hardcoded value, which still requires manual bumps to invalidate caches. Given the PR intent to auto-invalidate via script hash, consider removing the need for this hardcoded value by basing the cache key on hashFiles(...) (and only keeping RPC_VERSION if it is independently needed to select the rpc-tests ref).

yperbasis

Plz update PR title and description to match the new approach

…files across runs

yperbasis

Major

git clean -fdx largely defeats the cache it's running inside

In run_rpc_tests.sh:69-71:

```bash
if [ -d "$WORKSPACE/rpc-tests/.git" ] && [ "$(git -C "$WORKSPACE/rpc-tests" describe --tags --exact-match 2>/dev/null)" = "$RPC_VERSION" ]; then
  echo "Using cached rpc-tests at $RPC_VERSION"
  git -C "$WORKSPACE/rpc-tests" clean -fdx >/dev/null 2>&1
```

On a cache hit, actions/cache restores $WORKSPACE/rpc-tests including the cached .venv/ and build/bin/rpc_int. git clean -fdx then removes all untracked + ignored files, which means both .venv/ and build/ are wiped. Downstream:

The VENV_MARKER check (run_rpc_tests.sh:79-80) fails because .venv/bin/activate is gone → full python3 -m venv .venv + pip install -r requirements.txt.
make rpc_int (line 121) rebuilds because the binary is gone.

So a "cache hit" only saves the ~10–30s git clone; the rest of the cached payload (the expensive parts) is thrown away every run. The cache effectively went from saving ~2 min to ~30 s.

Suggestion — pick one:

git clean -fd -e .venv -e build (exclude the directories the rest of the script intentionally caches), or
git restore . (reset only tracked files — this is what "clean stale test files" should mean if the concern is mutated fixtures), or
Just delete the specific known-mutated path (e.g. integration/$CHAIN/results/, which is already handled by rm -rf ./"$CHAIN"/results/* later anyway).

A one-line comment explaining which stale files motivated this would also help future readers.
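The difference between these options comes down to how git clean treats ignored versus merely untracked paths. The demo below builds a throwaway repository in which `.venv/` is listed in `.gitignore` and `build/` is just untracked (both are assumptions about rpc-tests, chosen to show each case), then reports which directories survive each variant. `make_demo_repo` and `survivors_after_clean` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical demo repo: .venv/ is gitignored, build/ is merely untracked,
# and stale/ stands in for leftover test fixtures from another branch.
make_demo_repo() {
  local repo="$1"
  git init -q "$repo"
  printf '.venv/\n' > "$repo/.gitignore"
  git -C "$repo" add .gitignore
  git -C "$repo" -c user.email=ci@example.com -c user.name=ci \
    -c commit.gpgsign=false commit -q -m init
  mkdir -p "$repo/.venv/bin" "$repo/build/bin" "$repo/stale"
  touch "$repo/.venv/bin/activate" "$repo/build/bin/rpc_int" "$repo/stale/test_99.json"
}

# Print the subset of {.venv, build, stale} that survives the given flags.
survivors_after_clean() {
  local repo p out=""
  repo="$(mktemp -d)/rpc-tests"
  make_demo_repo "$repo"
  git -C "$repo" clean -q "$@"   # "$@" holds the flags under test
  for p in .venv build stale; do
    [ -e "$repo/$p" ] && out="$out $p"
  done
  echo "${out# }"
  rm -rf "$(dirname "$repo")"
}

survivors_after_clean -fdx                    # nothing survives
survivors_after_clean -fd                     # only the ignored .venv survives
survivors_after_clean -fd -e .venv -e build   # .venv and build survive
```

In every variant the stale directory is removed; only the `-e` exclusions keep an untracked-but-not-ignored `build/`, which is why the exclusions matter even without `-x`.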

PR description is stale (already requested by reviewer)

Title was updated to reflect the new approach; the body still says:

> Replace the hardcoded version string in the rpc-tests cache keys with hashFiles() so the cache is automatically invalidated whenever the test script changes…

hashFiles() is no longer used. Please update the description to describe the env-centralisation approach (and the git clean change, once tweaked per #1).

Minor

Misleading error message in run_rpc_tests.sh

run_rpc_tests.sh:21-23:

```bash
if [ -z "$2" ]; then
  echo "Error: RPC_VERSION is not set — export RPC_VERSION= before running (e.g. vX.Y.Z)"
  exit 1
fi
```

The check is on $2 (positional arg), but the message says "export RPC_VERSION". A user invoking the script directly will be sent down the wrong path: the script doesn't read from the env at all, only from $2. Either:

Restore the original combined [ -z "$1" ] || [ -z "$2" ] check that just shows the usage block (simplest), or
Change the message to: "Error: RPC_VERSION ($2) is required (e.g. vX.Y.Z). When invoked via wrapper scripts, set the RPC_VERSION env var."
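A sketch of the first option (the combined check the PR later restored), assuming `$1` is the chain and `$2` the rpc-tests version as in the snippet above. The usage text and the `usage`/`validate_args` names are illustrative, not the script's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the combined argument check at the top of
# run_rpc_tests.sh: $1 is the chain, $2 the rpc-tests version (e.g. v2.8.1).
usage() {
  echo "Usage: run_rpc_tests.sh <CHAIN> <RPC_VERSION> [...]" >&2
  echo "  RPC_VERSION is the rpc-tests tag (e.g. vX.Y.Z); wrapper scripts pass it from \$RPC_VERSION" >&2
}

validate_args() {
  if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
    usage
    return 1
  fi
}

# In the script itself this would run before anything else:
#   validate_args "$@" || exit 1
```

Pointing at the positional argument in the usage line, and mentioning the env var only as the wrappers' source, keeps the message accurate for both direct and wrapper invocations.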

Validation only in run_rpc_tests.sh, not in wrappers

The Copilot bot suggested guards in each wrapper. I think your single-point-of-validation choice is fine, but the wrappers reference an undeclared $RPC_VERSION and pass it through unchecked. If someone runs e.g. run_rpc_tests_geth.sh /tmp /tmp locally without export RPC_VERSION=…, the wrapper passes an empty string to run_rpc_tests.sh, which is then caught (fine), but the failure point is one level deep. Consider one of:

A set -u (or explicit : "${RPC_VERSION:?must be set}") at the top of each wrapper, so the error surfaces close to where the variable is consumed.
Or document the env requirement in a comment header in each wrapper.

Not a blocker — your call.
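A sketch of the first option, assuming bash. `wrapper_prologue` is a hypothetical stand-in for the top of a wrapper such as run_rpc_tests_geth.sh; in a real wrapper the guard would simply be the script's first command.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the top of a wrapper such as run_rpc_tests_geth.sh.
# ${RPC_VERSION:?msg} expands normally when RPC_VERSION is set and non-empty;
# otherwise a non-interactive shell prints msg to stderr and exits non-zero,
# so the error surfaces in the wrapper instead of one level deeper.
wrapper_prologue() {
  : "${RPC_VERSION:?must be set, e.g. export RPC_VERSION=vX.Y.Z}"
  echo "RPC_VERSION=$RPC_VERSION"
}

# In the real wrapper, the guard is just the first line:
#   : "${RPC_VERSION:?must be set, e.g. export RPC_VERSION=vX.Y.Z}"
```

Unlike a blanket `set -u`, the `:?` form fails only for the one variable it names and also rejects an empty value.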

Cache key wedge risk on run_rpc_tests.sh edits

A version-only cache key won't bust on changes to the venv-marker pattern in run_rpc_tests.sh (it could leave a cached .venv whose marker name no longer matches the new pattern).
Today this is masked by issue #1 (the git clean -fdx wipes .venv regardless). Once #1 is fixed, the marker mismatch would re-trigger venv install anyway, so this self-corrects.
Mentioning it as something to keep in mind, not a blocker.

Pre-existing: qa-rpc-integration-tests-clients.yml pins v2.4.0

The clients (geth, nethermind) workflow centralises RPC_VERSION: v2.4.0, while the Erigon-self workflows have moved to v2.8.1. This matches the previously hardcoded values in the geth/nethermind wrappers, so the PR isn't introducing new drift — but worth a follow-up to align (separate PR, since it could surface real test divergence).

…res on cache hit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… it from file Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… wrapper scripts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lupin012 · 2026-04-27T19:57:18Z

@yperbasis
I opted for rpc_version.env rather than defining RPC_VERSION in the workflow env: blocks for two reasons:

Local usability — run_rpc_tests_local.sh is designed to be run directly by developers without going through CI. With the version defined only in the YAMLs, a developer would need to open the workflow file to find the correct version and either export it manually or pass it as an argument before running the script. With rpc_version.env, the scripts source it automatically — the developer doesn't need to know or care about versioning.
Single source of truth — the version was previously scattered across 4+ workflow YAMLs, making drift trivially easy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lupin012 requested review from canepat, taratorio and yperbasis April 26, 2026 14:28

lupin012 marked this pull request as ready for review April 26, 2026 14:28

lupin012 requested review from AskAlexSharov and mriccobene as code owners April 26, 2026 14:28

ci: clear stale rpc-tests dir on cache miss to avoid reusing workspac…

4317482

…e artifacts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

yperbasis added RPC QA labels Apr 26, 2026

yperbasis requested a review from Copilot April 26, 2026 15:41

Copilot started reviewing on behalf of yperbasis April 26, 2026 15:42

yperbasis requested changes Apr 26, 2026

Copilot AI reviewed Apr 26, 2026

lupin012 requested a review from yperbasis April 26, 2026 16:25

yperbasis requested a review from Copilot April 26, 2026 16:58

Copilot started reviewing on behalf of yperbasis April 26, 2026 16:58

Copilot AI reviewed Apr 26, 2026

yperbasis requested changes Apr 26, 2026

ci: clean untracked files from rpc-tests cache to prevent stale test …

ab9094f

…files across runs

lupin012 changed the title ~~CI: derive rpc-tests cache key from script hash instead of hardcoded version~~ CI: centralise RPC_VERSION in workflow env and fix rpc-tests cache stale files Apr 26, 2026

yperbasis requested changes Apr 27, 2026

lupin012 and others added 3 commits April 27, 2026 21:09

ci: preserve .venv and build dirs when cleaning stale rpc-tests fixtu…

6369031

…res on cache hit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci: move RPC_VERSION to rpc_version.env so scripts and workflows read…

bbe8e17

… it from file Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci: restore original combined check for missing CHAIN/RPC_VERSION args

8b4d334

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lupin012 force-pushed the lupin012/clear_github_cache_for_rpc_tests branch from d241364 to 8b4d334 April 27, 2026 19:30

ci: align clients to rpc_version.env and add RPC_VERSION guard in all…

8a85b36

… wrapper scripts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lupin012 and others added 2 commits April 27, 2026 21:59

Merge branch 'main' into lupin012/clear_github_cache_for_rpc_tests

27a80e8

ci: temporarily disable debug_traceBlockByNumber test_33 and test_34

f82c624

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lupin012 changed the title ~~CI: centralise RPC_VERSION in workflow env and fix rpc-tests cache stale files~~ CI: centralise RPC_VERSION in file and fix rpc-tests cache stale files Apr 27, 2026

lupin012 requested a review from yperbasis April 27, 2026 20:37

Merge branch 'main' into lupin012/clear_github_cache_for_rpc_tests

dd40405

yperbasis approved these changes Apr 28, 2026

lupin012 wants to merge 11 commits into main from lupin012/clear_github_cache_for_rpc_tests

4 participants
