CI: centralise RPC_VERSION in file and fix rpc-tests cache stale files#20824
CI: centralise RPC_VERSION in file and fix rpc-tests cache stale files#20824
Conversation
…version Replace the hardcoded version string in the rpc-tests cache keys with hashFiles() so the cache is automatically invalidated whenever the test script changes, eliminating the need to update the version in two places. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e artifacts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yperbasis
left a comment
There was a problem hiding this comment.
Major
- Inconsistent rollout — two workflows still hardcode the version. The same pattern still exists, untouched, in:
- .github/workflows/qa-rpc-integration-tests-clients.yml:56 → key: rpc-tests-…-v2.4.0
- .github/workflows/qa-sync-from-scratch.yml:146 → key: rpc-tests-…-v2.4.0
The latter is already drifted: it pins v2.4.0 but invokes run_rpc_tests_local.sh → run_rpc_tests_ethereum.sh, which uses v2.8.1. That's a pre-existing bug, but since the PR's goal is
to eliminate exactly this class of drift, it would be worth applying the same treatment here (and to qa-rpc-integration-tests-clients.yml, hashing both run_rpc_tests_geth.sh and
run_rpc_tests_nethermind.sh).
Minor
2. Over-invalidation — cache will churn on unrelated edits. Each run_rpc_tests_.sh contains both the version pin and the DISABLED_TEST_LIST. Adding/removing a single disabled
test (a frequent change, see e.g. the Ethereum script's many # Temporary disable entries) will now invalidate the cache and force a full re-clone + pip install + make rpc_int. The
cached artifact (rpc-tests checkout, .venv, rpc_int binary) doesn't depend on the disabled list at all.
Cleaner alternative: factor the version into its own file, e.g. .github/workflows/scripts/rpc_tests_version_ethereum.txt, source it in the script, and
hashFiles(...version_ethereum.txt) in the workflow. Then only true version bumps invalidate.
-
New rm -rf step duplicates logic already in run_rpc_tests.sh. run_rpc_tests.sh:65-70 already does:
if [ -d "$WORKSPACE/rpc-tests/.git" ] && [ "$(git -C "$WORKSPACE/rpc-tests" describe --tags --exact-match 2>/dev/null)" = "$RPC_VERSION" ]; then
echo "Using cached rpc-tests at $RPC_VERSION"
else
rm -rf "$WORKSPACE/rpc-tests" >/dev/null 2>&1
git -c advice.detachedHead=false clone --depth 1 --branch "$RPC_VERSION" …
fi
So a stale leftover at the wrong version would already be wiped and re-cloned by the script. Not harmful to also do it at the workflow layer, but the responsibility is now duplicated.
Pick one home for it. -
Hash misses changes to the shared run_rpc_tests.sh. All three updated wrappers ultimately call run_rpc_tests.sh, which controls the venv layout, the VENV_MARKER format, and the
make rpc_int build. If that file is edited (e.g. the venv-marker pattern changes), the cached .venv could become incompatible without invalidating the key. Recommend including both
files in the hash:
key: rpc-tests-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('.github/workflows/scripts/run_rpc_tests_ethereum.sh', '.github/workflows/scripts/run_rpc_tests.sh') }}
(and the same for the gnosis / remote keys).
There was a problem hiding this comment.
Pull request overview
This PR updates the GitHub Actions cache keys for rpc-tests to be derived from the hash of the corresponding test-runner script, so cache invalidation happens automatically when the script changes (removing the need to manually bump hardcoded versions).
Changes:
- Replace hardcoded
rpc-testscache key version strings with${{ hashFiles(...) }}based keys. - Add a cache-step
idand clear${{ runner.workspace }}/rpc-testson cache misses to avoid stale content on self-hosted runners.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
.github/workflows/qa-rpc-integration-tests.yml |
Cache key now derives from run_rpc_tests_ethereum.sh hash; clears stale cache dir on miss. |
.github/workflows/qa-rpc-integration-tests-remote.yml |
Cache key now derives from run_rpc_tests_remote_ethereum.sh hash; clears stale cache dir on miss. |
.github/workflows/qa-rpc-integration-tests-gnosis.yml |
Cache key now derives from run_rpc_tests_gnosis.sh hash; clears stale cache dir on miss. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…on pins Define RPC_VERSION once per workflow in the env section so version bumps require a single-line change. Cache keys derive from env.RPC_VERSION for human-readable, predictable names. Wrapper scripts pass $RPC_VERSION to run_rpc_tests.sh; validation is centralised there with a clear error message. Remove the duplicate stale-cache cleanup step from workflows since run_rpc_tests.sh already handles it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@yperbasis I have changed the approach using the RPC_VERSION env variable. Please check if it is ok. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 13 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| RPC_PAST_TEST_DIR: /opt/rpc-past-tests | ||
| CHAIN: mainnet | ||
| RPC_VERSION: v2.8.1 |
There was a problem hiding this comment.
This workflow introduces RPC_VERSION: v2.8.1 as a hardcoded value, which still requires manual bumps to invalidate caches. Given the PR intent to auto-invalidate via script hash, consider removing the need for this hardcoded value by basing the cache key on hashFiles(...) (and only keeping RPC_VERSION if it is independently needed to select the rpc-tests ref).
yperbasis
left a comment
There was a problem hiding this comment.
Plz update PR title and description to match the new approach
…files across runs
yperbasis
left a comment
There was a problem hiding this comment.
Major
- git clean -fdx largely defeats the cache it's running inside
In run_rpc_tests.sh:69-71:
if [ -d "$WORKSPACE/rpc-tests/.git" ] && [ "$(git -C "$WORKSPACE/rpc-tests" describe --tags --exact-match 2>/dev/null)" = "$RPC_VERSION" ]; then
echo "Using cached rpc-tests at $RPC_VERSION"
git -C "$WORKSPACE/rpc-tests" clean -fdx >/dev/null 2>&1
On a cache hit, actions/cache restores $WORKSPACE/rpc-tests including the cached .venv/ and build/bin/rpc_int. git clean -fdx then removes all untracked + ignored files, which means
both .venv/ and build/ are wiped. Downstream:
- The VENV_MARKER check (run_rpc_tests.sh:79-80) fails because .venv/bin/activate is gone → full python3 -m venv .venv + pip install -r requirements.txt.
- make rpc_int (line 121) rebuilds because the binary is gone.
So a "cache hit" only saves the ~10–30s git clone; the rest of the cached payload (the expensive parts) is thrown away every run. The cache effectively went from saving ~2 min to ~30
s.
Suggestion — pick one:
- git clean -fd -e .venv -e build (exclude the directories the rest of the script intentionally caches), or
- git restore . (reset only tracked files — this is what "clean stale test files" should mean if the concern is mutated fixtures), or
- Just delete the specific known-mutated path (e.g. integration/$CHAIN/results/, which is already handled by rm -rf ./"$CHAIN"/results/* later anyway).
A one-line comment explaining which stale files motivated this would also help future readers.
- PR description is stale (already requested by reviewer)
Title was updated to reflect the new approach; the body still says:
▎ Replace the hardcoded version string in the rpc-tests cache keys with hashFiles() so the cache is automatically invalidated whenever the test script changes…
hashFiles() is no longer used. Please update the description to describe the env-centralisation approach (and the git clean change, once tweaked per #1).
Minor
- Misleading error message in run_rpc_tests.sh
run_rpc_tests.sh:21-23:
if [ -z "$2" ]; then
echo "Error: RPC_VERSION is not set — export RPC_VERSION= before running (e.g. vX.Y.Z)"
exit 1
fi
The check is on $2 (positional arg), but the message says "export RPC_VERSION". A user invoking the script directly will be sent down the wrong path — the script doesn't read from the
env at all, only from $2. Either:
- Restore the original combined [ -z "$1" ] || [ -z "$2" ] check that just shows the usage block (simplest), or
- Change the message to: "Error: RPC_VERSION ($2) is required (e.g. vX.Y.Z). When invoked via wrapper scripts, set the RPC_VERSION env var."
- Validation only in run_rpc_tests.sh, not in wrappers
The Copilot bot suggested guards in each wrapper. I think your single-point-of-validation choice is fine — but the wrappers reference an undeclared $RPC_VERSION and pass it through
unquoted-checked. If someone runs e.g. run_rpc_tests_geth.sh /tmp /tmp locally without export RPC_VERSION=…, the wrapper passes an empty string to run_rpc_tests.sh, which is then
caught — fine, but the failure point is one level deep. Consider one of:
- A set -u (or explicit : "${RPC_VERSION:?must be set}") at the top of each wrapper, so the error surfaces close to where the variable is consumed.
- Or document the env requirement in a comment header in each wrapper.
Not a blocker — your call.
- Cache key wedge risk on run_rpc_tests.sh edits
A version-only cache key won't bust on changes to the venv-marker pattern in run_rpc_tests.sh (it could leave a cached .venv whose marker name no longer matches the new pattern).
Today this is masked by issue #1 (the git clean -fdx wipes .venv regardless). Once #1 is fixed, the marker mismatch would re-trigger venv install anyway, so this self-corrects.
Mentioning it as something to keep in mind, not a blocker.
- Pre-existing: qa-rpc-integration-tests-clients.yml pins v2.4.0
The clients (geth, nethermind) workflow centralises RPC_VERSION: v2.4.0, while the Erigon-self workflows have moved to v2.8.1. This matches the previously hardcoded values in the
geth/nethermind wrappers, so the PR isn't introducing new drift — but worth a follow-up to align (separate PR, since it could surface real test divergence).
…res on cache hit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… it from file Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
d241364 to
8b4d334
Compare
… wrapper scripts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@yperbasis
|
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem 1 — RPC_VERSION hardcoded in multiple places
The rpc-tests version was duplicated across multiple workflow YAMLs and shell scripts. A version bump required touching many files with risk of inconsistencies (e.g. qa-rpc-integration-tests-remote.yml had v1.121.0 in the cache key but the script used v2.8.1).
Fix: introduce rpc_version.env (a single file containing RPC_VERSION=v2.8.1). Wrapper scripts source it automatically so they always use the correct version whether run manually or from CI. Workflows read it in a dedicated step and write to $GITHUB_ENV for use in the cache key. Updating the version is now a one-line change in one file.
Problem 2 — Stale test files polluting the cache
Self-hosted runners keep the rpc-tests directory on disk between runs. When actions/cache restores a tagged version (e.g. v2.8.1) on top of a directory that previously held a different branch, tar overwrites matching files but leaves extra untracked files from the old branch. Git correctly reports the tag (v2.8.1) but the stale test files are still on disk and picked up by the test runner, causing spurious diff mismatch failures on tests that don't belong to that version.
Fix: add git clean -fd -e .venv -e build on cache hit — removes stale untracked test fixtures while preserving .venv/ and build/bin/rpc_int which are the expensive parts of the cache. The -x flag is intentionally omitted to avoid wiping ignored files that the rest of the script relies on.