Skip to content

CI: centralise RPC_VERSION in file and fix rpc-tests cache stale files#20824

Open
lupin012 wants to merge 11 commits intomainfrom
lupin012/clear_github_cache_for_rpc_tests
Open

CI: centralise RPC_VERSION in file and fix rpc-tests cache stale files#20824
lupin012 wants to merge 11 commits intomainfrom
lupin012/clear_github_cache_for_rpc_tests

Conversation

@lupin012
Copy link
Copy Markdown
Contributor

@lupin012 lupin012 commented Apr 26, 2026

Problem 1 — RPC_VERSION hardcoded in multiple places

The rpc-tests version was duplicated across multiple workflow YAMLs and shell scripts. A version bump required touching many files with risk of inconsistencies (e.g. qa-rpc-integration-tests-remote.yml had v1.121.0 in the cache key but the script used v2.8.1).

Fix: introduce rpc_version.env (a single file containing RPC_VERSION=v2.8.1). Wrapper scripts source it automatically so they always use the correct version whether run manually or from CI. Workflows read it in a dedicated step and write to $GITHUB_ENV for use in the cache key. Updating the version is now a one-line change in one file.

Problem 2 — Stale test files polluting the cache

Self-hosted runners keep the rpc-tests directory on disk between runs. When actions/cache restores a tagged version (e.g. v2.8.1) on top of a directory that previously held a different branch, tar overwrites matching files but leaves extra untracked files from the old branch. Git correctly reports the tag (v2.8.1) but the stale test files are still on disk and picked up by the test runner, causing spurious diff mismatch failures on tests that don't belong to that version.

Fix: add git clean -fd -e .venv -e build on cache hit — removes stale untracked test fixtures while preserving .venv/ and build/bin/rpc_int which are the expensive parts of the cache. The -x flag is intentionally omitted to avoid wiping ignored files that the rest of the script relies on.

…version

Replace the hardcoded version string in the rpc-tests cache keys with
hashFiles() so the cache is automatically invalidated whenever the test
script changes, eliminating the need to update the version in two places.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012 lupin012 marked this pull request as ready for review April 26, 2026 14:28
…e artifacts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@yperbasis yperbasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major

  1. Inconsistent rollout — two workflows still hardcode the version. The same pattern still exists, untouched, in:
  • .github/workflows/qa-rpc-integration-tests-clients.yml:56 → key: rpc-tests-…-v2.4.0
  • .github/workflows/qa-sync-from-scratch.yml:146 → key: rpc-tests-…-v2.4.0

The latter is already drifted: it pins v2.4.0 but invokes run_rpc_tests_local.sh → run_rpc_tests_ethereum.sh, which uses v2.8.1. That's a pre-existing bug, but since the PR's goal is
to eliminate exactly this class of drift, it would be worth applying the same treatment here (and to qa-rpc-integration-tests-clients.yml, hashing both run_rpc_tests_geth.sh and
run_rpc_tests_nethermind.sh).

Minor
2. Over-invalidation — cache will churn on unrelated edits. Each run_rpc_tests_.sh contains both the version pin and the DISABLED_TEST_LIST. Adding/removing a single disabled
test (a frequent change, see e.g. the Ethereum script's many # Temporary disable entries) will now invalidate the cache and force a full re-clone + pip install + make rpc_int. The
cached artifact (rpc-tests checkout, .venv, rpc_int binary) doesn't depend on the disabled list at all.

Cleaner alternative: factor the version into its own file, e.g. .github/workflows/scripts/rpc_tests_version_ethereum.txt, source it in the script, and
hashFiles(...version_ethereum.txt) in the workflow. Then only true version bumps invalidate.

  1. New rm -rf step duplicates logic already in run_rpc_tests.sh. run_rpc_tests.sh:65-70 already does:
    if [ -d "$WORKSPACE/rpc-tests/.git" ] && [ "$(git -C "$WORKSPACE/rpc-tests" describe --tags --exact-match 2>/dev/null)" = "$RPC_VERSION" ]; then
    echo "Using cached rpc-tests at $RPC_VERSION"
    else
    rm -rf "$WORKSPACE/rpc-tests" >/dev/null 2>&1
    git -c advice.detachedHead=false clone --depth 1 --branch "$RPC_VERSION" …
    fi
    So a stale leftover at the wrong version would already be wiped and re-cloned by the script. Not harmful to also do it at the workflow layer, but the responsibility is now duplicated.
    Pick one home for it.

  2. Hash misses changes to the shared run_rpc_tests.sh. All three updated wrappers ultimately call run_rpc_tests.sh, which controls the venv layout, the VENV_MARKER format, and the
    make rpc_int build. If that file is edited (e.g. the venv-marker pattern changes), the cached .venv could become incompatible without invalidating the key. Recommend including both
    files in the hash:
    key: rpc-tests-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('.github/workflows/scripts/run_rpc_tests_ethereum.sh', '.github/workflows/scripts/run_rpc_tests.sh') }}
    (and the same for the gnosis / remote keys).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the GitHub Actions cache keys for rpc-tests to be derived from the hash of the corresponding test-runner script, so cache invalidation happens automatically when the script changes (removing the need to manually bump hardcoded versions).

Changes:

  • Replace hardcoded rpc-tests cache key version strings with ${{ hashFiles(...) }} based keys.
  • Add a cache-step id and clear ${{ runner.workspace }}/rpc-tests on cache misses to avoid stale content on self-hosted runners.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
.github/workflows/qa-rpc-integration-tests.yml Cache key now derives from run_rpc_tests_ethereum.sh hash; clears stale cache dir on miss.
.github/workflows/qa-rpc-integration-tests-remote.yml Cache key now derives from run_rpc_tests_remote_ethereum.sh hash; clears stale cache dir on miss.
.github/workflows/qa-rpc-integration-tests-gnosis.yml Cache key now derives from run_rpc_tests_gnosis.sh hash; clears stale cache dir on miss.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…on pins

Define RPC_VERSION once per workflow in the env section so version bumps
require a single-line change. Cache keys derive from env.RPC_VERSION for
human-readable, predictable names. Wrapper scripts pass $RPC_VERSION to
run_rpc_tests.sh; validation is centralised there with a clear error
message. Remove the duplicate stale-cache cleanup step from workflows
since run_rpc_tests.sh already handles it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012
Copy link
Copy Markdown
Contributor Author

@yperbasis I have changed the approach using the RPC_VERSION env variable. Please check if it is ok.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 13 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/qa-rpc-integration-tests-gnosis.yml
Comment thread .github/workflows/scripts/run_rpc_tests_ethereum.sh
Comment thread .github/workflows/scripts/run_rpc_tests_nethermind.sh
Comment thread .github/workflows/scripts/run_rpc_tests_ethereum_latest.sh
Comment thread .github/workflows/qa-sync-from-scratch.yml
Comment thread .github/workflows/scripts/run_rpc_tests_gnosis.sh
Comment thread .github/workflows/scripts/run_rpc_tests_geth.sh
Comment thread .github/workflows/qa-rpc-integration-tests.yml
Comment on lines +37 to +39
RPC_PAST_TEST_DIR: /opt/rpc-past-tests
CHAIN: mainnet
RPC_VERSION: v2.8.1
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This workflow introduces RPC_VERSION: v2.8.1 as a hardcoded value, which still requires manual bumps to invalidate caches. Given the PR intent to auto-invalidate via script hash, consider removing the need for this hardcoded value by basing the cache key on hashFiles(...) (and only keeping RPC_VERSION if it is independently needed to select the rpc-tests ref).

Copilot uses AI. Check for mistakes.
Comment thread .github/workflows/qa-rpc-integration-tests-remote.yml
Copy link
Copy Markdown
Member

@yperbasis yperbasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plz update PR title and description to match the new approach

@lupin012 lupin012 changed the title CI: derive rpc-tests cache key from script hash instead of hardcoded version CI: centralise RPC_VERSION in workflow env and fix rpc-tests cache stale files Apr 26, 2026
Copy link
Copy Markdown
Member

@yperbasis yperbasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major

  1. git clean -fdx largely defeats the cache it's running inside

In run_rpc_tests.sh:69-71:
if [ -d "$WORKSPACE/rpc-tests/.git" ] && [ "$(git -C "$WORKSPACE/rpc-tests" describe --tags --exact-match 2>/dev/null)" = "$RPC_VERSION" ]; then
echo "Using cached rpc-tests at $RPC_VERSION"
git -C "$WORKSPACE/rpc-tests" clean -fdx >/dev/null 2>&1
On a cache hit, actions/cache restores $WORKSPACE/rpc-tests including the cached .venv/ and build/bin/rpc_int. git clean -fdx then removes all untracked + ignored files, which means
both .venv/ and build/ are wiped. Downstream:

  • The VENV_MARKER check (run_rpc_tests.sh:79-80) fails because .venv/bin/activate is gone → full python3 -m venv .venv + pip install -r requirements.txt.
  • make rpc_int (line 121) rebuilds because the binary is gone.

So a "cache hit" only saves the ~10–30s git clone; the rest of the cached payload (the expensive parts) is thrown away every run. The cache effectively went from saving ~2 min to ~30
s.

Suggestion — pick one:

  • git clean -fd -e .venv -e build (exclude the directories the rest of the script intentionally caches), or
  • git restore . (reset only tracked files — this is what "clean stale test files" should mean if the concern is mutated fixtures), or
  • Just delete the specific known-mutated path (e.g. integration/$CHAIN/results/, which is already handled by rm -rf ./"$CHAIN"/results/* later anyway).

A one-line comment explaining which stale files motivated this would also help future readers.

  1. PR description is stale (already requested by reviewer)

Title was updated to reflect the new approach; the body still says:

▎ Replace the hardcoded version string in the rpc-tests cache keys with hashFiles() so the cache is automatically invalidated whenever the test script changes…

hashFiles() is no longer used. Please update the description to describe the env-centralisation approach (and the git clean change, once tweaked per #1).


Minor

  1. Misleading error message in run_rpc_tests.sh

run_rpc_tests.sh:21-23:
if [ -z "$2" ]; then
echo "Error: RPC_VERSION is not set — export RPC_VERSION= before running (e.g. vX.Y.Z)"
exit 1
fi
The check is on $2 (positional arg), but the message says "export RPC_VERSION". A user invoking the script directly will be sent down the wrong path — the script doesn't read from the
env at all, only from $2. Either:

  • Restore the original combined [ -z "$1" ] || [ -z "$2" ] check that just shows the usage block (simplest), or
  • Change the message to: "Error: RPC_VERSION ($2) is required (e.g. vX.Y.Z). When invoked via wrapper scripts, set the RPC_VERSION env var."
  1. Validation only in run_rpc_tests.sh, not in wrappers

The Copilot bot suggested guards in each wrapper. I think your single-point-of-validation choice is fine — but the wrappers reference an undeclared $RPC_VERSION and pass it through
unquoted-checked. If someone runs e.g. run_rpc_tests_geth.sh /tmp /tmp locally without export RPC_VERSION=…, the wrapper passes an empty string to run_rpc_tests.sh, which is then
caught — fine, but the failure point is one level deep. Consider one of:

  • A set -u (or explicit : "${RPC_VERSION:?must be set}") at the top of each wrapper, so the error surfaces close to where the variable is consumed.
  • Or document the env requirement in a comment header in each wrapper.

Not a blocker — your call.

  1. Cache key wedge risk on run_rpc_tests.sh edits

A version-only cache key won't bust on changes to the venv-marker pattern in run_rpc_tests.sh (it could leave a cached .venv whose marker name no longer matches the new pattern).
Today this is masked by issue #1 (the git clean -fdx wipes .venv regardless). Once #1 is fixed, the marker mismatch would re-trigger venv install anyway, so this self-corrects.
Mentioning it as something to keep in mind, not a blocker.

  1. Pre-existing: qa-rpc-integration-tests-clients.yml pins v2.4.0

The clients (geth, nethermind) workflow centralises RPC_VERSION: v2.4.0, while the Erigon-self workflows have moved to v2.8.1. This matches the previously hardcoded values in the
geth/nethermind wrappers, so the PR isn't introducing new drift — but worth a follow-up to align (separate PR, since it could surface real test divergence).

lupin012 and others added 3 commits April 27, 2026 21:09
…res on cache hit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… it from file

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012 lupin012 force-pushed the lupin012/clear_github_cache_for_rpc_tests branch from d241364 to 8b4d334 Compare April 27, 2026 19:30
… wrapper scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012
Copy link
Copy Markdown
Contributor Author

@yperbasis
I opted for rpc_version.env rather than defining RPC_VERSION in the workflow env: blocks for two reasons:

  1. Local usability — run_rpc_tests_local.sh is designed to be run directly by developers without going through CI. With the version defined only in the YAMLs, a developer would need to open the workflow file to find the correct version and either export it manually or pass it as an argument before running the script. With rpc_version.env, the scripts source it automatically — the developer doesn't need to know or care about versioning.

  2. Single source of truth — the version was previously scattered across 4+ workflow YAMLs, making drift trivially easy

@lupin012 lupin012 changed the title CI: centralise RPC_VERSION in workflow env and fix rpc-tests cache stale files CI: centralise RPC_VERSION in file and fix rpc-tests cache stale files Apr 27, 2026
@lupin012 lupin012 requested a review from yperbasis April 27, 2026 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants