Commit 5e6ea3c

tools, .github/workflows: harden EEST fixture download and warm its cache out-of-band (erigontech#21674)
## Problem

The `eest-spec-*` checks failed on a recent run ([example](https://github.com/erigontech/erigon/actions/runs/27119504237)) with:

```
curl: (22) The requested URL returned error: 504     ← x4
make: *** [Makefile:271: test-fixtures-eest] Error 22
```

Two distinct issues surfaced:

1. **Transient download failures.** On a cache miss, `tools/test-fixtures.sh` downloads fixture tarballs directly from the GitHub release CDN. It retried only `--retry 3` over a fixed ~6s window, which couldn't ride out a burst of HTTP 504s.
2. **Cache-entry churn + cold misses.** `test-eest-spec.yml` used `actions/cache` (restore **and** save), so every PR ref (`refs/pull/<N>/merge`) saved its own ~1.6 GB copy of identical fixture tarballs. Sibling PR refs can't read each other's caches, so this gave no shared-hit benefit and churned the shared cache budget via LRU. There was also no default-branch (`main`) entry, so every PR cold-missed in the first place.

## Fix

- **`tools/test-fixtures.sh`**: curl exponential backoff over a 5-minute window, fail-fast connect, and DNS re-resolution each attempt (CDN records can rotate on short TTLs):
  ```
  --retry 10 --retry-all-errors --retry-delay 0 --retry-max-time 300 --connect-timeout 30 --dns-cache-timeout 0
  ```
- **`test-eest-spec.yml`**: switch to `actions/cache/restore` (**restore-only**). PR / merge_group runs never save, so no per-ref duplicates. A miss self-heals via `tools/test-fixtures.sh`.
- **`cache-warming-eest-fixtures.yml` (new)**: a dedicated workflow that populates the base-branch cache, since default-branch caches are readable from every PR. It runs on `test-fixtures.json` change + every 2 days (cron) + `workflow_dispatch`, probes with `lookup-only`, and downloads + saves only on a real miss, fetching all four tarballs so the entry is complete.

Design notes: key kept as `hashFiles('test-fixtures.json')`; `cl_mainnet` intentionally excluded (owned by `test-integration-caplin.yml`); default cache compression kept, since restore-only moves compression off the PR critical path into the infrequent warmer (`actions/cache` exposes no public switch to disable it; see actions/toolkit#544).

## Bootstrap note

After merge the warmer won't auto-fire (it doesn't change `test-fixtures.json`), so until the next 2-day cron tick PRs would cold-miss (self-healing, just slower). Run the **`Cache Warming — EEST fixtures`** workflow once via `workflow_dispatch` right after merge to populate the `main`-scoped cache immediately.

## Verification

- `actionlint` (bundles shellcheck): clean on both workflows.
- `bash -n tools/test-fixtures.sh`: clean.
- YAML parses.
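
To sanity-check the retry windows described in the commit message, the sketch below does the arithmetic only (it never runs curl). It assumes curl's documented default backoff for `--retry-delay 0`: a 1-second wait before the first retry, doubling on each retry, capped at 10 minutes. The function name `backoff_schedule` is hypothetical, and transfer time is not modeled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: backoff_schedule <retries> [fixed_delay_seconds]
# With no fixed delay, models curl's documented default backoff
# (1s, then doubling, capped at 600s). Arithmetic only; does not call curl.
backoff_schedule() {
  local retries=$1 fixed_delay=${2:-0}
  local delay=1 total=0 i
  if ((fixed_delay > 0)); then delay=$fixed_delay; fi
  for ((i = 1; i <= retries; i++)); do
    total=$((total + delay))
    echo "retry $i: wait ${delay}s, cumulative ${total}s"
    if ((fixed_delay == 0)); then
      delay=$((delay * 2))
      if ((delay > 600)); then delay=600; fi
    fi
  done
}

echo "old: --retry 3 --retry-delay 2"
backoff_schedule 3 2
echo "new: --retry 10 --retry-delay 0"
backoff_schedule 10
```

Under these assumptions the old settings sleep for 6 seconds in total, while the new schedule reaches 255 seconds of cumulative sleep after the eighth retry and 511 seconds after the ninth. That suggests the 300-second `--retry-max-time` cap, rather than the retry count of 10, is what would normally end the loop.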
1 parent 32e1afe commit 5e6ea3c

3 files changed

Lines changed: 91 additions & 7 deletions

.github/workflows/cache-warming-eest-fixtures.yml

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+name: Cache Warming — EEST fixtures
+
+# Populates the base-branch EEST fixture-tarball cache that test-eest-spec.yml
+# restores (restore-only there, so PR / merge_group runs never save their own
+# per-ref duplicates). GitHub cache scoping makes default-branch caches readable
+# from every PR and merge group, so warming on main/release is enough.
+#
+# Triggers:
+#   - push to a protected branch touching test-fixtures.json -> warm on a version bump
+#   - schedule (daily)                                        -> safety net if LRU-evicted
+#   - workflow_dispatch                                       -> manual (use once to bootstrap)
+#
+# The key matches test-eest-spec.yml exactly:
+#   test-fixtures-eest-${{ runner.os }}-${{ hashFiles('test-fixtures.json') }}
+
+on:
+  push:
+    branches:
+      - main
+      - 'release/**'
+    paths:
+      - test-fixtures.json
+  schedule:
+    - cron: '0 4 * * *'
+  workflow_dispatch:
+
+defaults:
+  run:
+    shell: bash
+
+permissions:
+  contents: read
+
+concurrency:
+  group: cache-warming-eest-fixtures-${{ github.ref }}
+  cancel-in-progress: false
+
+jobs:
+  warm:
+    runs-on: ubuntu-24.04
+    timeout-minutes: 30
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          fetch-depth: 1
+
+      # Probe for an exact-key hit without downloading the archive. cache-hit is
+      # true only on the exact primary key, so a version bump (new key) reports a
+      # miss even when an older entry still matches the restore-key prefix.
+      - name: Probe EEST cache
+        id: probe
+        uses: actions/cache/restore@v5
+        with:
+          path: |
+            test-fixtures-cache/eest_stable.tar.gz
+            test-fixtures-cache/eest_devnet.tar.gz
+            test-fixtures-cache/eest_benchmark.tar.gz
+            test-fixtures-cache/eest_zkevm.tar.gz
+          key: test-fixtures-eest-${{ runner.os }}-${{ hashFiles('test-fixtures.json') }}
+          lookup-only: true
+
+      # Download all four tarballs in one invocation so the single cached entry
+      # is complete (a shard run only fetches the subset it needs).
+      - name: Download EEST tarballs
+        if: steps.probe.outputs.cache-hit != 'true'
+        run: bash tools/test-fixtures.sh test-fixtures.json test-fixtures-cache eest_stable eest_devnet eest_benchmark eest_zkevm
+
+      - name: Save EEST cache
+        if: steps.probe.outputs.cache-hit != 'true'
+        uses: actions/cache/save@v5
+        with:
+          path: |
+            test-fixtures-cache/eest_stable.tar.gz
+            test-fixtures-cache/eest_devnet.tar.gz
+            test-fixtures-cache/eest_benchmark.tar.gz
+            test-fixtures-cache/eest_zkevm.tar.gz
+          key: test-fixtures-eest-${{ runner.os }}-${{ hashFiles('test-fixtures.json') }}
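
The download and save steps above are guarded with `cache-hit != 'true'` rather than `== 'false'`. A likely reason, assuming the behavior described in the `actions/cache` documentation, is that `cache-hit` is `'true'` on an exact key match, `'false'` on a restore-key match, and an empty string on a complete miss. The shell model below is a hypothetical illustration of that guard, not part of the workflow; `should_warm` is an invented name.

```bash
#!/usr/bin/env bash
# Hypothetical model of the warmer's step guard.
# $1 is the probe's cache-hit output: "true", "false", or "" (assumed values).
should_warm() {
  [[ "$1" != "true" ]]
}

for hit in "true" "false" ""; do
  if should_warm "$hit"; then
    echo "cache-hit='$hit' -> download and save"
  else
    echo "cache-hit='$hit' -> skip, entry already present"
  fi
done
```

A guard written as `== 'false'` would skip the download on a complete miss under this model, which is the case the warmer most needs to handle.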

.github/workflows/test-eest-spec.yml

Lines changed: 10 additions & 6 deletions
@@ -58,13 +58,17 @@ jobs:
           ramdisk: true
           build-cache-extra-key: eest-spec-${{ matrix.shard }}
 
-      - name: Cache EEST tarballs
-        uses: actions/cache@v5
+      # Restore-only: PR / merge_group runs never SAVE, so they don't mint a
+      # per-ref ~1.6GB duplicate every run. The base-branch (main) entry is
+      # populated by cache-warming-eest-fixtures.yml; default-branch caches are
+      # readable from every PR. A miss here self-heals (tools/test-fixtures.sh
+      # re-downloads), and the warmer repopulates main.
+      - name: Restore EEST tarballs
+        uses: actions/cache/restore@v5
         with:
-          # Only cache the .tar.gz files; extracted dirs are recreated by
-          # tools/test-fixtures.sh on each run from the cached tarballs.
-          # Dedicated EEST cache so this workflow doesn't compete with
-          # test-integration-caplin.yml's cl_mainnet cache under one key.
+          # Only the .tar.gz files are cached; tools/test-fixtures.sh re-extracts
+          # them each run. cl_mainnet is intentionally excluded (owned by
+          # test-integration-caplin.yml's separate cache).
           path: |
             test-fixtures-cache/eest_stable.tar.gz
             test-fixtures-cache/eest_devnet.tar.gz
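
The restore-only design depends on GitHub's cache scoping. As a simplified model, assume a run can read cache entries created on its own ref, on its pull request's base branch, or on the default branch, and nothing else. The function `cache_readable` and the PR numbers 101 and 102 below are hypothetical, chosen only to illustrate why sibling PR entries are never shared while a `main` entry is.

```bash
#!/usr/bin/env bash
# Hypothetical, simplified model of cache scoping.
# cache_readable <entry_ref> <run_ref> <base_ref> <default_ref>
# Succeeds when a run on <run_ref> could read an entry saved on <entry_ref>.
cache_readable() {
  local entry_ref=$1 run_ref=$2 base_ref=$3 default_ref=$4
  [[ "$entry_ref" == "$run_ref" || "$entry_ref" == "$base_ref" || "$entry_ref" == "$default_ref" ]]
}

run=refs/pull/102/merge   # hypothetical PR run
base=refs/heads/main
default=refs/heads/main

for entry in refs/pull/101/merge refs/pull/102/merge refs/heads/main; do
  if cache_readable "$entry" "$run" "$base" "$default"; then
    echo "$entry: readable from $run"
  else
    echo "$entry: NOT readable from $run"
  fi
done
```

In this model the entry saved by the sibling PR is unreachable, so each per-ref save only consumes cache budget, whereas a single entry on `main` serves every PR.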

tools/test-fixtures.sh

Lines changed: 4 additions & 1 deletion
@@ -80,7 +80,10 @@ while IFS=$'\t' read -r name url want; do
 
     if [[ ! -f "$tar_path" ]] || [[ "$(sha256 "$tar_path")" != "$want" ]]; then
         echo "$name: downloading from $url"
-        curl -fsSL --retry 3 --retry-all-errors --retry-delay 2 -o "$tar_path.tmp" "$url"
+        # Ride out transient upstream 5xx (release CDN) with exponential backoff over a 5-min window.
+        curl -fsSL --retry 10 --retry-all-errors --retry-delay 0 \
+            --retry-max-time 300 --connect-timeout 30 \
+            -o "$tar_path.tmp" "$url"
         got=$(sha256 "$tar_path.tmp")
         if [[ "$got" != "$want" ]]; then
             rm -f "$tar_path.tmp"
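
The hunk above shows a download-to-temp, verify, then discard-on-mismatch pattern. The sketch below is a self-contained version of that pattern for experimentation, under several assumptions: `fetch_verified` is a hypothetical name, `cp` stands in for the `curl` download so it runs offline, GNU `sha256sum` replaces the script's own `sha256` helper (whose definition is not shown in the diff), and the final rename is a guess at what follows the lines shown.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fetch_verified <source> <dest> <expected-sha256>
# `cp` stands in for the curl download; sha256sum assumes GNU coreutils.
fetch_verified() {
  local src=$1 dest=$2 want=$3 got
  cp "$src" "$dest.tmp"
  got=$(sha256sum "$dest.tmp" | cut -d' ' -f1)
  if [[ "$got" != "$want" ]]; then
    # Never leave a corrupt or partial file where a later run could trust it.
    rm -f "$dest.tmp"
    echo "checksum mismatch: got $got, want $want" >&2
    return 1
  fi
  # Assumed final step: move into place only after the checksum matches.
  mv "$dest.tmp" "$dest"
}

workdir=$(mktemp -d)
printf 'fixture bytes\n' > "$workdir/src"
want=$(sha256sum "$workdir/src" | cut -d' ' -f1)
fetch_verified "$workdir/src" "$workdir/fixture.tar.gz" "$want" && echo "verified"
```

Writing to a `.tmp` path first means an interrupted or corrupted download never occupies the final path, so the existence check at the top of the loop stays meaningful.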
