You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
execution: make block read aheader non-global (#21056)
# execution: make block read aheader non-global
## Summary
Replaces the package-level `globalReadAheader` in
`execution/exec/blocks_read_ahead.go` with a per-`Ethereum` instance.
Each backend now owns its own `*BlockReadAheader` (caches + warmup
`WaitGroup`) and threads it through to the consumers that need it.
## Why
The `globalReadAheader` is a process-wide singleton. The exec module's
`ValidateChain` calls `AddHeaderAndBody`, which spawns a fire-and-forget
warmup goroutine tracked on a shared `sync.WaitGroup`. The eth backend's
`Stop()` waits on that same WaitGroup before closing the chain DB. With
**multiple `Ethereum` instances in one process** (e.g. the new `evm
enginextest` runner that creates ~36 k testers in a single run, or any
other harness that spins up many backends), `Add(1)` from one tester
races with `Wait()` returning in another. Once the counter momentarily
hits zero, a concurrent `Add(1)` panics:
```
panic: sync: WaitGroup is reused before previous Wait has returned
```
This was reproducible at `--workers ≥ 16` running `evm enginextest`
against the full EEST `blockchain_tests_engine_x` set on tmpfs.
Single-Erigon-per-process production deployments never trigger it, which
is why it has been latent.
## Approach
Make the read-aheader a regular struct owned by each `Ethereum`. The
lifetime is the same as the eth backend, the WaitGroup never escapes the
backend, and the race vanishes.
## Files (13 changed, +69 / −58)
- `execution/exec/blocks_read_ahead.go` — exports `BlockReadAheader` and
`NewBlockReadAheader()`. Methods replace the package-level `*Global*`
wrappers (`AddHeaderAndBodyToGlobalReadAheader`,
`AddSendersToGlobalReadAheader`,
`ReadBodyWithTransactionsFromGlobalReadAheader`,
`ReadBlockWithSendersFromGlobalReadAheader`, `WaitForWarmup`). The
unused `WarmBodyFromGlobalReadAheader` is dropped.
- `node/eth/backend.go` — `Ethereum.readAheader` field, initialized in
`New` to `exec.NewBlockReadAheader()`, used by `Stop()` and threaded
into the staged-sync configs and `NewExecModule`.
- `execution/execmodule/exec_module.go` — `ExecModule.readAheader`
field; new arg on `NewExecModule`. `ValidateChain` now calls
`e.readAheader.AddHeaderAndBody(...)`.
- `execution/stagedsync/stage_senders.go` (+ `stage_senders_test.go`) —
`SendersCfg.readAheader`; new arg on `StageSendersCfg`; calls
`cfg.readAheader.{AddSenders, ReadBodyWithTransactions}`.
- `execution/stagedsync/stage_execute.go` —
`ExecuteBlockCfg.readAheader`; new arg on `StageExecuteBlocksCfg`.
- `execution/stagedsync/exec3.go`, `exec3_serial.go` — call
`te.cfg.readAheader.ReadBlockWithSenders(...)` /
`se.cfg.readAheader.ReadBlockWithSenders(...)`.
- `execution/stagedsync/stage_witness.go` — passes a fresh
`exec.NewBlockReadAheader()` (rewind-only path, doesn't share with the
live backend).
- `execution/stagedsync/stageloop/stageloop.go` — `NewDefaultStages` /
`NewPipelineStages` / `NewInMemoryExecution` accept `readAheader` and
plumb it into the senders + execute stage configs.
- `execution/execmodule/execmoduletester/exec_module_tester.go` —
constructs one `readAheader` and reuses it across the tester's stage
configs and `NewExecModule` call (mirroring how `Ethereum` shares a
single instance).
- `cmd/integration/commands/stages.go`, `state_stages.go` — pass fresh
`exec.NewBlockReadAheader()` instances at the integration-CLI call sites
(each invocation is its own short-lived process).
No runtime semantics change inside the read-aheader itself: same caches,
same one-warmup-at-a-time `atomic.Bool`, same `WaitForWarmup` behaviour.
The only change is that the WaitGroup is per-instance rather than
process-wide.
## Verification
- `make lint` clean.
- **`evm enginextest`** against the full EEST
`blockchain_tests_engine_x` set (`fixtures_develop` v5.4.0), `--workers
8` on `TMPDIR=/dev/shm`:
| metric | value |
|--------|-------|
| Tests passed | **63920 / 63920** |
| Wall time | 5:10 |
| Peak RSS | 5.0 GB |
No `WaitGroup` panic at any worker count tested (8, 16, 24).
- **`make test-all`** with
`ERIGON_EXECUTION_TESTS_TMPDIR=/mnt/erigon-ramdisk GOGC=80`: **222
packages green, 0 failures.** All packages this PR touches pass
directly:
- `execution/exec` (0.10s)
- `execution/execmodule` (0.83s), `execmoduletester` (0.29s)
- `execution/stagedsync` (7.18s), `stagedsync/bodydownload`,
`stagedsync/headerdownload`
- `execution/tests` (212.89s, contains the previously-flaky
`TestInvalidReceiptHashHighMgas`)
- Leak-loop tooling test (build tag `leak`, in the engine-x test runner
package) shows goroutine count flat at 3 across 15 iterations — the
read-aheader's warmup goroutine terminates cleanly per Close.
## Notes
- The integration commands (`cmd/integration/commands/stages.go`,
`state_stages.go`) and the `RewindStagesForWitness` helper construct
fresh `exec.NewBlockReadAheader()` instances rather than threading one
through. Each is a one-shot, short-lived process where the cache
wouldn't carry value across the call. If we ever want shared caching
there, those sites have a clear seam to do it.
- Production single-instance Erigon behaviour is unchanged: same caches,
same warmup semantics, same `Stop()` wait, same `dbg.ReadAhead` gate
inside `warmBody`.
0 commit comments