This is **PR #1 of a 3-PR perf stack**. It consolidates state caching
across all modes — sync, tip-tracking, integration and testing — so the
state cache is either **on and completely trustworthy**, or **off to
measure** — never off because it unexpectedly breaks things.
The caches (the account/storage `StateCache` and the commitment
`BranchCache`) are an **internal implementation detail of
`SharedDomains`**. No external entity accesses or mutates them directly:
callers drive state through `Flush` / `Commit` / `GetLatest` /
`DomainPut`, and the full cache lifecycle (population, invalidation,
commit-gating) is owned inside `SharedDomains`.
## What this PR contains
### `BranchCache` — single aggregator-scope commitment cache
- Aggregator-lifetime cache: pinned root slot + bounded LRU tail, behind
the `sd.mem` chain so unwinds and fork-validations see consistent state.
- Wired into the trie read + encoder write paths.
- Tx-precise unwind invalidation: entries are stamped with their per-key
write `txNum`; `sd.Unwind` evicts everything above the unwind watermark
(`BranchCache.UnwindTo`).
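
The tiering and unwind behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `branchCache` type, its fields, and the convention that the empty key denotes the root are all assumptions made for the example. Only the idea (a pinned root slot, a bounded LRU tail, per-entry `txNum` stamps, and eviction of everything above an unwind watermark) comes from the description.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached branch, stamped with the txNum of its last write.
type entry struct {
	key   string
	value []byte
	txNum uint64
}

// branchCache is a hypothetical stand-in, not erigon's BranchCache:
// one pinned slot for the root plus a bounded LRU for everything else.
type branchCache struct {
	root  *entry                   // pinned: never evicted by capacity
	limit int                      // max entries in the LRU tail
	order *list.List               // front = most recently used
	index map[string]*list.Element // key -> element in order
}

func newBranchCache(limit int) *branchCache {
	return &branchCache{limit: limit, order: list.New(), index: map[string]*list.Element{}}
}

// Put stores a branch. The empty key is treated as the root (an assumption).
func (c *branchCache) Put(key string, value []byte, txNum uint64) {
	e := &entry{key: key, value: value, txNum: txNum}
	if key == "" {
		c.root = e
		return
	}
	if el, ok := c.index[key]; ok {
		el.Value = e
		c.order.MoveToFront(el)
		return
	}
	c.index[key] = c.order.PushFront(e)
	if c.order.Len() > c.limit {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.index, oldest.Value.(*entry).key)
	}
}

// Get returns the cached value and marks tail entries as recently used.
func (c *branchCache) Get(key string) ([]byte, bool) {
	if key == "" {
		if c.root == nil {
			return nil, false
		}
		return c.root.value, true
	}
	el, ok := c.index[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// UnwindTo evicts every entry written above the watermark txNum.
func (c *branchCache) UnwindTo(txNum uint64) {
	if c.root != nil && c.root.txNum > txNum {
		c.root = nil
	}
	for el := c.order.Front(); el != nil; {
		next := el.Next() // capture before a possible Remove
		if e := el.Value.(*entry); e.txNum > txNum {
			c.order.Remove(el)
			delete(c.index, e.key)
		}
		el = next
	}
}

func main() {
	c := newBranchCache(2)
	c.Put("", []byte("root"), 10)
	c.Put("a", []byte("A"), 5)
	c.Put("b", []byte("B"), 12)
	c.UnwindTo(10)
	_, okA := c.Get("a")
	_, okB := c.Get("b")
	_, okRoot := c.Get("")
	fmt.Println(okA, okB, okRoot) // true false true
}
```

Stamping each entry with its own write `txNum` lets an unwind drop only what was written after the watermark instead of clearing the whole cache.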
### One switch for all caches
The `BranchCache` is a *type of* state cache, so it rides the existing
`USE_STATE_CACHE` toggle rather than getting its own env. One operator
switch turns **all** caching off — the relevant operation when bisecting
a state-root mismatch, where an operator shouldn't have to reason about
the interaction of several independent caches. (This is a deliberate
deviation from the review's "add a separate `BranchCache` kill-switch"
suggestion — flagged for confirmation.)
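
As an illustration of the single-switch idea, a sketch along these lines gates both caches on one environment variable. Only the `USE_STATE_CACHE` name comes from the description; the `cachesEnabled` and `newCaches` helpers, the `caches` type, and the choice to default to "on" when the variable is unset or unparsable are assumptions for the example, and the real code may read the toggle differently.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// cachesEnabled reports whether caching is on. lookup has the same
// signature as os.LookupEnv so tests can inject values.
// Assumption: unset or unparsable values leave caching on.
func cachesEnabled(lookup func(string) (string, bool)) bool {
	raw, ok := lookup("USE_STATE_CACHE")
	if !ok {
		return true
	}
	on, err := strconv.ParseBool(raw)
	if err != nil {
		return true
	}
	return on
}

// caches groups both caches behind the single switch (hypothetical type).
type caches struct {
	state  map[string][]byte // stands in for the StateCache
	branch map[string][]byte // stands in for the BranchCache
}

// newCaches returns nil when caching is off, so callers fall through
// to the underlying store for every read.
func newCaches(enabled bool) *caches {
	if !enabled {
		return nil
	}
	return &caches{state: map[string][]byte{}, branch: map[string][]byte{}}
}

func main() {
	c := newCaches(cachesEnabled(os.LookupEnv))
	fmt.Println("caching enabled:", c != nil)
}
```

With one gate, bisecting a root mismatch reduces to a single run with the variable set to `false`.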
### The BranchCache reflects only committed state — by construction
The BranchCache can never hold a value a failed commit rolled back:
`SharedDomains.Commit` flushes the in-memory batch into the tx, commits,
and **only then** applies the flushed commitment branches to the cache
(the flush is implicit in committing; a failed commit applies nothing).
Plain `Flush` — callers that own their own commit, e.g. offline tools —
never touches the cache; read-through populates from committed files.
The caches are an internal detail of `SharedDomains`; nothing else
writes to them.
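
The ordering above (flush, commit, and only then populate) can be sketched as follows. This is a toy model under stated assumptions: `tx`, `sharedDomains`, and the map-based cache are hypothetical stand-ins and do not mirror the real `SharedDomains` API.

```go
package main

import (
	"errors"
	"fmt"
)

// tx is a hypothetical stand-in for a database transaction.
type tx struct{ failCommit bool }

func (t *tx) Commit() error {
	if t.failCommit {
		return errors.New("commit failed")
	}
	return nil
}

// sharedDomains is a toy model, not erigon's SharedDomains.
type sharedDomains struct {
	pending map[string][]byte // in-memory batch of commitment branches
	cache   map[string][]byte // stands in for the BranchCache
}

func newSharedDomains() *sharedDomains {
	return &sharedDomains{pending: map[string][]byte{}, cache: map[string][]byte{}}
}

// Flush hands the batch to the tx and returns what was flushed.
// It never touches the cache, matching the "plain Flush" case.
func (sd *sharedDomains) Flush(t *tx) map[string][]byte {
	flushed := sd.pending
	sd.pending = map[string][]byte{}
	return flushed
}

// Commit flushes, commits, and only on success applies the flushed
// branches to the cache. A failed commit applies nothing.
func (sd *sharedDomains) Commit(t *tx) error {
	flushed := sd.Flush(t)
	if err := t.Commit(); err != nil {
		return err // cache untouched
	}
	for k, v := range flushed {
		sd.cache[k] = v
	}
	return nil
}

func main() {
	sd := newSharedDomains()
	sd.pending["branch"] = []byte("v1")
	err := sd.Commit(&tx{failCommit: true})
	fmt.Println(err != nil, len(sd.cache)) // true 0

	sd.pending["branch"] = []byte("v2")
	err = sd.Commit(&tx{})
	fmt.Println(err == nil, string(sd.cache["branch"])) // true v2
}
```

Because the cache write sits after the commit call, there is no code path on which a rolled-back value reaches the cache.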
> **StateCache no-poisoning is erigontech#21386, not this PR.** Unlike the
BranchCache, the account/storage StateCache is an *in-flight,
cross-transaction* cache — it holds prior txs' not-yet-committed writes
within a batch, and later txs read them from it. So it can't be made
commit-safe by simply invalidating on write (that breaks cross-tx reads
in serial exec). Its no-poisoning is the txNum/epoch rework in erigontech#21386;
this PR keeps its existing ValidateAndPrepare/unwind invalidation.
### BUG erigontech#21138 — parallel-exec from-0 wrong trie root
`ResetExec` wipes the commitment DB table; the aggregator's in-memory
`BranchCache` could still reference the just-deleted trie nodes, so a
from-0 re-exec served stale entries when computing block 0's commitment
→ wrong root, dropping genesis-allocated balances no later block touched
(mainnet block 46147, `0xA1E4380A3B1f749673E270229993eE55F35663b4`).
Fix: `ResetExec` clears the aggregator's `BranchCache`.
`TestFromZero_GenesisAllocPreservedAfterResetReExec` passes on current
`main`; the test's value here is keeping *this PR's* cache safe across
reset, not fixing a live `main` bug.
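
The failure mode and the fix can be reproduced in miniature. The sketch below is a toy model with hypothetical names (`aggregator`, `resetExec`, and a `clearCache` flag that exists only to contrast the two behaviours); it is not the PR's code.

```go
package main

import "fmt"

// aggregator is a toy model: a commitment table plus a cache in front of it.
type aggregator struct {
	table map[string]string // stands in for the commitment DB table
	cache map[string]string // stands in for the BranchCache
}

func newAggregator() *aggregator {
	return &aggregator{table: map[string]string{}, cache: map[string]string{}}
}

// put writes to both the table and the cache (a simplification).
func (a *aggregator) put(key, val string) {
	a.table[key] = val
	a.cache[key] = val
}

// get serves from the cache first, then falls back to the table.
func (a *aggregator) get(key string) (string, bool) {
	if v, ok := a.cache[key]; ok {
		return v, true
	}
	v, ok := a.table[key]
	return v, ok
}

// resetExec wipes the table. clearCache=true models the fix;
// clearCache=false models the stale-cache bug.
func (a *aggregator) resetExec(clearCache bool) {
	a.table = map[string]string{}
	if clearCache {
		a.cache = map[string]string{}
	}
}

func main() {
	buggy := newAggregator()
	buggy.put("node", "old")
	buggy.resetExec(false)
	_, stale := buggy.get("node")

	fixed := newAggregator()
	fixed.put("node", "old")
	fixed.resetExec(true)
	_, leftover := fixed.get("node")

	fmt.Println(stale, leftover) // true false
}
```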
## Follow-ups (the rest of the stack)
- **erigontech#21386 (PR #2 of the stack) — StateCache LRU + Mode rework:**
consistency + no performance drop-off at a 1 GB cache for long-running
nodes; re-adds the warm StateCache repopulation deferred above, under
its `txNum`/`epoch` model.
- **Pinning** (the stack's third step) — **no PR yet**; in progress and
under test on branch
[`mh/branch-cache-trunk-pin`](https://github.com/erigontech/erigon/tree/mh/branch-cache-trunk-pin),
to be **re-benchmarked before merge**.
- **erigontech#21739** — interface-unification follow-up: collapse the duck-typed
`GetLatest` variants into a single metered, `txNum`-returning
`GetLatest`.
## Testing
Behaves identically across parallel and serial exec — confirmed in CI
across both exec modes. Unit coverage for the BranchCache (tiers,
tx-precise `UnwindTo`, commit-gated population) plus the
engine/exec-module FCU commit paths.
Given today's changes (the commit-gating of both caches), we will do
another A/B performance run before merging.
---------
Co-authored-by: Mark Holt <erigon@dev-bm-e3-ethmainnet-n4.erigon.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>