
commitment: add persistent branch cache across Process() calls #19954

Closed
mh0lt wants to merge 36 commits into main from commitment/persistent-branch-cache

Conversation

@mh0lt
Contributor

@mh0lt mh0lt commented Mar 17, 2026

Summary

  • Adds a persistent LRU branch cache (65k entries, ~13MB) to HexPatriciaHashed that survives across Process() calls
  • Three-level read path in branchFromCacheOrDB: warmup cache → persistent cache → DB read (populate on miss)
  • Invalidation on all three fold() write paths (delete, propagate, branch) for both deferred and non-deferred modes
  • pbh/pbm (persistent branch hit/miss) counters in commitment metrics and exec3 committed log line
  • Cache propagated to ConcurrentPatriciaHashed subtries via SetBranchCache
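
The three-level read path above can be sketched roughly as follows. This is a hedged illustration: the map-based lookups and the readDB callback are stand-ins for the real warmup cache, BranchCache, and trie context, not the PR's actual signatures.

```go
package main

import "fmt"

// branchFromCacheOrDB resolves a branch prefix through three levels:
// warmup cache → persistent cache → DB, populating the persistent cache
// on a DB read so a later Process() call can hit level 2.
func branchFromCacheOrDB(prefix string, warmup, persistent map[string][]byte,
	readDB func(string) []byte) []byte {
	if v, ok := warmup[prefix]; ok { // level 1: ephemeral warmup cache
		return v
	}
	if v, ok := persistent[prefix]; ok { // level 2: persistent LRU cache
		return v
	}
	v := readDB(prefix) // level 3: DB read
	persistent[prefix] = v // populate on miss
	return v
}

func main() {
	dbReads := 0
	readDB := func(string) []byte { dbReads++; return []byte("branch") }
	persistent := map[string][]byte{}
	branchFromCacheOrDB("0a", nil, persistent, readDB)
	branchFromCacheOrDB("0a", nil, persistent, readDB) // second call: level-2 hit
	fmt.Println(dbReads) // 1
}
```

The second lookup never reaches the DB, which is the post-flush cold-start saving the PR is after.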

Motivation

The warmup cache is ephemeral — created per Process() call and discarded after. After a flush cycle, the first block sees ~62% warmup cache hit rate as it starts cold. The persistent cache retains root-zone branch data across batches, eliminating post-flush cold starts.

Test plan

  • go test ./execution/commitment/... -short passes
  • make lint clean
  • Binary builds and runs on mainnet parallel node
  • pbh/pbm metrics visible in committed log lines
  • CI

🤖 Generated with Claude Code

@awskii
Member

awskii commented Mar 17, 2026

This cache could be implemented behind PatriciaContext: the concurrent trie requires multiple contexts (one per subtrie), and behind that interface the cache can be integrated seamlessly.

@awskii
Member

awskii commented Mar 17, 2026

When is this cache invalidated? What happens on unwind? What if it becomes incoherent with what we have in state?

@yperbasis yperbasis added this to the 3.5.0 milestone Mar 17, 2026
@mh0lt mh0lt force-pushed the commitment/persistent-branch-cache branch from f86ba07 to f88f80f on March 17, 2026 at 18:23
Member

@yperbasis yperbasis left a comment


Issues

  1. BranchCache.Get returns a reference, not a copy

Put defensively copies data, but Get returns the cached slice directly. If any consumer writes to the returned []byte, it corrupts the cache entry. The warmup cache's GetBranch may have the same pattern, but
the persistent cache has a much longer lifetime, making this riskier. Consider either documenting the "read-only" contract or returning a copy in Get.
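
A minimal sketch of the copy-on-Get option, using a stand-in BranchCache type (the real one is an LRU with pinned tiers; only the copy semantics are illustrated here):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

type BranchCache struct {
	mu sync.RWMutex
	m  map[string][]byte
}

func NewBranchCache() *BranchCache { return &BranchCache{m: map[string][]byte{}} }

// Put defensively copies data on the way in.
func (c *BranchCache) Put(prefix, data []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[string(prefix)] = bytes.Clone(data)
}

// Get returns a copy on the way out, so a caller writing to the returned
// slice cannot corrupt the long-lived cache entry.
func (c *BranchCache) Get(prefix []byte) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.m[string(prefix)]
	if !ok {
		return nil, false
	}
	return bytes.Clone(v), true
}

func main() {
	c := NewBranchCache()
	c.Put([]byte("k"), []byte("value"))
	v, _ := c.Get([]byte("k"))
	v[0] = 'X' // mutating the returned slice...
	v2, _ := c.Get([]byte("k"))
	fmt.Println(string(v2)) // ...leaves the cached entry intact: "value"
}
```

The alternative, documenting a read-only contract, avoids the allocation but relies on every current and future caller honouring it.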

  2. Duplicate accessor on HexPatriciaHashed

func (hph *HexPatriciaHashed) BranchCache() *BranchCache { return hph.branchCache }
func (hph *HexPatriciaHashed) GetBranchCache() *BranchCache { return hph.branchCache }

GetBranchCache() satisfies the Trie interface. BranchCache() is an extra unexported-style accessor doing the same thing. Remove one.

  3. Prefetcher ignores parent context lifecycle

NewBranchPrefetcher creates its context from context.Background(). The ctx passed to StartBranchPrefetcher is used for the trie context factory (DB connections) but not the prefetcher's goroutine lifecycle. If
the parent context is cancelled, the prefetcher keeps running until Stop() is called explicitly. This works because StopBranchPrefetcher() is called in Close() and before ComputeCommitment, but it's fragile —
any new code path that skips Stop() leaks goroutines.

Deriving from the caller's context (context.WithCancel(ctx) instead of context.Background()) would make it fail-safe.

  4. return data, err where err is guaranteed nil

In hex_patricia_hashed.go branchFromCacheOrDB, after the early if err != nil { return nil, err }, the final return data, err always has err == nil. Should be return data, nil for clarity.

  5. PriorityQueue fix is correct but could leak blocked Add() callers

The fix correctly eliminates the close/send race. After Drain() nils q.resultCh, any Add() caller that captured the old channel reference and is blocked on resultCh <- item will remain blocked until ctx.Done()
fires. This is fine if callers always have a bounded context, but worth a comment noting that Add() relies on context cancellation for cleanup after the queue is drained.

  6. No test for BranchPrefetcher

BranchCache has thorough tests. The prefetcher (background workers, Submit/Stop lifecycle, SubmitPlainKey hashing) has none. It's the most concurrency-sensitive new code.

  7. Dead flag ExperimentalConcurrentCommitment

Still referenced in 6 files (backtester.go, flags.go, default_flags.go, statecfg.go, etc.). The PR description says "cleaned up in a follow-up" — fine, but it should actually happen.

@awskii
Member

awskii commented Mar 27, 2026

Better to keep the cache behind PatriciaContext rather than bloating commitment.

mh0lt pushed a commit that referenced this pull request Mar 29, 2026
Fix all 7 review issues from yperbasis:

1. BranchCache.Get now returns a defensive copy of cached data, preventing
   callers from corrupting long-lived cache entries.

2. Remove duplicate BranchCache() accessor — only GetBranchCache() remains
   (satisfies the Trie interface).

3. BranchPrefetcher derives its goroutine lifecycle from the caller's context
   (context.WithCancel(parentCtx)) instead of context.Background(), making
   it fail-safe against goroutine leaks.

4. branchFromCacheOrDB returns `data, nil` instead of `data, err` where err
   is guaranteed nil after the early return.

5. PriorityQueue.Add() documents that callers blocked on a stale resultCh
   reference after Drain() rely on context cancellation for cleanup.

6. Add BranchPrefetcher lifecycle tests: basic prefetch flow, context
   cancellation, and SubmitPlainKey.

7. Remove dead ExperimentalConcurrentCommitment flag from all 6 files
   (statecfg, flags.go, default_flags.go, integration flags, backtester).

Also: implement GetBranchCache/SetBranchCache on PatriciaContext implementors
(bufferedTrieContext, RecordingContext, MockState, noopPatriciaContext) to
satisfy the extended interface after rebase onto main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mh0lt mh0lt force-pushed the commitment/persistent-branch-cache branch from 50cb728 to 9fab432 on March 29, 2026 at 09:19
@mh0lt mh0lt requested a review from Giulio2002 as a code owner March 29, 2026 09:19
@mh0lt mh0lt force-pushed the commitment/persistent-branch-cache branch from 9fab432 to 8315a9d on March 29, 2026 at 09:27
@mh0lt
Contributor Author

mh0lt commented Mar 29, 2026

Review feedback addressed

All 7 items from @yperbasis have been fixed in commit bd59f01:

  1. BranchCache.Get returns defensive copy
  2. Duplicate BranchCache() accessor removed — only GetBranchCache() remains
  3. BranchPrefetcher derives from parent context (context.WithCancel(parentCtx))
  4. return data, nil clarity fix
  5. PriorityQueue.Add cleanup documented
  6. BranchPrefetcher tests added (lifecycle, context cancellation, SubmitPlainKey)
  7. ExperimentalConcurrentCommitment dead flag removed from all 6 files

Rebased onto current main, lint clean, tests pass.

@awskii architectural feedback

The request to move BranchCache behind PatriciaContext is tracked as a follow-up: #20218

The key correctness concern (cache invalidation on Reset/unwind) is already addressed — cache is cleared on Reset() in commit 95c0b82.

🤖 Generated with Claude Code

@mh0lt mh0lt force-pushed the commitment/persistent-branch-cache branch 2 times, most recently from 1579363 to 85be92f on March 29, 2026 at 11:03
Member

@yperbasis yperbasis left a comment


Issues

  1. TOCTOU race in Submit/SubmitPlainKey vs Stop — send on closed channel

branch_prefetcher.go:
func (p *BranchPrefetcher) Submit(hashedKey []byte) {
	if !p.started.Load() || p.closed.Load() { // (A) check
		return
	}
	// ... window ...
	select {
	case p.work <- key: // (C) panics if channel closed between A and C
	default:
	}
}

func (p *BranchPrefetcher) Stop() {
	if p.closed.Swap(true) { // (B) set closed
		return
	}
	close(p.work) // closes channel
}

If another goroutine calls Stop() between (A) and (C), p.work is closed and the send panics. The select/default only protects against blocking, not against sending on a closed channel.

Practical risk is low because StopBranchPrefetcher nils out the onHashedKey callback before calling Stop(), and in the current execution model TouchPlainKey and StopBranchPrefetcher don't run concurrently. But
this is fragile — a simple defer recover() or draining through context cancellation (don't close the channel, let workers exit via ctx.Done()) would be more robust.

  2. bufferedTrieContext.PutBranch doesn't copy data/prevData

func (b *bufferedTrieContext) PutBranch(prefix, data, prevData []byte) error {
	key := string(prefix)
	b.writes[key] = data // stores reference, no copy
	...
	b.prev[key] = prevData // stores reference, no copy
	return nil
}

If BranchEncoder.CollectUpdate passes a slice backed by a reused encoder buffer, subsequent encodes during the same goroutine could corrupt previously buffered entries. The non-buffered TrieContext.PutBranch
writes through to the MemBatch (which copies), so this wasn't an issue before.

Verify that EncodeBranch always returns a freshly allocated slice. If it does, this is fine. If there's any buffer reuse, add slices.Clone(data) / slices.Clone(prevData).
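
If buffer reuse turns out to be possible, the defensive variant is a two-line change. A sketch with stand-in types (only the clone semantics matter here):

```go
package main

import (
	"fmt"
	"slices"
)

type bufferedTrieContext struct {
	writes map[string][]byte
	prev   map[string][]byte
}

// PutBranch clones both slices, so a reused encoder buffer cannot corrupt
// previously buffered entries.
func (b *bufferedTrieContext) PutBranch(prefix, data, prevData []byte) error {
	key := string(prefix)
	b.writes[key] = slices.Clone(data)   // copy: safe even if the encoder reuses its buffer
	b.prev[key] = slices.Clone(prevData) // copy prevData too
	return nil
}

func main() {
	b := &bufferedTrieContext{writes: map[string][]byte{}, prev: map[string][]byte{}}
	buf := []byte("v1")
	b.PutBranch([]byte("p"), buf, nil)
	copy(buf, "XX") // encoder reuses its buffer for the next branch
	fmt.Println(string(b.writes["p"])) // buffered entry is unaffected: "v1"
}
```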

  3. onHashedKey read/write is not synchronized

// Writer (any goroutine):
func (t *Updates) SetOnHashedKey(fn func([]byte)) { t.onHashedKey = fn }

// Reader (execution goroutine):
if t.onHashedKey != nil { t.onHashedKey(hashedKey) }

This is a data race under the Go race detector if SetOnHashedKey(nil) and TouchPlainKey ever execute on different goroutines. Currently they don't overlap in practice, but an atomic.Pointer or a mutex would
make it race-free.
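
An atomic.Pointer version could look like this; Updates is a minimal stand-in here, and TouchHashedKey approximates the read site:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Updates struct {
	onHashedKey atomic.Pointer[func([]byte)]
}

// SetOnHashedKey is safe from any goroutine, including SetOnHashedKey(nil).
func (t *Updates) SetOnHashedKey(fn func([]byte)) {
	if fn == nil {
		t.onHashedKey.Store(nil)
		return
	}
	t.onHashedKey.Store(&fn)
}

// TouchHashedKey loads the callback atomically, so it never races with
// a concurrent SetOnHashedKey under the Go race detector.
func (t *Updates) TouchHashedKey(hashedKey []byte) {
	if fp := t.onHashedKey.Load(); fp != nil {
		(*fp)(hashedKey)
	}
}

func main() {
	var u Updates
	var got []byte
	u.SetOnHashedKey(func(k []byte) { got = k })
	u.TouchHashedKey([]byte{0x01})
	u.SetOnHashedKey(nil) // later touches are no-ops
	u.TouchHashedKey([]byte{0x02})
	fmt.Printf("%x\n", got) // 01
}
```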

  4. EnableWarmupCache(!isApplyingBlocks) — undocumented behavioral change

exec3.go changes from always-on warmup cache to disabled during initial sync:
// Before:
doms.EnableWarmupCache(true)
// After:
doms.EnableWarmupCache(!isApplyingBlocks)

This is a separate optimization from the persistent cache addition. During isApplyingBlocks (initial sync), the ephemeral warmup cache is now disabled, relying entirely on the persistent cache + prefetcher.
This should be called out explicitly in the PR description since it changes sync-time behavior independent of the cache feature.

  5. bufferedTrieContext.flush iterates map (non-deterministic order)

func (b *bufferedTrieContext) flush(target PatriciaContext) error {
	for key, data := range b.writes {
		target.PutBranch([]byte(key), data, b.prev[key])
	}
	...
}

If PutBranch ordering matters for state history (e.g., parent branches before children for correct prevData chain), non-deterministic map iteration could produce inconsistent results across runs. Within a
single subtrie's writes this is likely fine since each key is independent, but worth a comment explaining why ordering doesn't matter here.
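
If ordering ever did need to be deterministic, sorting the keys before flushing is a small change. A sketch under assumed signatures (the real flush takes a PatriciaContext; a callback stands in for it here):

```go
package main

import (
	"fmt"
	"sort"
)

// flushSorted iterates writes in lexicographic key order, making the flush
// deterministic across runs; for nibble-prefix keys, a parent prefix sorts
// before its extensions.
func flushSorted(writes, prev map[string][]byte,
	put func(prefix, data, prevData []byte) error) error {
	keys := make([]string, 0, len(writes))
	for k := range writes {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if err := put([]byte(k), writes[k], prev[k]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var order []string
	writes := map[string][]byte{"0a01": nil, "0a": nil, "0b": nil}
	_ = flushSorted(writes, map[string][]byte{}, func(p, _, _ []byte) error {
		order = append(order, string(p))
		return nil
	})
	fmt.Println(order) // [0a 0a01 0b]
}
```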

Minor

  • Lost observability: missAccount, missStorage, missBranch warmup cache metrics are removed and replaced only with pCacheBranchHit/pCacheBranchMiss. Warmup cache effectiveness for accounts/storage is no longer
    tracked.
  • Mock Variant() in test: mockPatriciaContext implements Variant() but PatriciaContext doesn't require it — dead code in the test.
  • BranchCache struct layout: The mu sync.RWMutex sits between the pinned arrays and branches *maphash.LRU. On hot paths, mu and branches could share a cache line with t4, causing false sharing between
    pinned-tier and LRU-tier operations. Consider placing mu and branches before the arrays or adding padding. Likely negligible in practice given the RWMutex acquisition cost dominates.

Verdict

The core design — three-level read path, invalidation on all fold write paths, cache cleared on Reset — is correct. The bufferedTrieContext properly isolates per-goroutine writes during ParallelHashSort. The
issues above are hardening/robustness concerns (closed-channel panic, missing copies, data race under detector) rather than fundamental correctness bugs in the current execution model. Items 1–3 should be
addressed before merge; items 4–5 are judgment calls.

mh0lt added a commit that referenced this pull request Mar 30, 2026
Fixes for yperbasis review on #19954:

- branch_prefetcher: fix TOCTOU race in Submit vs Stop by using
  defer recover() and cancelling context before closing channel
- hex_concurrent_patricia_hashed: copy data/prevData in
  bufferedTrieContext.PutBranch with slices.Clone; add comment
  explaining why flush() map iteration order doesn't matter
- commitment: use atomic.Pointer for onHashedKey callback to
  eliminate data race under Go race detector
- exec3: add comment explaining EnableWarmupCache(!isApplyingBlocks)
  behavioral change
- branch_prefetcher_test: remove dead Variant() method from mock
- branch_cache: add comment about struct layout / false sharing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mh0lt added a commit that referenced this pull request Mar 30, 2026
Fixes for yperbasis review on #19954:

- branch_prefetcher: fix TOCTOU race in Submit vs Stop by using
  defer recover() and cancelling context before closing channel
- hex_concurrent_patricia_hashed: copy data/prevData in
  bufferedTrieContext.PutBranch with slices.Clone; add comment
  explaining why flush() map iteration order doesn't matter
- commitment: use atomic.Pointer for onHashedKey callback to
  eliminate data race under Go race detector
- exec3: add comment explaining EnableWarmupCache(!isApplyingBlocks)
  behavioral change
- branch_prefetcher_test: remove dead Variant() method from mock
- branch_cache: add comment about struct layout / false sharing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mh0lt mh0lt force-pushed the commitment/persistent-branch-cache branch from a3e9bee to 589816e on March 30, 2026 at 20:52
@mh0lt
Contributor Author

mh0lt commented Mar 31, 2026

CI Status After Rebase (2026-03-30)

Rebased onto latest main (ef63408ad1), 11 commits, no conflicts. CI still failing — these are real code bugs, not stale-base issues (main is fully green).

Nil Pointer Dereferences in Tests

TestCheckStateVerify — db/integrity
TestAggregator_SqueezeCommitment — db/state (+ merge index out of range [5] with length 5)
TestSharedDomain_Unwind — db/state/execctx
TestAggregatorV3_BuildFiles_WithReorgDepth — db/state

The BranchCache changes likely introduce fields that aren't initialised in test setup paths. The merge index out of range [5] with length 5 suggests the cache interferes with merge/squeeze file operations.

Also Failing

  • All hive engine tests (api, cancun, exchange-capabilities, withdrawals) + rpc/compat
  • Race tests: core-rpc, execution-other, execution-eest-blockchain/devnet, execution-tests
  • Unit tests: ubuntu, macos, windows
  • Kurtosis pectra, benchmarks

Context

We need this PR for the parallel executor — commitment is currently ~50% of batch wall-clock time during initial sync (pprof shows 39% CPU in the ComputeCommitment → followAndUpdate → unfold → domain-read path). Execution alone runs at 467–499 mgas/s, but overall throughput is ~210 mgas/s due to commitment I/O.

🤖 Generated with Claude Code

Member

@yperbasis yperbasis left a comment


CI is red

The warmup cache is ephemeral (created per Process() call), so after a
flush cycle the first block sees ~62% cache hit rate as the cache starts
cold. This adds a persistent LRU branch cache (65k entries, ~13MB) that
survives across Process() calls:

- Level 1: ephemeral warmup cache (populated by warmup workers)
- Level 2: persistent branch cache (populated on DB miss, invalidated
  on branch writes in fold())
- Level 3: DB read

The persistent cache eliminates post-flush cold starts and improves
initial sync by retaining root-zone branch data across batches.

Invalidation happens in all three fold() write paths (delete, propagate,
branch) for both deferred and non-deferred update modes.

Metrics: pbh/pbm (persistent branch hit/miss) counters added to both
the commitment metrics and the exec3 committed log line.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mark Holt and others added 19 commits April 4, 2026 17:02
…a to BranchCache

Background warmup (Warmuper) and prefetch (BranchPrefetcher) goroutines
could race with the main trie's fold operations on the shared BranchCache:

1. Warmup reads branch prefix X from sd.mem, gets value V
2. Main trie fold updates prefix X to V', invalidates BranchCache[X]
3. Warmup writes V (now stale) to BranchCache[X]
4. Main trie reads stale V from BranchCache → wrong trie root

Fix: warmup/prefetch goroutines no longer write to the persistent
BranchCache. They still read from it (to skip redundant DB reads) and
still populate the ephemeral warmup cache. Only the main trie's
single-goroutine path populates the persistent BranchCache, which
eliminates the TOCTOU race entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert the previous fix that removed BranchCache.Put() from warmup/prefetch
goroutines. That fix defeated the purpose of warmup entirely — the whole point
is to populate the persistent cache so the main trie finds data already cached.

The TOCTOU race described in the reverted commit does not actually exist because
the trieContextFactory already creates TrieContexts that read through SharedDomains
(via sd.AsGetter(roTx) → sd.GetLatest → sd.mem.GetLatest first, roTx fallback).
Since sd.mem always has the latest committed state, the data warmup goroutines
read is already fresh — writing it to BranchCache is safe.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a dirty set to BranchCache that tracks keys invalidated by the main
trie during fold(). Background warmup and prefetch goroutines use the new
PutIfClean method which skips writes for dirty keys, preventing them from
overwriting authoritative trie data with stale reads.

The race window: warmup reads branch X from sd.mem (value V), main trie
fold updates X to V' and calls Invalidate(X), warmup writes stale V to
cache. PutIfClean checks the dirty flag and skips the write.

Put (used by main trie and on-demand DB reads) always writes and clears
the dirty flag. Clear resets all dirty flags.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
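
The dirty-set scheme in this commit can be sketched as follows; this stand-in BranchCache uses a plain map where the real one is an LRU, and only the dirty-flag interplay is shown:

```go
package main

import (
	"fmt"
	"sync"
)

type BranchCache struct {
	mu    sync.Mutex
	data  map[string][]byte
	dirty map[string]bool
}

func NewBranchCache() *BranchCache {
	return &BranchCache{data: map[string][]byte{}, dirty: map[string]bool{}}
}

// Invalidate is called by the main trie's fold(): drop the entry, mark dirty.
func (c *BranchCache) Invalidate(prefix []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.data, string(prefix))
	c.dirty[string(prefix)] = true
}

// PutIfClean is used by warmup/prefetch goroutines: a dirty key means the
// value they read may predate the fold, so the write is refused.
func (c *BranchCache) PutIfClean(prefix, data []byte) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.dirty[string(prefix)] {
		return false
	}
	c.data[string(prefix)] = append([]byte(nil), data...)
	return true
}

// Put is used by the main trie: always writes and clears the dirty flag.
func (c *BranchCache) Put(prefix, data []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[string(prefix)] = append([]byte(nil), data...)
	delete(c.dirty, string(prefix))
}

func main() {
	c := NewBranchCache()
	c.Invalidate([]byte("x")) // main trie updated prefix "x"
	fmt.Println(c.PutIfClean([]byte("x"), []byte("stale"))) // warmup's late write refused: false
	c.Put([]byte("x"), []byte("fresh")) // authoritative write clears the flag
	fmt.Println(c.PutIfClean([]byte("x"), []byte("warm"))) // true
}
```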
When restoring the patricia trie state (e.g. after unwind/reorg via
SeekCommitment), the persistent BranchCache must be cleared. The
cached branch nodes come from the old fork and are stale — reusing
them produces wrong trie roots that differ by only a few bytes,
causing "Wrong trie root" errors in hive reorg tests.

Root cause: restorePatriciaState calls SetState (which restores the
internal trie node state) but did not clear the BranchCache. The
justRestored flag then prevented the subsequent Reset() from clearing
it either. After the fix, the cache is explicitly cleared right after
SetState, before justRestored is set.

Fixes hive engine/api "Invalid Missing Ancestor Syncing ReOrg" tests
(33 failures) and engine/withdrawals sync/reorg tests (8 failures).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous fix cleared the BranchCache in restorePatriciaState
(called by SeekCommitment). However, the wrong trie root error
occurs DURING unwindExec3 — before SeekCommitment is called.

The unwind applies changesets backwards to reconstruct the state at
the unwind point. If the BranchCache contains branch nodes from
blocks above the unwind point, the trie reads stale cached data
instead of the DB values that the changesets reference, producing
a root that differs by a few bytes from the expected value.

Fix: clear the BranchCache at the start of UnwindExecutionStage,
before unwindExec3 runs. Also add a BranchCache() accessor to
SharedDomainsCommitmentContext.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause of persistent hive reorg failures: restorePatriciaState
calls SetState only on the root HexPatriciaHashed, but the 16
mounted subtries in ConcurrentPatriciaHashed retain stale internal
state (cells, depths) from the pre-unwind execution. If
CanDoConcurrentNext() had set the concurrent flag before the unwind,
the first post-restore Process() call would use ParallelHashSort
with these stale mounts, producing a wrong trie root.

Fix:
- Add ResetMounts() to ConcurrentPatriciaHashed
- Call ResetMounts() in restorePatriciaState after SetState
- Reset concurrent commitment flag to false, forcing the first
  post-restore run through the serial path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ParallelHashSort needs per-goroutine DB contexts (CtxFactory) to
work correctly. Without it (e.g. during fork validation where
paraTrieDB is nil), fall back to serial processing through the
root trie.

Previously, the concurrent flag could be set from a prior Process()
call and persist across unwind/restore, causing ParallelHashSort to
run without the required per-goroutine contexts. Also gate
CanDoConcurrentNext() on CtxFactory availability to prevent setting
the flag when it can't be honoured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Disable the persistent BranchCache to determine whether the wrong
trie root in hive reorg tests is caused by stale cache data or by
a fundamental issue in ConcurrentPatriciaHashed.

If hive passes without the cache, the bug is cache-related.
If it still fails, the bug is in the concurrent trie itself.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change branchFromCacheOrDB to use PutIfClean instead of Put when
populating the persistent BranchCache after a DB miss. Put clears
the dirty flag unconditionally, which can cause stale data to be
served from the cache if:

1. fold() Invalidates a key (marks dirty, removes from cache)
2. A subsequent branchFromCacheOrDB reads from DB (gets old data
   because PutBranch wrote to mem batch but the key's DB value
   hasn't been updated in the underlying snapshot)
3. Put stores old data AND clears the dirty flag
4. Next read finds the stale data in cache with dirty=false

PutIfClean respects the dirty flag — if fold() has Invalidated the
key, the DB-read data won't be cached, and the next read will go
through the DB/mem batch path to get the correct value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The BranchCache was populated on every DB miss in branchFromCacheOrDB
during trie Process/fold. This caused stale branch data to persist
in the cache across unfold→fold transitions within a single Process
call, producing wrong trie roots after unwind/reorg.

Fix: make the cache read-only during inline trie processing. Cache
hits still return data (populated by warmup workers/prefetchers via
PutIfClean), but DB-read misses no longer store into the cache. This
ensures every inline read during fold sees the authoritative data
from the DB/mem batch.

The cache is still populated by:
- Warmup workers (PutIfClean, respects dirty flags)
- Branch prefetchers (PutIfClean, respects dirty flags)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment out ALL BranchCache reads, writes, and Invalidate calls in
HexPatriciaHashed. The cache object is still created and set (non-nil)
but never accessed during Process/fold.

If this passes: the bug is in BranchCache Get/Put/Invalidate interaction.
If this fails: having a non-nil branchCache field changes HPH behavior
through some other mechanism (e.g. Reset() calling Clear()).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The persistent BranchCache caused wrong trie roots after unwind/reorg
in hive tests. Root cause: the inline branchFromCacheOrDB in
HexPatriciaHashed populated the cache on DB reads (Put) and
invalidated on branch writes (Invalidate). This created a window
where stale data was served from the cache during the fold/unfold
cycle within a single Process call.

Fix: the inline trie Process path no longer reads from or writes to
the persistent BranchCache. It goes directly from the ephemeral
warmup cache (level 1) to DB reads (level 3). The persistent cache
is still used by:
- Warmup workers (warmuper.go) via their own branchFromCacheOrDB
- Branch prefetchers via PutIfClean
- Invalidate calls from fold() still mark keys dirty to prevent
  warmup workers from caching stale data

This preserves the cache's value for warmup pre-loading while
eliminating the stale-data bug during trie computation.

Verified locally: hive engine/api suite passes 129/129 (0 failures).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: mounted subtries in ParallelHashSort run with per-goroutine
roTx DB contexts that cannot see the root trie's uncommitted rwTx
writes. When mounts read from the shared BranchCache (populated by
the root from its rwTx), they get data inconsistent with their own
DB view, producing wrong trie roots after unwind/reorg.

Fix: gate both cache reads (Get) and writes (Put) on !hph.mounted.
The persistent BranchCache is now exclusively used by the root trie.
Mounted subtries always read from their own DB context, which is
always authoritative for their goroutine.

The warmup workers and branch prefetchers (warmuper.go) have their
own branchFromCacheOrDB that reads through SharedDomains and uses
PutIfClean — those paths are unaffected.

Verified locally: hive engine/api passes 129/129 (0 failures).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: flushPendingUpdate writes deferred branch data to sd.mem
but did not update the persistent BranchCache. When the next
Process() call ran, branchFromCacheOrDB returned stale cached values
from the previous block instead of the just-flushed data in sd.mem.
This produced wrong trie roots during fork validation (which uses
deferred commitment updates).

Fix: pass the BranchCache to flushPendingUpdate and call Put() for
each branch written. This keeps the cache in sync with sd.mem so
subsequent reads see fresh data. No cache Clear needed — the cache
is updated entry-by-entry.

Also reverts the !mounted guard on cache reads/writes — the cache
is now safe for all tries since the underlying data inconsistency
is fixed at the source.

Verified locally: hive engine/api passes 129/129 (0 failures) with
full cache reads + writes enabled for root and mounted subtries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SetConcurrentCommitment(false) calls initCollector() which
reinitializes ETL collectors, potentially disrupting callers that
create SharedDomains for RPC handlers (eth_simulateV1, receipts
generator, etc.). These callers call SeekCommitment repeatedly
between computations.

The concurrent flag is already forced to false after Process() when
paraTrieDB is nil (line 482). Resetting it in restorePatriciaState
is redundant and causes side effects via initCollector().

Fixes 18 QA RPC integration test failures (eth_simulateV1,
eth_getBlockReceipts, erigon_getLogsByHash, ots_getBlockDetails)
that started failing after commit 6bf1202.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SetConcurrentCommitment calls initCollector() which reinitializes
ETL collectors. When called without a CtxFactory (i.e. paraTrieDB
is nil), the flag flip disrupts RPC handlers that create their own
SharedDomains without parallel trie support.

Gate the CanDoConcurrentNext → SetConcurrentCommitment call on
warmup.CtxFactory being non-nil, matching the existing guard in
commitmentdb that overrides back to serial after Process when
paraTrieDB is nil.

Engine api: 129/129 pass (verified locally).
RPC-compat: pre-existing failure at block 5 — separate investigation
needed for the ConcurrentPatriciaHashed serial import path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ParallelHashSort produces wrong trie roots for certain chain patterns
(confirmed with hive rpc-compat: block 5 gives 11b616... instead of
4e62f7...). The issue is in the parallel fold algorithm itself — it
persists with and without bufferedTrieContext, with and without
BranchCache. Forced serial mode through p.root.Process() produces
correct roots.

Fix: CanDoConcurrentNext now always returns false. This keeps the
ConcurrentPatriciaHashed wrapper (with BranchCache, mount infra,
etc.) but routes all Process calls through the serial HPH path.

The bufferedTrieContext is retained because the history domain
writers (putHistory → domainWriters) are NOT thread-safe — removing
the buffering caused 14 engine API failures from concurrent write
corruption. The buffering will be needed when ParallelHashSort is
re-enabled.

Verified locally:
- hive rpc-compat: 208 tests, 4 failures (matches main)
- hive engine/api: 129/129 pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Test_ParallelHashSort_MultiBlock that verifies serial vs parallel
root hash equivalence over 5 consecutive blocks with 150 accounts.
Test PASSES with mock state, confirming the ParallelHashSort algorithm
itself is correct.

The hive rpc-compat failure occurs only with the real SharedDomains
+ MemBatch path, not with mock state. The bug is in the interaction
between bufferedTrieContext flush and DomainPut/GetLatest — possibly
the skip-if-equal optimization in DomainPut(CommitmentDomain) or a
visibility issue in the history domain writers.

CanDoConcurrentNext retains the topology check code (for reference)
but always returns false until the DB interaction is fixed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Giulio2002
Collaborator

Giulio2002 commented Apr 6, 2026

I don't believe this gives you a significant improvement, as I have already tested this in the past. Also, if you want to do this, please use the stateCache commitment cache: it is much cleaner, and it is only a 4-line diff which is currently disabled. The only reason stateCache is enabled is not performance but the many page faults whenever we merge files.

Hits/misses is the wrong metric; look at performance gains. You can test this using Milen's tool.

mh0lt and others added 3 commits April 7, 2026 06:29
TestParallelHashSort_RealDB creates two real SharedDomains (serial
and parallel) with 500 random accounts + storage updates over 10
blocks, verifying that serial and parallel Process produce identical
root hashes. Currently PASSES — the bug is specific to the hive
exec3 pipeline path, not reproducible with direct DomainPut +
ComputeCommitment.

Also adds SetConcurrentCommitment accessor to
SharedDomainsCommitmentContext for test control.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TestConcurrentPatriciaHashed_ManyBlocks runs 20 blocks with 20
accounts and 3-5 txns/block through the full exec3 pipeline with
ConcurrentPatriciaHashed + ParallelHashSort. PASSES with both
serial and parallel modes, with and without race detector.

Re-enable CanDoConcurrentNext topology check — ParallelHashSort
produces correct roots through the real staged sync pipeline.
The hive rpc-compat failure is specific to the rpc-compat chain's
genesis/chain configuration and needs targeted reproduction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ntIfPossible

enableConcurrentCommitmentIfPossible called SetConcurrentCommitment
unconditionally, even when the result was false (no mode change).
SetConcurrentCommitment calls initCollector() which closes and
recreates ETL collectors. When called from SeekCommitment during
every ExecV3 block execution, this unnecessary reinitialization
could disrupt the collector state.

Fix: only call SetConcurrentCommitment(true) when actually enabling
concurrent mode. When CanDoConcurrentNext returns false, skip the
call entirely — the mode is already serial by default.

Also clean up debug logging and counter from previous investigation.
CanDoConcurrentNext remains disabled until ParallelHashSort's
interaction with real DomainPut/GetLatest is understood.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mh0lt
Contributor Author

mh0lt commented Apr 7, 2026

Closing this as the work is now likely redundant.

@mh0lt mh0lt closed this Apr 7, 2026
mh0lt pushed a commit that referenced this pull request May 5, 2026
Adds the carry-as-is correctness invariants from the PR #19954
investigation as scaffolding on the existing WarmupCache:

  - branchEntry.dirty atomic.Bool — signals "stale until cleared"
  - PutBranchIfClean(prefix, data) bool — skips write if entry dirty
  - MarkBranchDirty(prefix) — mark for later refusal of stale puts

These are scaffolding additions; no callsites use them yet. Existing
PutBranch unconditionally overwrites and clears any prior dirty flag
(creates a fresh entry), preserving today's semantic exactly for
existing callers.
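The three invariants above can be sketched as follows. This is a hypothetical, minimal shape (map + mutex) for illustration; the real `WarmupCache` in `execution/commitment` has a different internal layout, and only the dirty-flag semantics shown here are taken from the commit message.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// branchEntry pairs cached branch bytes with a dirty flag; a set flag
// means "stale until a writer replaces me" and refuses clean puts.
type branchEntry struct {
	data  []byte
	dirty atomic.Bool
}

// WarmupCache sketch (illustrative only).
type WarmupCache struct {
	mu      sync.RWMutex
	entries map[string]*branchEntry
}

func NewWarmupCache() *WarmupCache {
	return &WarmupCache{entries: map[string]*branchEntry{}}
}

// PutBranch unconditionally overwrites: it creates a fresh entry and
// therefore clears any prior dirty flag, preserving today's semantic.
func (c *WarmupCache) PutBranch(prefix, data []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[string(prefix)] = &branchEntry{data: append([]byte(nil), data...)}
}

// PutBranchIfClean skips the write and reports false if the entry is dirty.
func (c *WarmupCache) PutBranchIfClean(prefix, data []byte) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[string(prefix)]; ok && e.dirty.Load() {
		return false
	}
	c.entries[string(prefix)] = &branchEntry{data: append([]byte(nil), data...)}
	return true
}

// MarkBranchDirty marks an existing entry stale; an absent key is a
// no-op (no panic, no entry created).
func (c *WarmupCache) MarkBranchDirty(prefix []byte) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if e, ok := c.entries[string(prefix)]; ok {
		e.dirty.Store(true)
	}
}
```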

Why now: the prototype investigation (see
agentspecs/commitment-cache-prototype-dev-context.md) found that
inline-invalidate-on-write is incompatible with deferred encoding —
update-in-place breaks correctness because there's a window between
fold (computes hash, holds new state) and encoder (writes encoded
bytes) where readers see stale cached bytes. The reth-research
(agentspecs/reth-1ggas-research.md §4) calls the dirty-flag pattern
out as the design that resolves this without forcing synchronous
encoding: the encoder marks the entry dirty BEFORE its own write
completes, so any racing read knows to bypass the cache for that key.

Today's WarmupCache lifecycle (per-Process, warmup completes before
fold begins) does NOT exhibit this race — these invariants are
infrastructure for the future cross-block persistence work where
warmup-style writers can outlive their parent Process.

Tests:
- TestWarmupCache_DirtyFlag: PutBranchIfClean refuses dirty entry,
  unconditional PutBranch clears dirty.
- TestWarmupCache_DirtyFlag_MarkAbsentKey: marking absent key is
  no-op (no panic, no entry created).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mh0lt pushed a commit that referenced this pull request May 5, 2026
Step 7a of the representation-reduction sequence: per-Process
integration of the BranchCache type added in the previous commit.

Plumbing:

  - Trie interface gains SetBranchCache(*BranchCache).
    HexPatriciaHashed implementation propagates to its branchEncoder.
    ConcurrentPatriciaHashed implementation propagates the SAME
    instance to root + all 16 mounts (sharing one cache is correct
    under the concurrency contract — mounts partition prefix space
    by first nibble, so cross-mount writes target distinct keys).

  - InitializeTrieAndUpdates constructs a new BranchCache(default)
    per trie instance and attaches it. Lifetime today = trie
    lifetime = per-Process. Future cross-block persistence work
    (step 7b) lifts this to aggTx scope by constructing the cache
    one layer up and passing it in.
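The shared-instance propagation can be sketched like this. All type bodies here are illustrative stand-ins; only the names `SetBranchCache`, `HexPatriciaHashed`, and `ConcurrentPatriciaHashed` come from the commit message, and the real types carry far more state.

```go
package main

// BranchCache stand-in; the real type lives in execution/commitment.
type BranchCache struct{}

type HexPatriciaHashed struct {
	branchCache *BranchCache
}

func (hph *HexPatriciaHashed) SetBranchCache(c *BranchCache) {
	hph.branchCache = c // the real code also hands it to the branchEncoder
}

type ConcurrentPatriciaHashed struct {
	root   *HexPatriciaHashed
	mounts [16]*HexPatriciaHashed // one subtrie per first nibble
}

// SetBranchCache hands the SAME instance to the root and all 16 mounts.
// Sharing is safe under the concurrency contract: mounts partition the
// prefix space by first nibble, so cross-mount writes target distinct keys.
func (cph *ConcurrentPatriciaHashed) SetBranchCache(c *BranchCache) {
	cph.root.SetBranchCache(c)
	for _, m := range cph.mounts {
		m.SetBranchCache(c)
	}
}
```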

Read path (HPH.branchFromCacheOrDB):
  L1 WarmupCache (existing) → L2 BranchCache (new) → L3 ctx.Branch.
  L3 hits with non-empty result populate L2 so subsequent reads hit
  L2 within the cache's lifetime. L1 stays first because warmup
  workers may have pre-fetched with prefix-walk-derived freshness.
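The L1 → L2 → L3 fallthrough can be sketched as below. This is a hypothetical reduction using plain maps; the real `branchFromCacheOrDB` works against the actual cache types and `ctx.Branch`, with different signatures.

```go
package main

// branchReader stands in for the L3 DB read (ctx.Branch).
type branchReader func(prefix []byte) ([]byte, error)

// branchFromCacheOrDB tries L1 (warmup cache) first, then L2
// (BranchCache), then L3 (DB). A non-empty L3 result populates L2 so
// subsequent reads within the cache's lifetime hit L2.
func branchFromCacheOrDB(l1, l2 map[string][]byte, db branchReader, prefix []byte) ([]byte, error) {
	if v, ok := l1[string(prefix)]; ok {
		return v, nil // L1: warmup workers may have pre-fetched fresher data
	}
	if v, ok := l2[string(prefix)]; ok {
		return v, nil // L2: branch cache
	}
	v, err := db(prefix) // L3: DB read
	if err != nil {
		return nil, err
	}
	if len(v) > 0 {
		l2[string(prefix)] = v // populate L2 on a non-empty miss
	}
	return v, nil
}
```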

Write path (BranchEncoder.CollectUpdate):
  - MarkDirty(prefix) BEFORE encode work — protects against
    concurrent warmup-style writers racing into PutIfClean during
    the encode (race documented in the cache's Concurrency Contract).
  - Put(prefixCopy, updateCopy) AFTER ctx.PutBranch succeeds —
    replaces the dirty entry with fresh canonical bytes. Single
    writer per prefix per fold step (current sequential fold +
    first-nibble mount partitioning) means no race on this Put.
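The mark-before-encode, put-after-success ordering can be sketched as follows. The cache type, `encode`, and `putBranch` here are illustrative stand-ins; only the ordering itself reflects the real `BranchEncoder.CollectUpdate`.

```go
package main

import "sync"

// BranchCache stand-in exposing only the two calls CollectUpdate needs.
type BranchCache struct {
	mu    sync.Mutex
	data  map[string][]byte
	dirty map[string]bool
}

func NewBranchCache() *BranchCache {
	return &BranchCache{data: map[string][]byte{}, dirty: map[string]bool{}}
}

func (c *BranchCache) MarkDirty(prefix []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.dirty[string(prefix)] = true
}

func (c *BranchCache) Put(prefix, data []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[string(prefix)] = data
	delete(c.dirty, string(prefix)) // fresh canonical bytes clear the dirty mark
}

// collectUpdate sketches the write-path ordering: mark dirty BEFORE the
// encode work so racing PutIfClean-style writers refuse stale bytes, and
// cache the result only AFTER the DB write (ctx.PutBranch) succeeds.
func collectUpdate(c *BranchCache, putBranch func(prefix, data []byte) error,
	encode func([]byte) []byte, prefix, update []byte) error {
	c.MarkDirty(prefix)
	encoded := encode(update)
	if err := putBranch(prefix, encoded); err != nil {
		return err // entry stays dirty; readers bypass the cache for this key
	}
	c.Put(append([]byte(nil), prefix...), append([]byte(nil), encoded...))
	return nil
}
```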

Lifecycle:
  HPH.Reset clears the BranchCache when called from the root trie
  (gated by !hph.mounted). Mounted subtries share the root's cache,
  so a mount calling Clear would dump entries the root still wants.
  Carries the invariant from PR #19954 commit 1612d56.

Today's expected performance impact: minimal. Per-Process lifetime
means cache is empty at Process start, so first reads always miss.
The cache helps only branches that are read multiple times within
ONE Process — uncommon in current code paths. Step 7b is where
the real perf swing comes from (cross-block persistence so block
N reads hit branches written by block N-1).

This commit is the safe stepping stone: it validates the wire-up
end-to-end (read path + write path + concurrency contract +
lifecycle) without changing perf characteristics. Bench should
match Run I baseline (7.16 mgas/s on canonical SSTORE-bloat block).

All commitment tests pass, lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>