fix: incremental frontierRootHash to avoid O(N²) on pre-Byzantium blocks#10220

Open
diega wants to merge 3 commits into besu-eth:main from diega:frontier-root-hash-tracker

Conversation

Contributor

@diega diega commented Apr 10, 2026

PR description

Bonsai full sync hangs indefinitely at the 2016 DoS attack blocks (~2.3M) on mainnet because frontierRootHash() copies the entire accumulator and rebuilds the Merkle trie from scratch on every per-transaction call: O(N) work per call and O(N²) total for a block with N transactions.

Commit 1: copy constructor refactor

Replaces the two-phase cloneFromUpdater copy pattern in PathBasedWorldStateUpdateAccumulator with a proper copy constructor chain. Each level copies its own fields with compile-time type safety. This is a prerequisite for commit 2, which adds a subclass-specific field (frontierDirtyAddresses) that needs to be copied correctly.
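The pattern can be illustrated with a minimal, self-contained sketch (the class and field names below are invented stand-ins, not Besu's actual hierarchy): each level's copy constructor copies only its own fields and chains to super, so a subclass-specific field like frontierDirtyAddresses cannot be silently skipped the way a standalone cloneFromUpdater call could skip it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class BaseAccumulator {
    final Map<String, Long> accountsToUpdate = new HashMap<>();

    BaseAccumulator() {}

    // Copy constructor: the base level copies only its own fields.
    BaseAccumulator(BaseAccumulator other) {
        this.accountsToUpdate.putAll(other.accountsToUpdate);
    }
}

class FrontierAccumulator extends BaseAccumulator {
    final Set<String> frontierDirtyAddresses = new HashSet<>();

    FrontierAccumulator() {}

    // The subclass chains to super(other), then copies its own field.
    // The compiler enforces the chain, so adding a field here only
    // requires touching this one constructor.
    FrontierAccumulator(FrontierAccumulator other) {
        super(other);
        this.frontierDirtyAddresses.addAll(other.frontierDirtyAddresses);
    }
}
```

The copies are deep with respect to the collections themselves, so mutating the copy cannot corrupt the original.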

Commit 2: incremental frontierRootHash

Introduces FrontierRootHashTracker, which makes frontierRootHash() incremental instead of rebuilding from scratch on every call.

How it works: The accumulator's commit() override captures the addresses from getUpdatedAccounts() and getDeletedAccountAddresses() into a frontierDirtyAddresses set. This happens after super.commit() populates accountsToUpdate, so every dirty address is guaranteed to have a corresponding entry. The dirty set accumulates across transactions within a block.
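A simplified sketch of that capture ordering, with invented class names and a plain list standing in for the accumulator's pending-change machinery: the override lets super.commit() populate the updated/deleted collections first, then records those addresses into the per-block dirty set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Accumulator {
    final List<String> pendingChanges = new ArrayList<>();
    private final List<String> updatedAccounts = new ArrayList<>();
    private final List<String> deletedAccountAddresses = new ArrayList<>();

    List<String> getUpdatedAccounts() { return updatedAccounts; }
    List<String> getDeletedAccountAddresses() { return deletedAccountAddresses; }

    void commit() {
        // Stand-in for the superclass behaviour: move pending changes
        // into the updated-accounts collection.
        updatedAccounts.addAll(pendingChanges);
        pendingChanges.clear();
    }
}

class TrackingAccumulator extends Accumulator {
    // Accumulates across all transactions within the current block.
    final Set<String> frontierDirtyAddresses = new HashSet<>();

    @Override
    void commit() {
        super.commit(); // updated/deleted collections are populated first
        frontierDirtyAddresses.addAll(getUpdatedAccounts());
        frontierDirtyAddresses.addAll(getDeletedAccountAddresses());
    }
}
```

Because the set is only ever appended to in commit(), two transactions touching the same account leave a single entry, which is exactly what the incremental trie update needs.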

When frontierRootHash() is called, the tracker:

  1. Creates the account trie lazily on the first call (from the persisted block root), and caches it across subsequent calls within the same block.
  2. Processes only the addresses dirtied since the last call: updating storage roots, then putting/removing account nodes in the cached trie.
  3. Clears the processed addresses only after successful computation.

At the block boundary, persist() calls tracker.reset() to discard the cached trie before the next block.
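The lifecycle above can be sketched with a toy tracker in which a TreeMap stands in for the Merkle trie and its hashCode() for the root hash (all names are illustrative, not Besu's): lazy creation on first call, dirty-only processing on later calls, clearing only after success, and reset() at the block boundary. The invariant being demonstrated is that the incremental root always equals the root a from-scratch rebuild would produce.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class IncrementalRootTracker {
    private TreeMap<String, Long> cachedTrie; // null until the first call
    private final Set<String> dirtyAddresses = new HashSet<>();
    private final Map<String, Long> liveState;

    IncrementalRootTracker(Map<String, Long> liveState) {
        this.liveState = liveState;
    }

    void markDirty(String address) {
        dirtyAddresses.add(address);
    }

    long rootHash() {
        if (cachedTrie == null) {
            // Lazy creation: seed the cached structure from current state.
            cachedTrie = new TreeMap<>(liveState);
        } else {
            // Process only addresses dirtied since the last call.
            for (String addr : dirtyAddresses) {
                Long value = liveState.get(addr);
                if (value == null) cachedTrie.remove(addr);
                else cachedTrie.put(addr, value);
            }
        }
        dirtyAddresses.clear(); // only after successful computation
        return cachedTrie.hashCode(); // stand-in for the Merkle root
    }

    void reset() { // called at the block boundary, like persist() does
        cachedTrie = null;
        dirtyAddresses.clear();
    }
}
```

Each rootHash() call costs O(k log n) for k dirty addresses instead of O(n) for a full rebuild, which is the whole point of the real tracker.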

Why it's safe to operate on the live accumulator instead of copying it: the previous accumulator.copy() was introduced to prevent setStorageRoot() mutations from corrupting the live state. But setStorageRoot() sets deterministic values computed from the trie: the same values that persist() will compute at end-of-block. The no-op storage updater used for frontier ensures no trie nodes are written to storage. The accumulator is only read and mutated in the same way that persist() would later.

Commit 3: BonsaiTrieFactory

Commit 2 makes the frontier account trie sequential via createFrontierTrie() (which always returns StoredMerklePatriciaTrie). But the frontier storage tries for each dirty account still go through the normal createTrie() path, which returns ParallelStoredMerklePatriciaTrie when parallel computation is enabled. This adds ForkJoinPool scheduling overhead on every per-transaction call without benefit: the frontier path is inherently sequential because each receipt depends on the prior transaction's state root.

This commit introduces BonsaiTrieFactory with a TrieMode enum (ALWAYS_SEQUENTIAL vs PARALLELIZE_ALLOWED) that centralizes the trie implementation decision. The frontier path passes ALWAYS_SEQUENTIAL for both account and storage tries, eliminating the parallel overhead. The normal persist() / block computation path continues to use PARALLELIZE_ALLOWED.
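A minimal sketch of that decision point, assuming the enum values and flag name described above but with invented stand-in trie types (the real StoredMerklePatriciaTrie and ParallelStoredMerklePatriciaTrie live in Besu): the mode short-circuits the config flag, so ALWAYS_SEQUENTIAL never produces a parallel trie even when parallel computation is enabled.

```java
enum TrieMode { ALWAYS_SEQUENTIAL, PARALLELIZE_ALLOWED }

interface MerkleTrie {
    String implementationName();
}

final class SequentialTrie implements MerkleTrie {
    public String implementationName() { return "StoredMerklePatriciaTrie"; }
}

final class ParallelTrie implements MerkleTrie {
    public String implementationName() { return "ParallelStoredMerklePatriciaTrie"; }
}

final class BonsaiTrieFactory {
    private final boolean parallelStateRootComputationEnabled;

    BonsaiTrieFactory(final boolean parallelStateRootComputationEnabled) {
        this.parallelStateRootComputationEnabled = parallelStateRootComputationEnabled;
    }

    // Centralized decision: the frontier path passes ALWAYS_SEQUENTIAL;
    // the normal persist() path passes PARALLELIZE_ALLOWED and still
    // honours the configuration flag.
    MerkleTrie createTrie(final TrieMode mode) {
        if (mode == TrieMode.PARALLELIZE_ALLOWED && parallelStateRootComputationEnabled) {
            return new ParallelTrie();
        }
        return new SequentialTrie();
    }
}
```

Funneling both call sites through one factory method means the "which implementation?" question is answered in exactly one place instead of being duplicated inline.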

Post-Byzantium receipts use a status code instead of a state root, so the frontier path is only active for pre-Byzantium blocks. Neither commit changes behavior for post-Byzantium blocks or the normal block computation path.

Verifying the performance improvement locally

The O(N²) scaling can be reproduced by measuring individual frontierRootHash() call times across a simulated block (500 accounts × 20 storage slots, no persist() between calls):

// Setup: create 500 accounts each with 20 storage slots, commit, persist
// ...
for (int i = 0; i < 500; i++) {
    WorldUpdater updater = worldState.updater();
    updater.getAccount(addresses[i]).setBalance(Wei.of(1000 + i));
    updater.commit();
    updater.markTransactionBoundary();

    long start = System.nanoTime();
    worldState.frontierRootHash();
    long elapsed = System.nanoTime() - start;
    // Before fix: elapsed grows linearly with i (O(N²) total)
    // After fix: elapsed stays constant
}

Before this PR: the last call takes 8–11x longer than the first.
After this PR: the last call is the same speed or faster than the first.

The parallel trie overhead can be verified by comparing frontierRootHash() wall time with parallelStateRootComputationEnabled=true vs false using RocksDB-backed storage. Before this PR: ~1.9x overhead. After: ~1.0x.

Fixed Issue(s)

fixes #10155

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • spotless: ./gradlew spotlessApply
  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests
  • hive tests: Engine or other RPCs modified?

diega added 3 commits April 9, 2026 12:08
…or hierarchy

Replace the two-phase copy pattern (construct empty + cloneFromUpdater)
with a proper copy constructor chain. Each level copies its own fields
with compile-time type safety, eliminating the need for a separate
public cloneFromUpdater method that could be called without copying
subclass-specific state.

Signed-off-by: Diego López León <dieguitoll@gmail.com>
On pre-Byzantium blocks, FrontierTransactionReceiptFactory calls
frontierRootHash() for every transaction to include intermediate state
roots in receipts. The previous implementation copied the entire
accumulator and rebuilt the Merkle trie from scratch on each call,
giving O(N) cost per call and O(N²) total for a block with N
transactions. On the 2016 DoS attack blocks (~2.3M on mainnet), this
causes full sync to stall indefinitely.

The fix introduces FrontierRootHashTracker, which caches the account
trie between calls and tracks dirty addresses per transaction via a
single Set<Address> in the accumulator. Each frontierRootHash() call
now only processes accounts changed by the latest transaction, reducing
per-call cost from O(N) to O(k) where k is the number of accounts
touched by that transaction.

Safety:
- Dirty addresses are cleared only after successful computation.
- The cache is reset on persist() at block boundaries.
- Uses StoredMerklePatriciaTrie (not ParallelStoredMerklePatriciaTrie)
  to avoid ForkJoinPool overhead on per-transaction calls.
- Only affects pre-Byzantium code path; post-Byzantium is unchanged.

Fixes besu-eth#10155

Signed-off-by: Diego López León <dieguitoll@gmail.com>
…head in frontier path

Adds a Usage-based trie construction policy that selects the appropriate
MerkleTrie implementation based on execution context:

- BLOCK_COMPUTATION: throughput-oriented, respects parallelStateRootComputationEnabled
- FRONTIER_INCREMENTAL: latency-sensitive, always uses StoredMerklePatriciaTrie

This eliminates the ParallelStoredMerklePatriciaTrie ForkJoinPool overhead
from per-transaction frontierRootHash() calls. The factory replaces the
inline createTrie() logic and is used by both the normal block computation
path (unchanged behavior) and the incremental frontier path via
FrontierRootHashTracker (now guaranteed sequential for both account and
storage tries).

Signed-off-by: Diego López León <dieguitoll@gmail.com>


Development

Successfully merging this pull request may close these issues.

Bonsai full sync hangs indefinitely at 2016 DoS attack blocks (~2.3M) on mainnet
