feat: Add parallel state root computation support for Bonsai trie by matkt · Pull Request #9576 · besu-eth/besu

matkt · 2025-12-16T10:04:24Z

PR description

Description

This PR introduces parallel processing capabilities for Merkle Patricia Trie operations during state root computation in Bonsai storage format, significantly improving block validation performance.

Changes

Core Implementation

ParallelStoredMerklePatriciaTrie: New parallel implementation of StoredMerklePatriciaTrie
- Batches pending updates (puts/removes) before processing
- Recursively processes branch children in parallel using ForkJoinPool
- Handles branch, extension, leaf, and null node scenarios
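As a rough illustration of the recursive fork/join pattern described above, here is a minimal sketch. It is not the code from this PR: `ParallelBranchSketch`, `ApplyTask` and `applyAll` are hypothetical names, updates are reduced to fixed-length nibble paths, and the task only counts the updates it reaches instead of building trie nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Illustrative sketch only; this is not Besu's ParallelStoredMerklePatriciaTrie. */
final class ParallelBranchSketch {

  /** Applies the updates below one trie position and returns how many were applied. */
  static final class ApplyTask extends RecursiveTask<Integer> {
    private final List<byte[]> paths; // batched updates, assumed to be fixed-length nibble paths
    private final int depth; // number of nibbles already consumed

    ApplyTask(final List<byte[]> paths, final int depth) {
      this.paths = paths;
      this.depth = depth;
    }

    @Override
    protected Integer compute() {
      // Leaf/null case: nothing left to split, so handle the update(s) sequentially.
      if (paths.size() <= 1 || depth == paths.get(0).length) {
        return paths.size();
      }
      // Branch case: bucket the updates by the child nibble they descend into.
      final List<List<byte[]>> children = new ArrayList<>(16);
      for (int i = 0; i < 16; i++) {
        children.add(new ArrayList<>());
      }
      for (final byte[] path : paths) {
        children.get(path[depth] & 0x0F).add(path);
      }
      // One subtask per child that has pending updates; invokeAll runs them concurrently.
      final List<ApplyTask> tasks = new ArrayList<>();
      for (final List<byte[]> child : children) {
        if (!child.isEmpty()) {
          tasks.add(new ApplyTask(child, depth + 1));
        }
      }
      invokeAll(tasks);
      int applied = 0;
      for (final ApplyTask task : tasks) {
        applied += task.join();
      }
      return applied;
    }
  }

  /** Entry point: processes the whole batch starting from the root position. */
  static int applyAll(final List<byte[]> paths) {
    return ForkJoinPool.commonPool().invoke(new ApplyTask(paths, 0));
  }
}
```

The sketch uses the common pool for brevity; which pool the real implementation uses, and how it sizes it, is not shown here.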

Configuration

WorldStateConfig: Added isParallelStateRootComputationEnabled flag (default: true)
PathBasedExtraStorageConfiguration: Added parallel state root computation configuration
CLI Option: --bonsai-parallel-state-root-computation-enabled to enable/disable feature

Key Features

Parallel Branch Processing: When a branch node has multiple children with updates, they are processed concurrently
Extension Node Expansion: Extensions are temporarily expanded into branches when beneficial for parallel processing
Leaf/Null Node Handling: Intelligent expansion into branch structures when multiple diverging updates exist
Smart Partitioning: Updates are grouped by size - large groups processed in parallel, small groups sequentially
Commit Cache: Thread-safe caching of node updates during parallel processing
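The partitioning and commit-cache ideas above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class name, the `LARGE_GROUP_SIZE` threshold, and the cache contents (a per-nibble update count standing in for real node updates) are hypothetical and not taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

/** Illustrative sketch only; names and threshold are hypothetical. */
final class PartitionSketch {
  // Hypothetical cut-off: groups with at least this many updates go to the pool.
  static final int LARGE_GROUP_SIZE = 2;

  // Thread-safe stand-in for the commit cache: child nibble -> number of updates committed.
  final Map<Byte, Integer> commitCache = new ConcurrentHashMap<>();
  final Map<Byte, List<byte[]>> largeGroups = new TreeMap<>();
  final Map<Byte, List<byte[]>> smallGroups = new TreeMap<>();

  /** Groups nibble paths by the nibble at {@code depth}, then splits the groups by size. */
  void partition(final List<byte[]> paths, final int depth) {
    final Map<Byte, List<byte[]>> byNibble = new TreeMap<>();
    for (final byte[] path : paths) {
      byNibble.computeIfAbsent(path[depth], k -> new ArrayList<>()).add(path);
    }
    for (final Map.Entry<Byte, List<byte[]>> entry : byNibble.entrySet()) {
      final boolean large = entry.getValue().size() >= LARGE_GROUP_SIZE;
      (large ? largeGroups : smallGroups).put(entry.getKey(), entry.getValue());
    }
  }

  /** Large groups are submitted to the pool; small groups run on the calling thread. */
  void process() {
    final List<ForkJoinTask<?>> tasks = new ArrayList<>();
    for (final Map.Entry<Byte, List<byte[]>> entry : largeGroups.entrySet()) {
      tasks.add(
          ForkJoinPool.commonPool().submit(() -> commit(entry.getKey(), entry.getValue())));
    }
    for (final Map.Entry<Byte, List<byte[]>> entry : smallGroups.entrySet()) {
      commit(entry.getKey(), entry.getValue());
    }
    // Wait for the parallel work before the cache is read.
    for (final ForkJoinTask<?> task : tasks) {
      task.join();
    }
  }

  private void commit(final byte nibble, final List<byte[]> group) {
    commitCache.put(nibble, group.size());
  }
}
```

Running small groups inline avoids paying task-scheduling overhead for work that is cheaper than the scheduling itself, which is the usual motivation for a size-based split.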

Backward Compatibility

Feature is enabled by default and can be disabled via the configuration flag
Falls back to sequential StoredMerklePatriciaTrie when disabled
No breaking changes to existing APIs
Fully compatible with existing Bonsai storage format

Configuration Examples

# Enable (default)
besu --data-storage-format=BONSAI --bonsai-parallel-state-root-computation-enabled=true

# Disable for comparison/debugging
besu --data-storage-format=BONSAI --bonsai-parallel-state-root-computation-enabled=false

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

ahamlat · 2026-01-16T13:07:28Z

+      this.root = loadNode(root);
+
+      // Convert pending updates to UpdateEntry objects with nibble paths
+      final List<UpdateEntry<V>> entries =


I suggest using simple for loops to avoid the memory-allocation and latency overhead of streams.

Check this branch, which replaces the streams with simple for loops.
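To make the suggestion concrete, here is a small sketch of the two styles side by side. It is not the code from the linked branch: `NibbleEntriesSketch` is a hypothetical class, and `toNibbles` is a plain hex-nibble expansion that may differ from the path encoding Besu actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative sketch only; not the code from the PR or the linked branch. */
final class NibbleEntriesSketch {

  /** Expands each byte into two 4-bit nibbles (high nibble first). */
  static byte[] toNibbles(final byte[] key) {
    final byte[] nibbles = new byte[key.length * 2];
    for (int i = 0; i < key.length; i++) {
      nibbles[2 * i] = (byte) ((key[i] >> 4) & 0x0F);
      nibbles[2 * i + 1] = (byte) (key[i] & 0x0F);
    }
    return nibbles;
  }

  /** Stream version: allocates pipeline and collector objects on every call. */
  static List<byte[]> withStream(final List<byte[]> keys) {
    return keys.stream().map(NibbleEntriesSketch::toNibbles).collect(Collectors.toList());
  }

  /** Loop version: one pre-sized list and no intermediate pipeline objects. */
  static List<byte[]> withLoop(final List<byte[]> keys) {
    final List<byte[]> out = new ArrayList<>(keys.size());
    for (final byte[] key : keys) {
      out.add(toNibbles(key));
    }
    return out;
  }
}
```

Both methods return the same entries; whether the difference is measurable on this code path would need a benchmark.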

ahamlat · 2026-01-16T14:08:13Z

+      for (final Map.Entry<Byte, List<UpdateEntry<V>>> entry : largeGroups.entrySet()) {
+        final byte nibble = entry.getKey();
+        final List<UpdateEntry<V>> childUpdates = entry.getValue();
+        final Bytes childLocation = Bytes.concatenate(location, Bytes.of(nibble));


Suggested change

-        final Bytes childLocation = Bytes.concatenate(location, Bytes.of(nibble));
+        final byte[] out = new byte[pathDepth + 1];
+        final byte[] in = location.toArrayUnsafe();
+        System.arraycopy(in, 0, out, 0, pathDepth);
+        out[pathDepth] = nibble;
+        final Bytes childLocation = Bytes.wrap(out);

This is the JMH benchmark that shows the difference in performance. arraycopy_newArray_wrap is the newly suggested implementation.

Benchmark                                          (locationSize)  Mode  Cnt   Score   Error  Units
BytesConcatenateBenchmark.arraycopy_newArray_wrap               8  avgt   16   6.313 ± 0.321  ns/op
BytesConcatenateBenchmark.arraycopy_newArray_wrap              16  avgt   16   6.150 ± 0.140  ns/op
BytesConcatenateBenchmark.arraycopy_newArray_wrap              32  avgt   16   6.383 ± 0.291  ns/op
BytesConcatenateBenchmark.concat_bytesOf                        8  avgt   16  37.773 ± 6.029  ns/op
BytesConcatenateBenchmark.concat_bytesOf                       16  avgt   16  35.169 ± 2.331  ns/op
BytesConcatenateBenchmark.concat_bytesOf                       32  avgt   16  36.688 ± 1.568  ns/op

Check this branch with a clean implementation

ahamlat · 2026-01-16T14:10:36Z

+    for (final Map.Entry<Byte, List<UpdateEntry<V>>> entry : smallGroups.entrySet()) {
+      final byte nibble = entry.getKey();
+      final List<UpdateEntry<V>> childUpdates = entry.getValue();
+      final Bytes childLocation = Bytes.concatenate(location, Bytes.of(nibble));


Same as above: create a method that concatenates using the underlying array.
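Such a helper could be sketched as below. This is only an assumption about its shape: `LocationSketch.appendNibble` is a hypothetical name, and it works on a raw `byte[]` so that it stays self-contained; in the PR's code the input would come from `location.toArrayUnsafe()` and the result would be passed to `Bytes.wrap(...)`, as in the suggested change above.

```java
/** Illustrative sketch only; the helper name and placement are hypothetical. */
final class LocationSketch {

  /** Returns a new array holding {@code location} followed by {@code nibble}. */
  static byte[] appendNibble(final byte[] location, final byte nibble) {
    // One allocation sized for the result, one bulk copy, one store.
    final byte[] out = new byte[location.length + 1];
    System.arraycopy(location, 0, out, 0, location.length);
    out[location.length] = nibble;
    return out;
  }
}
```

Having a single helper means both the large-group and small-group loops share the same child-location logic.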

ahamlat

The proposed changes can be addressed in a separate PR, as this PR is about state root calculation.
For the instance type (8 cores / 8 threads) used in the screenshot below, it shows up to a 40% improvement in block processing time. We are seeing less improvement on VMs with fewer cores (4 cores / 8 threads).

matkt added 30 commits December 4, 2025 10:59

add state root computation optimization

b6b2b46

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

refactor stateroot opti

7fc21ec

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

remove threshold

d72a362

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

try use parallel stream

602b4b3

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

full parallel stream

f3ca34a

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

use cached thread pool

f9203c2

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

10 fixed thread pool

96d9a24

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

change parallel trie logic and condition

6bc14bb

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

clean code

51d7680

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

Merge branch 'main' into merkle-trie-optimisation

8f197b7

add parallel update for leaf and null node

3ff878d

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

fix logic issue for leaf and null

0b1acc4

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

clean build

917c12e

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

clean build

d9381a0

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

fix logic

032d645

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

change fork join pool and condition

d0a7026

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

remove NCPU condition

97faea3

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

set correct condition

33b0b7a

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

add parallel extension

474b810

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

change logic for root node

3e4e8b3

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

disable extension

3b3b6f8

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

fix extension

459eb45

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

clean extension code

a12cd91

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

clean extension code

8e67269

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

merge

d6cfcb8

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

move to ncpu 2

6c1c09a

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

load node in a different location

b6c091f

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

add //stream for fcu

03e1d9b

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

clean PR

2db94a6

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

Merge branch 'main' into merkle-trie-optimisation

3867b15

add flag for parallel state root computation

f02c367

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

matkt changed the title ~~Merkle trie optimisation~~ feat: Add parallel state root computation support for Bonsai trie Jan 7, 2026

matkt added 5 commits January 7, 2026 19:00

remove parallel stream

1421b81

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

Merge branch 'main' into merkle-trie-optimisation

e3c735d

revert change for parallel stream

58ecf61

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

use ForkJoinTask for parallel state root computation

64b74dd

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

clean PR

e6c150f

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

matkt marked this pull request as ready for review January 9, 2026 13:53

matkt added 2 commits January 9, 2026 18:09

fix tests

2455b66

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

clean comment

86cb7d6

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

matkt added the performance label Jan 9, 2026

Merge branch 'main' into merkle-trie-optimisation

9c39824

ahamlat requested changes Jan 16, 2026

ahamlat approved these changes Jan 21, 2026

matkt added 2 commits January 21, 2026 18:25

Address code review comments

5781c17

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

Merge branch 'main' into merkle-trie-optimisation

b06f0f0

macfarla added this to Performance Jan 22, 2026

macfarla moved this to In Progress in Performance Jan 22, 2026

matkt added 3 commits January 22, 2026 15:26

merge main

c5fb1d5

Signed-off-by: Karim Taam <karim.t2am@gmail.com>

Merge branch 'main' into merkle-trie-optimisation

00886fd

Merge branch 'main' into merkle-trie-optimisation

05ada8d

matkt enabled auto-merge (squash) January 23, 2026 06:20

matkt merged commit 2f6d7f2 into besu-eth:main Jan 23, 2026
46 checks passed

github-project-automation Bot moved this from In Progress to Done in Performance Jan 23, 2026

macfarla mentioned this pull request Feb 3, 2026

add changelog entry for state root parallelization #9733

Merged

macfarla pushed a commit to CPerezz/besu that referenced this pull request Feb 6, 2026

feat: Add parallel state root computation support for Bonsai trie (be…

8733b58

…su-eth#9576) Signed-off-by: Karim Taam <karim.t2am@gmail.com> Co-authored-by: ahamlat <ameziane.hamlat@consensys.net>

feat: Add parallel state root computation support for Bonsai trie #9576
matkt merged 44 commits into besu-eth:main from matkt:merkle-trie-optimisation

4 participants
