feat: Add parallel state root computation support for Bonsai trie #9576
matkt merged 44 commits into besu-eth:main
Signed-off-by: Karim Taam <karim.t2am@gmail.com>
    this.root = loadNode(root);

    // Convert pending updates to UpdateEntry objects with nibble paths
    final List<UpdateEntry<V>> entries =
I suggest using simple for loops to avoid the memory-allocation and latency overhead of streams.
See this branch, which contains the change that replaces the streams with simple for loops.
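As a rough illustration of the loop-based style suggested here, the sketch below converts pending updates into entries with nibble paths using a plain for loop and a pre-sized list. The `Entry` record, the `toNibbles` helper, and the `Map<byte[], V>` input are hypothetical stand-ins, not the PR's actual `UpdateEntry` type or key encoding (for example, no leaf terminator is appended).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class NibblePathEntries {

  /** Hypothetical stand-in for the PR's UpdateEntry: a nibble path plus the value to write. */
  record Entry<V>(byte[] nibblePath, V value) {}

  /** Expands each key byte into two nibbles, high nibble first. */
  static byte[] toNibbles(final byte[] key) {
    final byte[] nibbles = new byte[key.length * 2];
    for (int i = 0; i < key.length; i++) {
      nibbles[2 * i] = (byte) ((key[i] >> 4) & 0x0F);
      nibbles[2 * i + 1] = (byte) (key[i] & 0x0F);
    }
    return nibbles;
  }

  /** Loop-based conversion: one pre-sized list and no intermediate stream pipeline objects. */
  static <V> List<Entry<V>> toEntries(final Map<byte[], V> pendingUpdates) {
    final List<Entry<V>> entries = new ArrayList<>(pendingUpdates.size());
    for (final Map.Entry<byte[], V> update : pendingUpdates.entrySet()) {
      entries.add(new Entry<>(toNibbles(update.getKey()), update.getValue()));
    }
    return entries;
  }
}
```

Whether this is measurably faster than the stream version depends on the workload and should be confirmed with a benchmark.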
    for (final Map.Entry<Byte, List<UpdateEntry<V>>> entry : largeGroups.entrySet()) {
      final byte nibble = entry.getKey();
      final List<UpdateEntry<V>> childUpdates = entry.getValue();
      final Bytes childLocation = Bytes.concatenate(location, Bytes.of(nibble));
Suggested change: replace

    final Bytes childLocation = Bytes.concatenate(location, Bytes.of(nibble));

with

    final byte[] out = new byte[pathDepth + 1];
    final byte[] in = location.toArrayUnsafe();
    System.arraycopy(in, 0, out, 0, pathDepth);
    out[pathDepth] = nibble;
    final Bytes childLocation = Bytes.wrap(out);
This is the JMH benchmark that shows the difference in performance. arraycopy_newArray_wrap is the newly suggested implementation.
| Benchmark | (locationSize) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|
| BytesConcatenateBenchmark.arraycopy_newArray_wrap | 8 | avgt | 16 | 6.313 | ± 0.321 | ns/op |
| BytesConcatenateBenchmark.arraycopy_newArray_wrap | 16 | avgt | 16 | 6.150 | ± 0.140 | ns/op |
| BytesConcatenateBenchmark.arraycopy_newArray_wrap | 32 | avgt | 16 | 6.383 | ± 0.291 | ns/op |
| BytesConcatenateBenchmark.concat_bytesOf | 8 | avgt | 16 | 37.773 | ± 6.029 | ns/op |
| BytesConcatenateBenchmark.concat_bytesOf | 16 | avgt | 16 | 35.169 | ± 2.331 | ns/op |
| BytesConcatenateBenchmark.concat_bytesOf | 32 | avgt | 16 | 36.688 | ± 1.568 | ns/op |
    for (final Map.Entry<Byte, List<UpdateEntry<V>>> entry : smallGroups.entrySet()) {
      final byte nibble = entry.getKey();
      final List<UpdateEntry<V>> childUpdates = entry.getValue();
      final Bytes childLocation = Bytes.concatenate(location, Bytes.of(nibble));
The same as above, create a method that concatenates based on the underlying array.
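A shared helper along these lines could look like the sketch below. It works on plain `byte[]` so that it stays self-contained; `LocationPaths` and `appendNibble` are hypothetical names, and in the PR's code the result would presumably be wrapped with `Bytes.wrap(out)` as in the suggestion above.

```java
final class LocationPaths {

  private LocationPaths() {}

  /** Returns a new array holding the bytes of location followed by nibble. */
  static byte[] appendNibble(final byte[] location, final byte nibble) {
    final byte[] out = new byte[location.length + 1];
    // Copy the parent path, then write the child nibble into the last slot.
    System.arraycopy(location, 0, out, 0, location.length);
    out[location.length] = nibble;
    return out;
  }
}
```

Both the large-group and small-group loops could then call the same method instead of repeating the array handling inline.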
The proposed changes can be addressed in a separate PR as this PR is about state root calculation.
On the instance type used for the screenshot below (8 cores / 8 threads), it shows up to a 40% improvement in block processing time. We're seeing less improvement on VMs with fewer cores (4 cores / 8 threads).

…su-eth#9576) Signed-off-by: Karim Taam <karim.t2am@gmail.com> Co-authored-by: ahamlat <ameziane.hamlat@consensys.net>
PR description
Description
This PR introduces parallel processing capabilities for Merkle Patricia Trie operations during state root computation in Bonsai storage format, significantly improving block validation performance.
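The general shape of such parallel processing can be sketched with a fork/join task that groups updates by the nibble at the current depth, forks large groups, and handles small groups inline. This is only an illustrative toy under stated assumptions: it counts applied updates instead of hashing trie nodes, and `ParallelGroupSketch`, `SubtreeTask`, and `PARALLEL_THRESHOLD` are hypothetical names, not the PR's `ParallelStoredMerklePatriciaTrie`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class ParallelGroupSketch {

  /** Hypothetical cut-off: groups at least this large are forked to other workers. */
  static final int PARALLEL_THRESHOLD = 4;

  /** Processes all nibble paths that share the same prefix of length depth. */
  static final class SubtreeTask extends RecursiveTask<Integer> {
    private final List<byte[]> paths;
    private final int depth;

    SubtreeTask(final List<byte[]> paths, final int depth) {
      this.paths = paths;
      this.depth = depth;
    }

    @Override
    protected Integer compute() {
      int applied = 0;
      final Map<Byte, List<byte[]>> groups = new TreeMap<>();
      for (final byte[] path : paths) {
        if (path.length == depth) {
          applied++; // the path ends at this node: count it as applied here
        } else {
          groups.computeIfAbsent(path[depth], k -> new ArrayList<>()).add(path);
        }
      }
      final List<SubtreeTask> forked = new ArrayList<>();
      for (final List<byte[]> group : groups.values()) {
        final SubtreeTask child = new SubtreeTask(group, depth + 1);
        if (group.size() >= PARALLEL_THRESHOLD) {
          child.fork(); // large group: hand off to the pool
          forked.add(child);
        } else {
          applied += child.compute(); // small group: run on the current thread
        }
      }
      for (final SubtreeTask task : forked) {
        applied += task.join();
      }
      return applied;
    }
  }

  /** Runs the sketch on a dedicated pool and returns the number of applied updates. */
  static int applyAll(final List<byte[]> paths) {
    final ForkJoinPool pool = new ForkJoinPool(4);
    try {
      return pool.invoke(new SubtreeTask(paths, 0));
    } finally {
      pool.shutdown();
    }
  }
}
```

The threshold reflects the usual trade-off: forking a tiny group tends to cost more in scheduling than it saves, which is one plausible reason to treat large and small groups differently.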
Changes
Core Implementation
- `ParallelStoredMerklePatriciaTrie`: New parallel implementation of `StoredMerklePatriciaTrie`
- `ForkJoinPoolConfiguration`
- `WorldStateConfig`: Added `isParallelStateRootComputationEnabled` flag (default: `true`)
- `PathBasedExtraStorageConfiguration`: Added parallel state root computation configuration
- `--bonsai-parallel-state-root-computation-enabled` to enable/disable feature
Key Features
Backward Compatibility
- `StoredMerklePatriciaTrie` when disabled
Configuration Examples