
Conversation

@yongkangc (Member) commented Nov 21, 2025

Addresses #19250 by moving both trie sorting and trie input construction out of the hot path, so validation can return as soon as the state root has been checked.

Changes:

  • We stop building trie input on the validation hot path. validate_block_with_state now returns an ExecutedBlock with a pending DeferredTrieData, and offloads the sorting and trie input building to spawn_blocking.
  • That background task then reconstructs the ancestor overlay (using in-memory blocks), sorts the state and trie updates, builds the trie input, and stores it in the handle. Callers only wait when they actually need the sorted trie data.

Approach

  • We use Arc<OnceLock<ComputedTrieData>> to share the single result with multiple readers. OnceLock::wait blocks synchronously, needs no Tokio runtime, and allows cheap cloning of the handle.
OnceLock
--------
executor thread                background thread
--------------                  -----------------
wait_cloned()                   spawn_blocking()
  | wait() on OnceLock            | compute bundle
  | clone result                  | set_ready(bundle)
  v                               v
all readers see same value      first-set wins; no error state

Flow now:

  • Hot path: execute block → verify state root → get unsorted state/updates + parent hash → return with pending handle.
  • Background task: rebuild ancestor trie input → sort state/updates → extend trie input → set_ready / set_error.
  • Consumers: call trie_data()/hashed_state()/trie_input() and block only when needed.

Impact


Other Approaches Considered?

Why not oneshot?

  • We need “set once, read many, cache for late readers” semantics, and we sometimes need to block in sync contexts (e.g., DB writes or state providers) that need the trie data but don’t run on a Tokio runtime. A oneshot channel is single-producer/single-consumer, consumed on first recv, and async-only. It would still require a cache and a sync wait path, adding complexity here.

Why not Mutex + Condvar?

  • Using Mutex + Condvar requires scaffolding to track state and handle wakeups and errors. With OnceLock, a blocking wait comes for free and the handling stays simple.

@github-project-automation github-project-automation bot moved this to Backlog in Reth Tracker Nov 21, 2025
@yongkangc yongkangc changed the title Defer trie sorting/input assembly off the validation hot path perf(trie): compute trie async Nov 21, 2025
@yongkangc yongkangc self-assigned this Nov 21, 2025
@yongkangc yongkangc moved this from Backlog to In Progress in Reth Tracker Nov 21, 2025
add reth-trie-common dependency and update usages in flashblocks and rpc-eth-api
@yongkangc yongkangc changed the title perf(trie): compute trie async perf(trie): compute and sort trie inputs async Nov 21, 2025
@yongkangc yongkangc added C-perf A change motivated by improving speed, memory usage or disk footprint A-trie Related to Merkle Patricia Trie implementation labels Nov 21, 2025
Collaborator

@mattsse mattsse left a comment

seconding all of @mediocregopher's comments

but functionally this approach lgtm

… and trie_input, add new constructors and methods for better handling
// providers, proofs) block on the handle only when they actually need the sorted
// trie data.
let task = move || {
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
Member Author

we catch the panic here: the concern is that a panic inside the blocking task would leave the underlying data unset, which would cause consumers of the trie data to block forever.

Member Author

also, side note: this lets us see what fails in e2e tests.

@yongkangc yongkangc marked this pull request as ready for review November 24, 2025 09:52
@yongkangc (Member Author) commented Nov 24, 2025

Performance Changes:
-     NewPayload Latency per-block mean change:   -5.57%
-     NewPayload Latency per-block median change: -5.24%
-     Total newPayload time change:               -7.29%
-     NewPayload Latency p50:                     -7.22%
-     NewPayload Latency p90:                     -24.79%
-     NewPayload Latency p99:                     -2.81%
-     Gas/Second:                                 +1.75%
-     Blocks/Second:                              +1.75%

@yongkangc yongkangc moved this from In Progress to In Review in Reth Tracker Nov 24, 2025
Collaborator

@mediocregopher mediocregopher left a comment

mostly nits, looking good!

@github-project-automation github-project-automation bot moved this from In Review to In Progress in Reth Tracker Nov 24, 2025
…plementation and updating ComputedTrieData usage
Collaborator

@mediocregopher mediocregopher left a comment

One final nit, LGTM 🚀

@jenpaff jenpaff linked an issue Nov 24, 2025 that may be closed by this pull request
@emhane (Collaborator) commented Nov 24, 2025

@dhyaniarun1993 @itschaindev @sadiq1971 @meyer9 for visibility

Comment on lines +97 to +99
let data = OnceLock::new();
data.set(bundle).unwrap(); // Safe: newly created OnceLock
Self(Arc::new(data))
Collaborator

I think this is just Self(Arc::new(bundle.into()))

Comment on lines +127 to +131
/// Multiple threads can wait concurrently. All waiters wake when the value is set,
/// and each receives a cloned `ComputedTrieData` (Arc clones are cheap).
pub fn wait_cloned(&self) -> ComputedTrieData {
    self.0.wait().clone()
}
Collaborator

hmm, this assumes that there's always something already computing the trie data,

imo this could be a potential footgun in case that isn't true?

Comment on lines +801 to +805
/// This blocks the calling thread until the background trie computation completes.
#[inline]
pub fn trie_data(&self) -> ComputedTrieData {
    self.trie_data.wait_cloned()
}
Collaborator

this assumes that there is something that is doing this, otherwise this will deadlock, right?

Comment on lines +532 to +543
// Capture parent hash and ancestor overlays for deferred trie input construction.
let (parent_hash, overlay_blocks) = ctx
    .state()
    .tree_state
    .blocks_by_hash(block.parent_hash())
    .unwrap_or_else(|| (block.parent_hash(), Vec::new()));

// Create a deferred handle to store the sorted trie data.
let deferred_trie_data = DeferredTrieData::pending();
let deferred_handle_task = deferred_trie_data.clone();
let deferred_compute_duration =
    self.metrics.block_validation.deferred_trie_compute_duration.clone();
Collaborator

can we make all of this a dedicated function? this existing fn is already quite large

/// Multiple threads can wait concurrently. All waiters wake when the value is set,
/// and each receives a cloned `ComputedTrieData` (Arc clones are cheap).
pub fn wait_cloned(&self) -> ComputedTrieData {
    self.0.wait().clone()
Collaborator

we don't have access to the internal once state, and .get() does not block on an ongoing initialization.

we could solve this with our own state tracking, but that feels a bit redundant.

imo we can take the hit here and always do get_or_init

imo that'd be better than relying on something else to init it



Development

Successfully merging this pull request may close these issues.

Compute trie input asynchronously

5 participants