[parallel] add Strategy::run_async for offloading CPU work from worker thread #4005
roberto-bayardo wants to merge 1 commit into
… tasks
Runs a closure to completion and returns its result through a future. The default implementation (used by Sequential) runs the work inline at first poll; Rayon spawns it onto its thread pool and suspends the caller via a oneshot channel, keeping CPU-intensive work off async executor threads. Strategy methods invoked within the closure execute with the strategy's parallelism.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
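The two behaviours described above (inline at first poll versus hand-off to another thread) can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's code: the boxed-future return type, the `Offload` type, the `Slot`/`Receiver` pair, and `block_on` are all hypothetical stand-ins, and a plain `std::thread` plays the role of the Rayon pool and of the `futures` oneshot channel that the diff actually uses.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

type BoxFuture<R> = Pin<Box<dyn Future<Output = R> + Send>>;

/// Hypothetical stand-in for the PR's `Strategy` trait (signature assumed).
trait Strategy {
    fn run_async<F, R>(&self, f: F) -> BoxFuture<R>
    where
        F: FnOnce() -> R + Send + 'static,
        R: Send + 'static,
    {
        // Default: the closure runs inline the first time the future is polled.
        Box::pin(async move { f() })
    }
}

struct Sequential;
impl Strategy for Sequential {}

/// Shared slot standing in for a oneshot channel.
struct Slot<R> {
    value: Option<R>,
    waker: Option<Waker>,
}

struct Receiver<R>(Arc<Mutex<Slot<R>>>);

impl<R> Future for Receiver<R> {
    type Output = R;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<R> {
        let mut slot = self.0.lock().unwrap();
        match slot.value.take() {
            Some(v) => Poll::Ready(v),
            None => {
                // Not ready yet: remember who to wake, then suspend.
                slot.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Hypothetical offloading strategy: a spawned thread plays the pool.
struct Offload;
impl Strategy for Offload {
    fn run_async<F, R>(&self, f: F) -> BoxFuture<R>
    where
        F: FnOnce() -> R + Send + 'static,
        R: Send + 'static,
    {
        let slot = Arc::new(Mutex::new(Slot { value: None, waker: None }));
        let tx = Arc::clone(&slot);
        thread::spawn(move || {
            let result = f();
            // Store the result, then wake the caller outside the lock.
            let waker = {
                let mut s = tx.lock().unwrap();
                s.value = Some(result);
                s.waker.take()
            };
            if let Some(w) = waker {
                w.wake();
            }
        });
        Box::pin(Receiver(slot))
    }
}

/// Minimal executor for the sketch: parks the thread until woken.
struct ThreadWaker(thread::Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<R>(mut fut: BoxFuture<R>) -> R {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => return v,
            Poll::Pending => thread::park(),
        }
    }
}
```

With these definitions, `block_on(Sequential.run_async(|| 2 + 2))` computes on the polling thread, while `block_on(Offload.run_async(...))` computes on the spawned thread and the polling side only waits for the wakeup.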
Deploying with
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs | commonware-mcp | a7268e1 | Jun 09 2026, 11:24 PM |
Benchmark results
✅ PASSED: No benchmark exceeded the regression threshold.
        R: Send + 'static,
    {
        let (tx, rx) = futures::channel::oneshot::channel();
        self.thread_pool.spawn(move || {
does this take away from rayon's throughput?
not according to Fable
for the parallel work itself, throughput is identical. The closure's host thread isn't an overhead thread sitting on top of the workers; it is one of the N
workers, and when the inner map_collect_vec runs, it's hashing alongside the rest of the pool. The pool delivers the same N-way parallelism it did under the old blocking
install — rayon's install-from-within-the-pool path exists precisely so nested parallelism doesn't lose a thread.
The only costs that exist at all:
- A one-time handoff per run_async call (oneshot send + waker) — microseconds, amortized over milliseconds of hashing.
- Sequential code inside a closure holds one worker while it runs. But that work was going to consume one thread somewhere regardless — it's just a rayon thread now
instead of a tokio thread, which is the whole point.
And under concurrency it's arguably a throughput improvement in aggregate: previously, several tasks merkleizing at once meant several tokio workers all blocked in
install, contending for the same pool while contributing nothing. Now those tasks suspend, the closures queue in rayon's injector, and the pool chews through them at full
utilization with no parked threads anywhere.
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main    #4005   +/-   ##
=======================================
  Coverage   95.08%   95.08%
=======================================
  Files         531      530    -1
  Lines      218260   218264    +4
  Branches     5302     5302
=======================================
+ Hits       207524   207537   +13
+ Misses       8936     8925   -11
- Partials     1800     1802    +2

... and 34 files with indirect coverage changes
Offload batch leaf hashing onto the strategy
The authenticated journal's add_many hashes every appended item into a leaf digest (roughly half of merkleization CPU for a qmdb batch). It was already parallelized, but Rayon::install blocks the calling thread, pinning a tokio worker for tens of milliseconds on large batches. It now runs through Strategy::run_async: still parallel, but the calling task suspends instead of blocking its thread. First production use of run_async, to evaluate on a busy validator before converting parent-node hashing (blocked on off-lock merkleization).
API changes (ALPHA)
- add_many and the keyless/immutable batch merkleize fns are now async.
- ValueEncoding/FixedValue/VariableValue gain 'static bounds so owned items can cross the offload boundary (Key already had one).
- Rayon with the deterministic runner moved to the tokio runner: the deterministic runtime cannot observe wakeups from external pool threads.
Validation
Full storage/glue/sync suites pass; conformance roots are unchanged, confirming the relocation is behavior-preserving. The caller still awaits the root, so merkleize
benches won't move — the win is co-scheduled tasks no longer stalling behind hashing on executor threads, visible in validator tail latency rather than throughput.
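The 'static bounds listed under API changes follow from moving owned items into a closure that runs on another thread. A minimal sketch of that constraint, assuming hypothetical helpers `leaf_digest`, `hash_offloaded`, and `hash_inline`; `DefaultHasher` stands in for the real leaf hash, and a joined `std::thread` stands in for the pool (it blocks, unlike run_async, and is only here to show the bound):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

/// Stand-in for a leaf digest (the real code hashes into a Merkle leaf).
fn leaf_digest<V: Hash>(item: &V) -> u64 {
    let mut hasher = DefaultHasher::new();
    item.hash(&mut hasher);
    hasher.finish()
}

/// Hashes owned items on another thread. `V: Send + 'static` is what
/// allows the vector to move into the spawned closure; a `Vec<&T>`
/// borrowing from the caller's stack would be rejected by the compiler.
fn hash_offloaded<V>(items: Vec<V>) -> Vec<u64>
where
    V: Hash + Send + 'static,
{
    thread::spawn(move || items.iter().map(leaf_digest).collect::<Vec<u64>>())
        .join()
        .expect("hashing thread panicked")
}

/// Same computation on the calling thread, for comparison.
fn hash_inline<V: Hash>(items: &[V]) -> Vec<u64> {
    items.iter().map(leaf_digest).collect()
}
```

Both paths produce the same digests for the same input, which mirrors the claim that relocating the hashing does not change the resulting roots.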