chore: merge develop-v2.2-new into develop #192

Open
chee-chyuan wants to merge 1417 commits into develop from develop-v2.2-new
@chee-chyuan chee-chyuan commented May 26, 2026

Summary

Merges develop-v2.2-new into develop, bringing in the v2.2 upgrade based on upstream reth v2.2.0. This PR pulls in three major upstream releases plus BSC-specific patches developed during the upgrade work.

Upstream release notes:

  • reth v2.0.0 — Storage v2 default, engine backpressure, SparseTrieCacheTask, Proof V2
  • reth v2.1.0 — db migrate-v2, revm 38, alloy 2.0, RocksDB sync fix
  • reth v2.2.0 — EIP-7928 BAL scaffolding, Discv5 default-on, snap/2

Major upstream changes by category

Storage v2 (v2.0 default, v2.1 migrator)

  • Storage v2 is the default for new nodes (upstream #22890). Changesets and receipts go to static files, history indices to RocksDB.
  • reth db migrate-v2 for one-shot v1 → v2 migration without full resync (#23422, pruned-node fix #23716).
  • Slot preimage DB for plain changeset keys in v2 (#22379).

Engine backpressure (v2.0)

  • New --engine.persistence-backpressure-threshold flag (default 16) that stalls new payload processing when the canonical-to-persisted gap reaches this threshold (#23280, #23308). Validated at startup to be strictly greater than --engine.persistence-threshold.
  • ⚠️ Interacts with --engine.memory-block-buffer-target. On chains with sub-second block intervals, any configuration whose steady-state gap (≈ memory_block_buffer_target) is ≥ the backpressure threshold will stall the engine continuously. See Test focus / Known risks below.
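
The two constraints above could be checked in a deploy script before a node is started. The following is a minimal sketch, not reth code: `EngineThresholds`, `ThresholdError`, and `check_thresholds` are hypothetical names, and the second check simply encodes the assumption stated above that the steady-state gap is roughly equal to the memory block buffer target.

```rust
/// Hypothetical mirror of the three engine flags; not a reth type.
#[derive(Debug, Clone, Copy)]
pub struct EngineThresholds {
    /// Value passed to --engine.persistence-threshold
    pub persistence_threshold: u64,
    /// Value passed to --engine.memory-block-buffer-target
    pub memory_block_buffer_target: u64,
    /// Value passed to --engine.persistence-backpressure-threshold
    pub backpressure_threshold: u64,
}

/// Hypothetical error type describing which constraint was violated.
#[derive(Debug, PartialEq, Eq)]
pub enum ThresholdError {
    /// The backpressure threshold must be strictly greater than the
    /// persistence threshold (the PR states this is validated at startup).
    BackpressureNotAbovePersistence,
    /// The steady-state gap (assumed ≈ buffer target) would reach the
    /// backpressure threshold, so the engine would stall continuously.
    SteadyStateGapStalls,
}

/// Checks both constraints and reports the first one that fails.
pub fn check_thresholds(t: EngineThresholds) -> Result<(), ThresholdError> {
    if t.backpressure_threshold <= t.persistence_threshold {
        return Err(ThresholdError::BackpressureNotAbovePersistence);
    }
    if t.memory_block_buffer_target >= t.backpressure_threshold {
        return Err(ThresholdError::SteadyStateGapStalls);
    }
    Ok(())
}
```

A configuration that keeps the default backpressure threshold of 16 but sets the buffer target to 16 would be rejected by the second check, which is the failure mode the warning describes.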

Performance (v2.0)

  • SparseTrieCacheTask — background state-root task that persists the sparse trie across blocks (#21583).
  • ArenaParallelSparseTrie (#22381).
  • Proof V2 — ground-up rewrite of merkle proof calculation; supports partial proofs and cursor reuse (#19863, #20336).
  • Shared execution cache between engine and payload builder (#23242, #23246).
  • jemalloc enabled by default on supported platforms (#23214).
  • Rayon par_iter for tx prewarming with dedicated pool (#22521, #22108).
  • Thread priority boosting for engine and sparse trie threads (#22541).
  • Read-only MDBX tx pooling (#22631).
  • Adaptive multiproof chunk size based on block gas usage (#22233).

Dependency upgrades

  • revm 38.0.0 (was 36) — EIP-8037 state gas support (#23191).
  • alloy 2.0.4 (was 1.8) (#23407, #23828).
  • alloy-evm 0.34.0 — gas-refund accounting fix for custom precompiles (per v2.1 release note: behaviour changes if any precompile previously set gas_refunded non-zero).
  • revm-inspectors 0.39.0.

RocksDB / Storage (v2.1)

  • RocksDB WriteOptions::sync=true by default to prevent corruption on ungraceful shutdown (#23603). Directly affects BSC triedb path.

Networking (v2.2)

  • Discv5 enabled by default (#23686). ⚠️ Behaviour change for anyone previously relying on Discv4-only.
  • Discv5 and Discv4 can bind to the same port (#23613).
  • Memory-bounded channel between network and transaction manager (#23802).
  • Bounded P2P message memory footprint (#23718).
  • snap/2 wire helpers (#23611).
  • Discv5 kbuckets background task uses Weak reference to release port on shutdown (#23282).

RPC additions

  • debug_intermediateRoots (#22754)
  • debug_traceBadBlock (#22719)
  • eth_getStorageValues batch fetcher (#22186)
  • reth_getBlockExecutionOutcome / reth_forkchoiceUpdated / reth_newPayload (#22397, #22536, #22533)
  • transactionReceipts subscription in eth_subscribe (v2.1, #23485)
  • admin_nodeInfo.id returns keccak256 node ID matching go-ethereum format (#23319)

Bug fixes worth noting

  • Engine: don't deadlock on repeated payloads (#22971)
  • Engine: wait for persistence to complete in reth_newPayload (#22239)
  • MDBX: replaced deprecated MDBX_NOTLS with MDBX_NOSTICKYTHREADS (#23378)
  • Disable fee charge in eth_createAccessList and eth_estimateGas (#23026, #22959)
  • Clamp pending finalized/safe block to persisted height (#22783)

Breaking SDK changes (selected, full list in release notes)

  • reth-primitives crate removed, replaced by reth-ethereum-primitives (#23220)
  • SerdeBincodeCompat removed; ExEx WAL now uses RLP (#23158)
  • PayloadBuilderAttributes removed (#23202)
  • TaskSpawner trait removed in favor of concrete Runtime (#22052)
  • reth-engine-service crate removed (#22187)
  • reth-primitives-traits moved to separate reth-core repo (#23186, #23210)

BSC-specific additions in this PR

  • BSC-specific engine fixes: MDBX read tx deferral, changeset task improvements
  • perf(metrics): prevent multi-second engine stalls from scrape hooks via spawn_blocking
  • Revert temporary debug logging added during block import investigation
  • Fix compilation errors and clippy warnings from v2.2.0 merge
  • Fix duplicate ParliaSnapshotBlob and overlay_cache references

Test focus

High priority — likely BSC-affected behaviour

  1. Storage v2 fresh node + migrate-v2 from an existing v1 datadir — both paths must produce an identical canonical chain. See the reth db migrate-v2 docs. Note that migrate-v2 resets the SenderRecovery / TransactionLookup / IndexAccountHistory / IndexStorageHistory / MerkleExecute / MerkleUnwind checkpoints to 0 by design; the pipeline rebuilds these locally from the preserved Headers/Bodies in static files.
  2. Engine backpressure interaction — verify --engine.persistence-backpressure-threshold > --engine.persistence-threshold ≥ --engine.memory-block-buffer-target in deploy scripts, especially on fast block-time testnets. The steady-state persistence_gap should stay below the backpressure threshold, or the engine will stall on every block.
  3. revm 38 / alloy-evm 0.34 EVM regression — replay a window of recent mainnet blocks; compare receipts and state-roots against geth.
  4. RocksDB sync=true — measure triedb persist latency, ensure no regression in block import / payload build time.
  5. Discv5 default-on — confirm peer discovery in BSC testnet topologies that previously relied on Discv4.

Medium priority

  1. SparseTrieCacheTask and Proof V2 effects on state-root timings during high TPS bursts.
  2. reth db migrate-v2 on a pruned datadir (#23716).
  3. New RPC endpoints (reth_*, debug_intermediateRoots, transactionReceipts subscription).
  4. MDBX NOSTICKYTHREADS semantics change — verify long-lived RPC read transactions still behave correctly.

Sanity checks

  1. make build, make maxperf, cargo clippy --workspace --tests --all-features, ef-tests.

Known risks / compatibility

  • Storage v1 ↔ v2 not interchangeable: a node started with v2 cannot reuse a v1 datadir without running db migrate-v2.
  • Downgrade target is v2.1.0, not earlier (per upstream v2.2 release note).
  • Long out-of-process MDBX read transactions concurrent with reorgs can stall persistence under v2 + engine backpressure (per upstream v2.0 note).
  • ExEx WAL encoding changed from bincode to RLP; any persisted WAL from earlier versions must be migrated.
  • alloy-evm gas-refund fix in v2.1 may change behaviour for custom precompiles that previously set gas_refunded non-zero.

Rimeeeeee and others added 30 commits April 13, 2026 14:47
Co-authored-by: Soubhik Singha Mahapatra <soubhiksmp2004@gmail.com>
…(#23485)

Co-authored-by: Matthias Seitz <matthias.seitz@outlook.de>
Co-authored-by: Ishika Choudhury <117741714+Rimeeeeee@users.noreply.github.com>
Co-authored-by: Alexey Shekhirin <5773434+shekhirin@users.noreply.github.com>
Co-authored-by: Alexey Shekhirin <github@shekhirin.com>
Co-authored-by: Soubhik Singha Mahapatra <soubhiksmp2004@gmail.com>
Co-authored-by: Soubhik Singha Mahapatra <160333583+Soubhik-10@users.noreply.github.com>
Co-authored-by: Brian Picciano <933154+mediocregopher@users.noreply.github.com>
Co-authored-by: Amp <amp@ampcode.com>
Co-authored-by: steven <corderosteven6@gmail.com>
Co-authored-by: Derek Cofausper <256792747+decofe@users.noreply.github.com>
Co-authored-by: Brian Picciano <933154+mediocregopher@users.noreply.github.com>
…ks (#23494)

Co-authored-by: mediocregopher <mediocregopher@users.noreply.github.com>
Co-authored-by: Amp <amp@ampcode.com>
… (#23356)

Co-authored-by: Derek Cofausper <256792747+decofe@users.noreply.github.com>
Co-authored-by: Emma Jamieson-Hoare <21029500+emmajam@users.noreply.github.com>
…ror/never) (#23501)

Co-authored-by: Amp <amp@ampcode.com>
Co-authored-by: Soubhik Singha Mahapatra <soubhiksmp2004@gmail.com>
Co-authored-by: Soubhik Singha Mahapatra <160333583+Soubhik-10@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Alexey Shekhirin <5773434+shekhirin@users.noreply.github.com>
Co-authored-by: Federico Gimenez <federico.gimenez@gmail.com>
Co-authored-by: Federico Gimenez <fgimenez@users.noreply.github.com>
Co-authored-by: klkvr <klkvrr@gmail.com>
Co-authored-by: Matthias Seitz <matthias.seitz@outlook.de>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
chee-chyuan and others added 16 commits May 18, 2026 15:32
…launch

Add logs to aid diagnosing block import issues:
- engine::api: trace when NewPayload is sent via ConsensusEngineHandle
- engine::tree: trace when engine tree receives NewPayload
- engine: debug when engine orchestrator is being built
- reth::node: info when engine node launch begins

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Log persistence_gap, threshold, and stall duration at warn level so
backpressure is visible without enabling debug/trace on the engine target.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…gh engine

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds warn-level logs at every stage after execute_block completes
to identify which step is blocking on_new_payload:
- After receipt_root wait
- Inside validate_post_execution (start, post-receipt-check, pre/post hashed_state.get)
- Before state root computation (prints strategy)
- Inside await_state_root_with_timeout (pre-wait, no-timeout path)
- Inside serial-root task (start, pre/post state_root_with_updates)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`receipt_tx` was dropped immediately after `execute_transactions`
returned, before `executor.finish()` ran. Any receipts appended during
finalization (e.g. system transaction receipts on BSC) were never sent
to the background receipt-root task, causing it to see fewer receipts
than expected and silently fail to compute the root.

Fix by carrying `last_sent_len` out of `execute_transactions`, running
`finish()` first, then flushing any remaining receipts before dropping
the sender — matching the pattern in the upstream develop branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
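
The ordering described in this commit message (execute, finish, flush, then drop the sender) can be sketched with a standard-library channel. This is an illustration only: `Receipt`, `execute_transactions`, `finish`, and `run_block` are simplified stand-ins, not the engine's real signatures, and the background task merely counts receipts instead of computing a root.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Stand-in for a receipt; the real type carries logs, gas used, and so on.
type Receipt = u64;

/// Sends each user-transaction receipt as it is produced and returns how
/// many receipts have been sent so far (`last_sent_len`).
fn execute_transactions(txs: &[u64], receipts: &mut Vec<Receipt>, tx: &Sender<Receipt>) -> usize {
    for &t in txs {
        receipts.push(t);
        // Ignore send errors: the receiver may already have shut down.
        let _ = tx.send(t);
    }
    receipts.len()
}

/// Models finalization appending extra receipts (for example system
/// transaction receipts).
fn finish(receipts: &mut Vec<Receipt>, system: &[Receipt]) {
    receipts.extend_from_slice(system);
}

/// Runs one block and returns how many receipts the background task saw.
pub fn run_block(txs: &[u64], system: &[Receipt]) -> usize {
    let (receipt_tx, receipt_rx) = channel::<Receipt>();
    // Background task: consumes receipts until every sender is dropped.
    let root_task = thread::spawn(move || receipt_rx.iter().count());

    let mut receipts = Vec::new();
    let last_sent_len = execute_transactions(txs, &mut receipts, &receipt_tx);
    finish(&mut receipts, system);
    // Flush anything appended during finalization before closing the channel.
    for r in &receipts[last_sent_len..] {
        let _ = receipt_tx.send(*r);
    }
    drop(receipt_tx);
    root_task.join().expect("receipt task panicked")
}
```

If the sender were dropped right after `execute_transactions`, the task in this sketch would see only the user-transaction receipts, which mirrors the mismatch the commit message reports.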
…root hang

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…triedb state root

Pinpoint whether hang is at terminate_caching, triedb_preftch_result,
or to_triedb_hashed_post_state before the blocking state root call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The prefetch task can take minutes for blocks with large state deltas
(e.g. BSC block 1 genesis deployment). Previously this blocked the
engine tree thread indefinitely via recv(). Now uses recv_timeout(5s)
so the state root computation proceeds without prefetch state on slow
prefetchers, allowing the chain to make progress.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
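
A bounded wait of this kind can be sketched with `std::sync::mpsc::Receiver::recv_timeout`. The channel type used in the actual change is not shown in the PR, so `PrefetchResult` and `wait_for_prefetch` below are hypothetical, and treating a disconnected sender the same as a timeout is an assumption made for the sketch.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical prefetch output; the real result type is not shown in the PR.
pub type PrefetchResult = Vec<u8>;

/// Waits up to `timeout` for the prefetch task. Returns `None` when the
/// task is too slow or its sender was dropped, so the caller can proceed
/// with the state root computation without prefetched state.
pub fn wait_for_prefetch(
    rx: &Receiver<PrefetchResult>,
    timeout: Duration,
) -> Option<PrefetchResult> {
    match rx.recv_timeout(timeout) {
        Ok(result) => Some(result),
        // Slow prefetcher: give up waiting instead of blocking the caller.
        Err(RecvTimeoutError::Timeout) => None,
        // Prefetch task ended without sending a result.
        Err(RecvTimeoutError::Disconnected) => None,
    }
}
```

With the 5-second limit from the commit message, the call would be `wait_for_prefetch(&rx, Duration::from_secs(5))`.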
…task

The changeset background task previously held an MDBX read-only transaction
open for its entire duration, including the slow wait_cloned() trie computation
phase. This blocked MDBX GC during write commits, causing periodic ~16-second
stalls that prevented the validator from producing in-turn blocks on BSC.

Pre-warm the overlay_cache on the engine loop thread (preserving the existing
changeset-cache eviction race fix), drop the provider immediately, then open a
fresh short-lived read transaction inside the task just before compute_trie_changesets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… changes

Reverts commits 1486308 through 5aa3129:
- Remove debug/warn logs added for engine message flow tracing
- Remove tracing dependency added to reth-engine-primitives
- Remove backpressure warn logs
- Remove post-execution and state root hang warn logs
- Revert receipt-root task flush added for post-finalization receipts
- Remove timeout on triedb prefetch result wait

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…189)

* metrics: gate heavy scrape hooks behind RETH_DISABLE_HEAVY_METRICS

The two custom hooks registered by `metrics_hooks` walk MDBX page
metadata + the freelist DB and iterate every static-file jar header on
every Prometheus scrape. On large databases this is expensive enough to
stall the metrics endpoint and starve the runtime.

Skip registering both hooks when `RETH_DISABLE_HEAVY_METRICS` is set in
the environment; the rest of the registry (process, jemalloc, io, chain
spec, version) is unaffected and the endpoint still responds normally.
The env var is documented on the function so the escape hatch is
discoverable without grepping the source.
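
The gating described above might look roughly like the sketch below. Only the variable name `RETH_DISABLE_HEAVY_METRICS` comes from the commit message; `Hook`, `heavy_metrics_disabled`, and `build_hooks` are hypothetical, and the sketch assumes that "set" means the variable is present with any value.

```rust
use std::env;

/// Environment variable named in the commit message.
pub const DISABLE_HEAVY_METRICS_ENV: &str = "RETH_DISABLE_HEAVY_METRICS";

/// Hypothetical hook type; the real registry type in reth differs.
pub type Hook = Box<dyn Fn() + Send + Sync>;

/// Returns true when the variable is present in the environment,
/// whatever its value (an assumption of this sketch).
pub fn heavy_metrics_disabled() -> bool {
    env::var_os(DISABLE_HEAVY_METRICS_ENV).is_some()
}

/// Builds the hook list: cheap hooks are always registered, heavy hooks
/// (DB stat walk, static-file jar enumeration) only when not disabled.
pub fn build_hooks(cheap: Vec<Hook>, heavy: Vec<Hook>, disable_heavy: bool) -> Vec<Hook> {
    let mut hooks = cheap;
    if !disable_heavy {
        hooks.extend(heavy);
    }
    hooks
}
```

Passing the flag as a parameter, with `heavy_metrics_disabled()` evaluated once at startup, keeps the registration logic testable without mutating the process environment.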

* metrics: run scrape handler and push-gateway render on spawn_blocking

The Prometheus metrics handler is fundamentally synchronous: it
invokes every registered hook and then runs the prometheus exporter's
`render()`, all on the tokio worker that accepted the HTTP request
(or on the runtime worker driving the push-gateway loop).

The default hooks are cheap (procfs, jemalloc atomic reads), but the
two `report_metrics` hooks (DB stat walk, static-file jar enumeration)
can take seconds on large archives. Even with those gated out (see
preceding patch), `render()` itself is O(total time-series) and will
grow over time. A multi-millisecond synchronous block on a runtime
worker is not ideal and can become a real engine latency source if
hook cost ever regresses.

Move the synchronous work off the runtime worker:

- Endpoint handler now offloads `handle_request` (which calls the hook
  + render or the pprof dump) to `spawn_blocking`. On join error,
  return a 500 instead of letting the connection task panic.
- Push-gateway loop offloads the hook + render to `spawn_blocking`;
  on join error, log a warning and skip this tick rather than killing
  the loop. The HTTP put itself was already async so it stays inline.

No behavioral change to what the endpoint or push-gateway returns;
only the thread on which the rendering happens.
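
The error-handling shape of the endpoint change can be sketched with the standard library alone. Here `std::thread::spawn` stands in for tokio's `spawn_blocking`, and `Response` and `handle_scrape` are hypothetical names. Unlike the async version, where the join handle is awaited and the runtime worker stays free, the `join()` below blocks the caller, so the sketch shows only how a failed render becomes a 500 rather than a panic in the connection task.

```rust
use std::thread;

/// Hypothetical response shape: HTTP status plus body.
#[derive(Debug, PartialEq, Eq)]
pub struct Response {
    pub status: u16,
    pub body: String,
}

/// Runs the synchronous `render` closure (hooks plus exporter render) on a
/// separate thread and maps a panic in that thread to a 500 response
/// instead of propagating it to the caller.
pub fn handle_scrape<F>(render: F) -> Response
where
    F: FnOnce() -> String + Send + 'static,
{
    match thread::spawn(render).join() {
        Ok(body) => Response { status: 200, body },
        // Join error: the render thread panicked.
        Err(_) => Response {
            status: 500,
            body: String::from("metrics render failed"),
        },
    }
}
```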
@chee-chyuan chee-chyuan requested a review from joey0612 as a code owner May 26, 2026 06:09
chee-chyuan and others added 10 commits May 26, 2026 14:13
# Conflicts:
#	crates/node/builder/src/launch/common.rs
…int config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tion, and lint

Remove Optimism jobs and update workflow files to match upstream
paradigmxyz/reth v2.2.0: depot runner conditionals, SHA-pinned
actions, ethereum-only matrices, and MSRV bumped to 1.93.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add liburing-dev, pkg-config, and libclang-dev install steps to jobs
that build with BSC features (rocksdb, io-uring). Upstream paradigmxyz/reth
omits these since it does not use rocksdb or io-uring.

Also add rustfmt component to nightly toolchain steps where librocksdb-sys
build script requires it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove op-reth from the matrix; upstream already dropped it in v2.2.0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Restore `if: github.event_name == 'merge_group'` so stage-run-test
  only runs in merge groups, not on every PR
- Remove account-hashing, storage-hashing, and hashing stages: in V2
  storage mode (default), these stages unwind the hashed state written
  by execution but their no-op execute never restores it, causing merkle
  to produce the genesis state root

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reverts 2778a06

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
chee-chyuan added a commit to bnb-chain/reth-bsc that referenced this pull request May 28, 2026
Switches all reth-* dependencies from branch = "develop-v2.2-new" to
rev = "485e37b738754f96603391f42ee529f9364b47a4" for deterministic builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
chee-chyuan added a commit to bnb-chain/reth-bsc that referenced this pull request May 28, 2026
Switches all reth-* dependencies from rev de11c921 to
rev = "485e37b738754f96603391f42ee529f9364b47a4" for deterministic builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>