@wo-o wo-o commented Nov 22, 2025

Problem

The BestTransactions iterator takes an independent snapshot of the pending pool via BTreeMap::clone(). When the maintenance job later removes transactions from the pool, iterators that already exist are not notified, so the block builder can still include the removed transactions in a block.

Production Impact

During load testing with --txpool.lifetime 600, this caused a sequencer failure:

2025-11-15T13:30:01.790071Z WARN The database read transaction has been open for too long.
    open_duration=190.664088693s self.txn_id=7039115

2025-11-15T13:30:01.947723Z WARN Attempt to calculate state root for an old block
    might result in OOM target=2846869

The sequencer stopped producing blocks because:

  1. Maintenance removed expired transactions from pool
  2. Block builder's snapshot still contained the removed transactions
  3. Execution attempted to process non-existent transactions
  4. Database read stuck
  5. State root calculation failed

This is critical for transactions with time-based expiration.

Root Cause

crates/transaction-pool/src/pool/pending.rs:110 - The iterator's snapshot is created via BTreeMap::clone(), which produces an independent copy. The pool already notifies iterators about new transactions, but it has no equivalent notification for removals.
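The snapshot behavior can be reproduced with the standard library alone. The following is a minimal sketch, not the reth code: `Pool`, `TxId`, and `stale_snapshot_demo` are hypothetical stand-ins, and the map value is a bare priority number rather than a real pending transaction.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a transaction id: (sender, nonce).
type TxId = (u64, u64);

/// Simplified, hypothetical stand-in for the pending pool.
/// The value is just a priority number.
struct Pool {
    by_id: BTreeMap<TxId, u64>,
}

impl Pool {
    /// Mirrors the snapshot step: `clone()` yields a deep, independent copy.
    fn snapshot(&self) -> BTreeMap<TxId, u64> {
        self.by_id.clone()
    }

    /// Mirrors the maintenance job removing an expired transaction.
    fn remove(&mut self, id: &TxId) -> Option<u64> {
        self.by_id.remove(id)
    }
}

/// Returns `(still_in_pool, still_in_snapshot)` for a transaction that is
/// removed after the snapshot was taken.
fn stale_snapshot_demo() -> (bool, bool) {
    let mut pool = Pool { by_id: BTreeMap::new() };
    pool.by_id.insert((1, 0), 100);
    pool.by_id.insert((2, 0), 50);

    // The block builder takes its snapshot first ...
    let snapshot = pool.snapshot();
    // ... and the maintenance job removes a transaction afterwards.
    pool.remove(&(1, 0));

    (
        pool.by_id.contains_key(&(1, 0)),
        snapshot.contains_key(&(1, 0)),
    )
}
```

Calling `stale_snapshot_demo()` shows the transaction gone from the pool but still present in the snapshot, which is the stale view the block builder ends up iterating over.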

Solution

Add removal notifications following the existing pattern for new transactions:

  1. PendingPool broadcasts transaction removals via a new removed_transaction_notifier channel
  2. BestTransactions subscribes to removal notifications when it is created
  3. The iterator processes pending removals before yielding the next transaction
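The three steps above can be sketched as follows. This is an illustration under stated assumptions rather than the code in this PR: `std::sync::mpsc` is swapped in to keep the example dependency-free, the pool keeps one sender per live iterator instead of a broadcast channel, and `TxId`, `removal_subscribers`, and the priority-only map values are hypothetical simplifications.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for a transaction id: (sender, nonce).
type TxId = (u64, u64);

struct PendingPool {
    by_id: BTreeMap<TxId, u64>,
    /// Hypothetical analogue of `removed_transaction_notifier`:
    /// one sender per iterator that is still alive.
    removal_subscribers: Vec<Sender<TxId>>,
}

impl PendingPool {
    fn new() -> Self {
        Self { by_id: BTreeMap::new(), removal_subscribers: Vec::new() }
    }

    fn insert(&mut self, id: TxId, priority: u64) {
        self.by_id.insert(id, priority);
    }

    /// Step 1: removing a transaction notifies every live iterator.
    fn remove(&mut self, id: TxId) {
        if self.by_id.remove(&id).is_some() {
            // `send` fails once the iterator (receiver) is dropped,
            // so dead subscribers are pruned here.
            self.removal_subscribers.retain(|tx| tx.send(id).is_ok());
        }
    }

    /// Step 2: the iterator subscribes to removals when it is created.
    fn best(&mut self) -> BestTransactions {
        let (tx, rx) = channel();
        self.removal_subscribers.push(tx);
        BestTransactions { snapshot: self.by_id.clone(), removed: rx }
    }
}

struct BestTransactions {
    snapshot: BTreeMap<TxId, u64>,
    removed: Receiver<TxId>,
}

impl Iterator for BestTransactions {
    type Item = TxId;

    fn next(&mut self) -> Option<TxId> {
        // Step 3: apply all pending removals before yielding anything.
        while let Ok(id) = self.removed.try_recv() {
            self.snapshot.remove(&id);
        }
        // Yield the remaining transaction with the highest priority.
        let best = self
            .snapshot
            .iter()
            .max_by_key(|(_, priority)| **priority)
            .map(|(id, _)| *id)?;
        self.snapshot.remove(&best);
        Some(best)
    }
}
```

With three transactions inserted and the highest-priority one removed after `best()` was called, the iterator in this sketch yields only the two that remain. A removal that arrives after a transaction has already been yielded cannot be undone, so this narrows the window rather than closing it entirely.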

Previously, there was a race condition where:
1. Block builder creates a BestTransactions iterator (snapshot via BTreeMap::clone())
2. Maintenance job removes a transaction from the pool
3. Block builder's snapshot was independent, still containing the removed transaction

This could cause expired transactions with time-based validity to be included in
blocks after removal.

This commit adds removal notifications:
- PendingPool broadcasts transaction removals via a new channel
- BestTransactions iterator subscribes to removal notifications
- Iterator removes transactions from its snapshot when notified
- Test verifies that removed transactions are not found in snapshots

Fixes race condition for transactions with time-based expiration.
@wo-o wo-o force-pushed the fix/txpool-race-condition branch from 6574689 to 91709cf on November 22, 2025 04:51

mattsse (Collaborator) commented Nov 23, 2025

could you elaborate on why

"causing removed transactions to be included in blocks."

would be problematic? At the moment I believe this would only happen if the pending pool is at capacity, but I assume you need this to satisfy some protocol-level rules?

Block builder's snapshot still contained the removed transactions
Execution attempted to process non-existent transactions
Database read stuck

I'm not following the sequence of events here that causes the db/state root issues; maybe this has something to do with some protocol-specific rules?
If the block builder adds this tx to the block, how would that cause execution errors?


@mattsse mattsse left a comment

lacking some context on why excluding removed txs is critical

but maybe this is similar to OP interop inclusion rules

impl<Cons, Pooled> MaybeInteropTransaction for OpPooledTransaction<Cons, Pooled> {
    fn set_interop_deadline(&self, deadline: u64) {
        self.interop.store(deadline, Ordering::Relaxed);
    }
    fn interop_deadline(&self) -> Option<u64> {
        let interop = self.interop.load(Ordering::Relaxed);
        if interop > NO_INTEROP_TX {
            return Some(interop)
        }
        None
    }

which is checked during block building for example

let interop = tx.interop_deadline();

// We skip invalid cross chain txs, they would be removed on the next block update in
// the maintenance job
if let Some(interop) = interop &&
    !is_valid_interop(interop, self.config.attributes.timestamp())
{
    best_txs.mark_invalid(tx.signer(), tx.nonce());
    continue
}

Comment on lines 101 to +106
pub(crate) new_transaction_receiver: Option<Receiver<PendingTransaction<T>>>,
/// Used to receive transaction removals from the pool after this iterator was created.
///
/// Removed transactions are deleted from this iterator's snapshot before yielding the next
/// value, preventing inclusion of transactions that were removed by maintenance jobs.
pub(crate) removed_transaction_receiver: Option<Receiver<TransactionId>>,

not opposed to this, but since this is an internal channel we can unify this by introducing

enum PendingPoolEvent<T> { Added(PendingTransaction<T>), Removed(TransactionId) }
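One way the unified event suggested above could look is sketched below. This is an assumption-laden illustration, not the reviewer's or the PR's implementation: the `Added` payload is simplified to an id plus a generic value, and `TxId` and `apply_event` are hypothetical names introduced only for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a transaction id: (sender, nonce).
type TxId = (u64, u64);

/// A single event type covering both directions of pool change.
#[derive(Debug, Clone, PartialEq)]
enum PendingPoolEvent<T> {
    /// A transaction entered the pending pool after the snapshot was taken.
    Added { id: TxId, tx: T },
    /// A transaction left the pending pool after the snapshot was taken.
    Removed(TxId),
}

/// Applies one pool event to an iterator's snapshot (hypothetical helper).
fn apply_event<T>(snapshot: &mut BTreeMap<TxId, T>, event: PendingPoolEvent<T>) {
    match event {
        PendingPoolEvent::Added { id, tx } => {
            snapshot.insert(id, tx);
        }
        PendingPoolEvent::Removed(id) => {
            snapshot.remove(&id);
        }
    }
}
```

Carrying both variants over one channel means the iterator sees additions and removals in the order the pool emitted them. With two separate channels, an add and a remove of the same transaction could be applied in either order.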

@github-project-automation github-project-automation bot moved this from Backlog to In Progress in Reth Tracker Nov 23, 2025