Conversation

@AlexandruCihodaru
Contributor

Add support for handling blockchain revert. This is useful in testing.

Changes:

  • Add ChainEvent::Reverted variant to represent backward blockchain progression
  • Implement handle_reverted() method that:
    • Collects transactions from retracted blocks via included_transactions cache or by fetching block bodies from the API
    • Removes all views beyond the revert point to prevent zombie views
    • Removes included transactions from mempool (they can be resubmitted later)
    • Updates enactment state (recent_finalized_block and recent_best_block)
    • Ensures a valid view exists at the revert target block
  • Add early return in maintain() for Reverted events to prevent normal forward-progression logic from running

These changes fix cases where reverting would leave zombie views in the view store, causing problems in subsequent operations.

Note: Transactions that were pending may not be visible after the revert if they fail revalidation.
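
For reference, a minimal sketch of the shape this takes. These are standalone, simplified types: the real ChainEvent in sc-transaction-pool-api is generic over the block type, and the names below follow this description rather than the final code.

  // Hypothetical, simplified sketch of the new event variant and the early
  // return in maintain() described above.
  enum ChainEvent<Hash> {
      NewBestBlock { hash: Hash },
      Finalized { hash: Hash },
      /// New variant: the chain moved backwards; `hash` is the new (older) head.
      Reverted { hash: Hash },
  }

  fn handle_reverted(_new_head: u64) {
      // collect txs from retracted blocks, remove zombie views,
      // update enactment state, ensure a valid view at the revert target
  }

  fn maintain(event: ChainEvent<u64>) {
      if let ChainEvent::Reverted { hash } = event {
          handle_reverted(hash);
          return; // skip the normal forward-progression logic below
      }
      // ... NewBestBlock / Finalized handling as before ...
  }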


Signed-off-by: Alexandru Cihodaru <[email protected]>
@AlexandruCihodaru
Contributor Author

/cmd fmt

@AlexandruCihodaru
Contributor Author

/cmd prdoc --audience runtime_dev --bump patch

@michalkucharczyk
Contributor

michalkucharczyk commented Nov 28, 2025

DQ:

          D1-E1-F1-G1-..-X1
         /
A - B - C - D2-E2-F2-G2-..-X2
         \
          D3-E3-F3-G3-..-X3

Is this a realistic scenario? Should we handle it properly? If we revert from X1 to B, should we also remove all transactions included on the D2-...-X2 and D3-...-X3 forks (as we do for D1-...-X1)?

If we assume that revert can only be called when there is a single fork, should we somehow check this in the handle_revert function (or at least document it)?

@AlexandruCihodaru
Contributor Author

DQ:

          D1-E1-F1-G1-..-X1
         /
A - B - C - D2-E2-F2-G2-..-X2
         \
          D3-E3-F3-G3-..-X3

Is this a realistic scenario? Should we handle it properly? If we revert from X1 to B, should we also remove all transactions included on the D2-...-X2 and D3-...-X3 forks (as we do for D1-...-X1)?

If we assume that revert can only be called when there is a single fork, should we somehow check this in the handle_revert function (or at least document it)?

Excellent question. I think that in anvil it is not possible to have such a scenario, but I believe we should delete the transactions on all possible paths.

@iulianbarbu
Contributor

iulianbarbu left a comment

Looks good in general. It would be great to capture in the event the idea of reverting all existing forks, so that this logic is applicable not just to single-chain nodes like anvil-polkadot - but tbh I'm not sure how difficult that is.

@AlexandruCihodaru
Contributor Author

/cmd fmt

@AlexandruCihodaru added the T0-node ("This PR/Issue is related to the topic 'node'.") label on Dec 2, 2025
@re-gius
Contributor

re-gius commented Jan 23, 2026

Changes applied to the handle_reverted() implementation in #10867

  1. Always create a fresh view at the new head by populating it from the current mempool state. We do not rely on stale views because:
    - Old views may contain transactions from now-reverted blocks, but we want to remove those
    - Old views won't contain transactions submitted after the view was created, but we want to include those
  2. Atomic view removal with race-condition prevention (sketched after this list):
    - Step 5 now holds all view locks simultaneously (following the same pattern as ViewStore::insert_new_view_sync)
    - We also abort the view removal if no view exists at the new head, to avoid an inconsistent state
  3. Updated documentation
  4. Added view-state assertions to tests: fatp_revert_multiple_blocks_does_not_resubmit now verifies that a view exists at the new head and that reverted views are removed
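
A minimal sketch of the lock-all-then-mutate pattern from point 2, using hypothetical stand-in types (the real ViewStore keeps views in a map keyed by block hash and differs in detail):

  use std::sync::{Arc, Mutex, MutexGuard};

  struct View { at: u64 } // hypothetical stand-in for a per-block view

  fn remove_views_above(views: &[Arc<Mutex<Option<View>>>], revert_to: u64) {
      // Acquire every lock first, in a fixed order, so removal is atomic with
      // respect to concurrent view insertion (no deadlock as long as all
      // writers take the locks in the same order).
      let mut guards: Vec<MutexGuard<'_, Option<View>>> =
          views.iter().map(|v| v.lock().unwrap()).collect();
      // Only then mutate: no reader can observe a half-cleaned state where
      // some zombie views are gone and others still linger.
      for guard in guards.iter_mut() {
          if guard.as_ref().is_some_and(|view| view.at > revert_to) {
              **guard = None; // drop the zombie view beyond the revert point
          }
      }
  }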

I guess this PR is ready for review now

@michalkucharczyk
Contributor

michalkucharczyk commented Jan 26, 2026

We had some offline discussions with @AlexandruCihodaru regarding this PR. I am leaving here the main concerns I still have about it:

1. Inconsistent handling of included vs. in-pool transactions

The current implementation removes only transactions that were included in reverted blocks. Transactions still in the ready/future queues are left untouched, even though they may have been submitted at the same time.

  Example:
  Block N-1: submit tx0, tx1, tx2 (all ready, prio: tx2 > tx1 > tx0, txs are "heavy")
  Block N:   tx2 included (InBlock)
  Block N+1: tx1 included (InBlock)

  Revert to N-1

After revert:

  • tx1, tx2 -> removed (they were included in reverted blocks)
  • tx0 -> stays in pool (was never included)

This behavior is inconsistent and hard to understand. It is impossible to control which transactions we have in the pool.

I would propose providing an explicit API for removal - decouple "remove transactions" from "revert chain" (they are orthogonal operations), giving node builders the flexibility to implement their desired behavior.
The new method would be "removal without banning", so transactions can be resubmitted after reverting.
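
A sketch of the proposed shape, with made-up minimal types - the key point is what the body does not do: nothing is ever added to the ban list.

  use std::collections::HashSet;

  type TxHash = [u8; 32];

  struct Pool {
      ready: HashSet<TxHash>,
      banned: HashSet<TxHash>,
  }

  impl Pool {
      /// Removal without banning: the caller decides which transactions go,
      /// and they remain resubmittable afterwards.
      fn remove_transactions(&mut self, hashes: &[TxHash]) -> Vec<TxHash> {
          hashes
              .iter()
              .filter(|h| self.ready.remove(*h)) // removed from the pool...
              .copied()                          // ...but `banned` is untouched
              .collect()
      }
  }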

2. Missing Dropped event for watchers

When using submit_and_watch, the watcher may hang indefinitely if a transaction is silently removed after reversal:

  watcher = pool.submit_and_watch(tx);
  pool.maintain(NewBlock(N, [tx]));  
  // ...
  // watcher receives InBlock event
  // ...
  pool.maintain(Revert(N-1));        
  // tx removed, but watcher never notified => hangs forever

The event flow Ready -> InBlock -> Dropped is valid and should be emitted when transactions are removed due to a revert.
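
A minimal stand-in for a watcher consumer makes the hang concrete (local types; the real watcher is a stream of TransactionStatus values):

  use std::sync::mpsc::Receiver;

  enum Status { Ready, InBlock(u64), Dropped }

  // Without a terminal event after the revert, this loop blocks in recv() forever.
  fn watch(events: Receiver<Status>) {
      while let Ok(event) = events.recv() {
          match event {
              Status::Ready => println!("ready"),
              Status::InBlock(block) => println!("in block #{block}"),
              Status::Dropped => break, // terminal: revert cleanup must emit this
          }
      }
  }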

3. Better documentation of behavior needed

Whatever behavior is chosen, it should be documented (perhaps in ChainEvent::Reverted docs) so users know:

  • Which transactions get removed on revert,
  • Whether they need to resubmit pending transactions,
  • What events to expect from watchers,

Possible approaches

  1. Remove only included transactions (current) - working, but inconsistent and hard to control,
  2. Remove all transactions (included + ready + future) - but this requires resubmission (which is probably intended),
  3. Provide an explicit API for removal - decouple "remove transactions" from "revert chain", giving node builders the flexibility to implement their desired behavior.

I think approach 3 is the cleanest.

@bkchr
Member

bkchr commented Jan 26, 2026

@michalkucharczyk hadn't we discussed some time ago that it would be the simplest to just re-create the tx pool? So, not requiring any of this code here?

@michalkucharczyk
Contributor

@michalkucharczyk hadn't we discussed some time ago that it would be the simplest to just re-create the tx pool? So, not requiring any of this code here?

Could be a solution, but it may have some limitations, e.g.:

B0->B1->B2->B3
  • reverting to B2 would "kill" view for B1, so you would not be able to build a block on top of it.
  • killing the pool means you need to resubmit all transactions,

It depends on the requirements the manual-seal / anvil node has for the reverting mechanism. Honestly, I am not sure how they should work, so I am trying to build a mechanism that is flexible enough to handle different scenarios and does not pull anvil-node specifics into the generic pool.

@re-gius
Contributor

re-gius commented Jan 27, 2026

New changes from 121a9be

Decoupled the transaction removal from the chain-revert handling. Now:

  1. ChainEvent::Reverted only handles view management:
    - Removes views beyond the revert point
    - Creates a fresh view at the new head
    - Does NOT touch the mempool
  2. remove_transactions() is a separate API for explicit transaction removal without banning:
    - Node builders call it when/if they want to remove specific transactions
    - Gives flexibility: remove all, remove only reverted-block txs, or keep everything

Implementation Details

  • Dependents are notified: unlike report_invalid, we emit Dropped events for dependent transactions to prevent their watchers from hanging indefinitely.
  • Dependents' hashes are not returned: exactly like report_invalid, we only return the hashes of transactions that were also in the input list of transactions to remove (see the sketch below).
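
A self-contained sketch of that behavior, with hypothetical minimal types (the real pool tracks dependencies via provided/required tags and uses a real event type, not strings):

  use std::collections::{HashMap, HashSet};
  use std::sync::mpsc::Sender;

  type TxHash = u64;

  struct Pool {
      in_pool: HashSet<TxHash>,
      dependents: HashMap<TxHash, Vec<TxHash>>, // tx -> txs requiring its tags
      watchers: HashMap<TxHash, Sender<&'static str>>,
  }

  impl Pool {
      fn remove_transactions(&mut self, to_remove: &[TxHash]) -> Vec<TxHash> {
          let mut queue = to_remove.to_vec();
          let mut removed = HashSet::new();
          while let Some(hash) = queue.pop() {
              if !self.in_pool.remove(&hash) {
                  continue; // unknown or already processed
              }
              removed.insert(hash);
              if let Some(deps) = self.dependents.remove(&hash) {
                  queue.extend(deps); // cascade removal to dependents
              }
              if let Some(watcher) = self.watchers.remove(&hash) {
                  let _ = watcher.send("Dropped"); // dependents get notified too
              }
          }
          // Like report_invalid: only hashes from the caller's list are
          // returned, even though dependents were removed (and notified) too.
          to_remove.iter().filter(|h| removed.contains(*h)).copied().collect()
      }
  }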

@michalkucharczyk what do you think of this new implementation?

@bkchr
Member

bkchr commented Jan 27, 2026

  • reverting to B2 would "kill" view for B1, so you would not be able to build a block on top of it.

  • killing the pool means you need to resubmit all transactions,

You can just rebuild B1 by sending all the transactions again. I would assume that the anvil stuff still has all the transactions. Right now this pull request is trying to add a feature that is never used for normal operations and will never be used for them.

@michalkucharczyk
Contributor

Technically you would need to "import a block" (call the pool's maintain) and resubmit transactions, but this is a detail.

I see your point: if we can get all the functionality and avoid new complexity in the code, I am all in :). The question is whether the anvil node would be happy with this - I don't know the answer…

Maybe @AlexandruCihodaru or @alindima can comment on this proposal.

@michalkucharczyk
Contributor

Also, if we want to have the anvil node in our contracts (reliability) toolset, then it becomes a normal-like operation: we need to support it, test it, etc…

@bkchr
Member

bkchr commented Jan 27, 2026

Also, if we want to have the anvil node in our contracts (reliability) toolset, then it becomes a normal-like operation: we need to support it, test it, etc…

By "normal operation" I meant anything that you need to run a blockchain network. This here is for testing. I'm not saying that this is not important, but if we can achieve the same results if we don't need to modify the internals of tx pool, we should not do this. Just adds more complexity that we can move closer to where it is needed (anvil node).

@alindima
Contributor

Chain reversion is something that can happen in polkadot under "normal" protocol operation (although in the case of disputes, which are exceptional). I remember @sandreim mentioning some issues with the txpool on reversion as well (on some polkadot-based network or locally, not on an anvil-based instance).

I would assume that the anvil stuff still has all the transactions.

No, anvil uses the txpool from substrate; it does not have a wrapper over it. Of course, we could have implemented our own txpool, but there was no good reason for it at the time (the substrate txpool looked flexible enough for our use case). Since chain reversion is something that needs to work regardless of whether or not anvil uses it, I'd much rather solve this problem for good here than add a reimplementation to work around it.

@michalkucharczyk
Contributor

I think the point is to re-use the existing txpool in an anvil-specific pool wrapper. The inner pool could be dropped and a new instance of the substrate pool created as the inner one when a reversion happens.

I don't have enough information about anvil-node scenarios to judge if this approach is feasible, and if all scenarios can be covered.

The ultimate goal is to reduce complexity in the existing pool.
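
A sketch of that wrapper idea, with made-up stand-in types (InnerPool here stands in for the real substrate pool; the real constructor and submit signatures differ):

  type Hash = u64;
  type Extrinsic = Vec<u8>;

  // Stand-in for the real substrate pool; only what the sketch needs.
  struct InnerPool { at: Hash, txs: Vec<Extrinsic> }

  impl InnerPool {
      fn new(at: Hash) -> Self { Self { at, txs: Vec::new() } }
      fn submit(&mut self, xt: Extrinsic) { self.txs.push(xt); }
  }

  // Anvil-specific wrapper: the inner pool is disposable, the wrapper is the
  // source of truth for everything ever submitted.
  struct PoolWrapper {
      inner: InnerPool,
      seen: Vec<Extrinsic>,
  }

  impl PoolWrapper {
      fn submit(&mut self, xt: Extrinsic) {
          self.seen.push(xt.clone());
          self.inner.submit(xt);
      }

      fn on_revert(&mut self, new_head: Hash) {
          self.inner = InnerPool::new(new_head); // drop old pool, build fresh one
          for xt in &self.seen {
              self.inner.submit(xt.clone()); // replay; a filter could go here
          }
      }
  }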

@paritytech-workflow-stopper

All GitHub workflows were cancelled due to a failure in one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/21433377790
Failed job name: cargo-clippy

@bkchr
Member

bkchr commented Jan 29, 2026

Chain reversion is something that can happen in polkadot under "normal" protocol operation (although in the case of disputes, which are exceptional). I remember @sandreim mentioning some issues with the txpool on reversion as well (on some polkadot-based network or locally, not on an anvil-based instance).

Chain reversions are not happening under normal protocol operation. We fork if an invalid candidate is found by approval voting. But we do not revert; in particular, we do not revert the finalized chain.

@sandreim
Contributor

sandreim commented Jan 29, 2026

Chain reversion is something that can happen in polkadot under "normal" protocol operation (although in the case of disputes, which are exceptional). I remember @sandreim mentioning some issues with the txpool on reversion as well (on some polkadot-based network or locally, not on an anvil-based instance).

Chain reversions are not happening under normal protocol operation. We fork if an invalid candidate is found by approval voting. But we do not revert; in particular, we do not revert the finalized chain.

The context is that some block was dropped on the RC (never backed on chain).

"revert" indeed is not the right word here, forking is accurate, but the percieved effect by the user is that the chain state has been reverted up to the blockheight that we start building the fork on.

What I was expecting to happen is that the transactions that were included in the abandoned fork are included in the new one.

@bkchr
Member

bkchr commented Jan 29, 2026

What I was expecting to happen is that the transactions that were included in the abandoned fork are included in the new one.

That is happening and if not, it is a bug :) Maybe directly try the forkaware transaction pool, but this should only be required on parachains. For the relay chain the normal tx pool should handle it correctly.

But ahh, yeah, for the fork case in Polkadot we will not change the best chain until we have a longer/better chain than the old one. So, the old tx pool will not insert the transactions. If you use the forkaware tx pool, it should fix this behavior.

@sandreim
Contributor

That is happening and if not, it is a bug :) Maybe directly try the forkaware transaction pool, but this should only be required on parachains. For the relay chain the normal tx pool should handle it correctly.

But ahh, yeah, for the fork case in Polkadot we will not change the best chain until we have a longer/better chain than the old one. So, the old tx pool will not insert the transactions. If you use the forkaware tx pool, it should fix this behavior.

I was using the FATP and I was hitting this on every session boundary (when we clearly drop blocks).

@bkchr
Copy link
Member

bkchr commented Jan 29, 2026

I was using the FATP and I was hitting this on every session boundary (when we clearly drop blocks).

https://github.com/paritytech/polkadot-sdk/issues/new/choose and ping @michalkucharczyk :D

@re-gius
Contributor

re-gius commented Jan 29, 2026

After reading your comments and investigating the original anvil implementation in more detail, I propose we simplify this PR to keep only what's truly fundamental for the anvil revert methods, namely the handle_reverted logic.
What we need is cleaning up zombie views and updating most_recent_view properly - everything that's currently inside the handle_reverted method.
As for removing transactions or manipulating the mempool, we don't necessarily need it, both because the original anvil does not restore removed txns in the mempool when reverting, and because this adds more complexity to the transaction pool.

What do you think? @bkchr @michalkucharczyk

@bkchr
Member

bkchr commented Jan 29, 2026

What we need is cleaning up zombie views and updating most_recent_view properly - everything that's currently inside the handle_reverted method.

But wouldn't this be solved by just recreating the tx pool?

@re-gius
Contributor

re-gius commented Jan 29, 2026

But wouldn't this be solved by just recreating the tx pool?

That would be technically possible. We would need to copy all transactions from the mempool and carry them to the new pool. I can try implementing it directly in anvil-polkadot.

The remaining issue that won't be solved is that watchers from submit_and_watch may be orphaned and hang forever waiting for notifications. We may still accept this behavior for a dev/testing tool, but it's a bug.

@re-gius
Contributor

re-gius commented Jan 29, 2026

I dove deeper into anvil-polkadot, and the "recreate pool on revert" approach is quite involved: it affects several functionalities and has a couple of bugs. These are:

  1. submit_and_watch listeners are orphaned: after the old pool is dropped, clients waiting for transaction updates will hang or timeout.
  2. Lost pool metadata on resubmit: when we replay transactions on the new pool to mimic Anvil, we only have the raw extrinsic bytes. So we lose the TransactionSource (all become Local), the watch status, and the old internal pool status - this is probably unexpected for anvil users (?)

Moreover, all stream subscribers require an explicit refresh/reset: the mining engine needs to be refreshed, pending transaction filters need to be recreated... This is not a bug, but more of a tedious operation.

After all that, I still believe that supporting some basic revert functionality in the Substrate transaction pool remains useful for Anvil and for whoever needs a bug-free revert on a Substrate chain.
EDIT: the remove_transactions logic, however, is not necessary; it's just a nice-to-have that allows flexible transaction-removal policies on revert.

@michalkucharczyk
Contributor

Hm, you should also intercept submit / submit_and_watch in the wrapper. You can also intercept the listener, right? So an orphaned listener should not be a problem.

I don't think the transaction source is a problem. You want to resubmit transactions (to the new inner pool) that were previously submitted to the wrapper, right?

@re-gius
Contributor

re-gius commented Jan 30, 2026

Hm, you should also intercept submit / submit_and_watch in the wrapper. You can also intercept the listener, right? So an orphaned listener should not be a problem.

I don't think the transaction source is a problem. You want to resubmit transactions (to the new inner pool) that were previously submitted to the wrapper, right?

You're right. I can intercept all submissions and store the metadata in the wrapper, and I can also adapt all relevant streams (like the pending transaction filters and the mining engine) to check for pool recreation at each poll.

So, in the end, it's technically possible to build a bug-free tx pool wrapper to support reverts, but it's more error-prone and significantly more complex than allowing a basic revert in the Substrate pool. If you think that revert complexity doesn’t justify changing Substrate, I’m fine limiting the changes to Anvil. Please let me know which direction you prefer.
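
For illustration, a sketch of that interception bookkeeping (all names hypothetical): the wrapper owns the client-facing channel, so recreating the inner pool can never orphan a watcher, and the recorded source and extrinsic allow a faithful replay.

  use std::sync::mpsc::{channel, Receiver, Sender};

  type Extrinsic = Vec<u8>;

  #[derive(Clone, Copy)]
  enum Source { External, Local }

  // Per-transaction record kept by the wrapper so nothing is lost when the
  // inner pool is recreated.
  struct Record {
      xt: Extrinsic,
      source: Source,                        // preserved, not collapsed to Local
      watcher: Option<Sender<&'static str>>, // client-facing end stays alive
  }

  struct Wrapper {
      records: Vec<Record>,
  }

  impl Wrapper {
      // Intercepted submit_and_watch: hand the client a receiver owned by the
      // wrapper, not by the inner pool, so pool recreation cannot orphan it.
      // The wrapper forwards events from the current inner pool into `watcher`.
      fn submit_and_watch(&mut self, xt: Extrinsic, source: Source) -> Receiver<&'static str> {
          let (tx, rx) = channel();
          self.records.push(Record { xt, source, watcher: Some(tx) });
          rx
      }
  }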

@bkchr
Member

bkchr commented Jan 30, 2026

but it's more error-prone and significantly more complex than allowing a basic revert in the Substrate pool. If you think that revert complexity doesn’t justify changing Substrate, I’m fine limiting the changes to Anvil. Please let me know which direction you prefer.

Let's try to implement it and compare it. I don't see why this should be complicated.
