Skip to content

feat(op-reth): implement block builder failsafe and interop filter integration#20001

Open
wwared wants to merge 11 commits intodevelopfrom
feat/op-reth-interop-txpool-filter
Open

feat(op-reth): implement block builder failsafe and interop filter integration#20001
wwared wants to merge 11 commits intodevelopfrom
feat/op-reth-interop-txpool-filter

Conversation

@wwared
Copy link
Copy Markdown
Contributor

@wwared wwared commented Apr 9, 2026

Summary

Implements the block builder failsafe (item 5 from #19411) and integrates op-reth with the op-interop-filter for end-to-end acceptance testing.

  • Failsafe polling: Background task polls admin_getFailsafeEnabled every 1s. On transition to enabled, immediately evicts all interop txs from the pool. Matches op-geth's startBackgroundInteropFailsafeDetection.
  • Failsafe ingress rejection: Cached AtomicBool on SupervisorClient provides a fast-path reject at ingress without an RPC round-trip.
  • Belt-and-suspenders eviction: The existing block-event maintenance loop also checks failsafe state, catching any tx that raced past ingress or was added between poll ticks.
  • Validity window alignment: TRANSACTION_VALIDITY_WINDOW_SECS changed from 3600 to 86400 to match op-geth's ingressFilterTxValidityWindow. The 600s revalidation window in the maintenance loop remains the real control.
  • Devstack integration: In-process op-interop-filter with TCP proxy pattern to solve the EL↔filter circular startup dependency. The new WithInteropFilter() option on TwoL2SupernodeInterop presets enables devstack sysgo tests to run both op-reth and the op-interop-filter together inside acceptance tests, with direct access to toggle failsafe state without admin RPC/JWT.
  • Acceptance tests: 4 new tests covering valid ingress, invalid ingress rejection, full failsafe lifecycle (enable → block → disable → recover), and non-interop tx unaffected during failsafe.

Issue progress

# Item Status
1 Transaction identification Already on develop
2 Ingress filter Already on develop; this PR aligns validity window to 86400s
3 Periodic re-validation Already on develop
4 Reorg eviction (#19567) Already on develop
5 Block builder failsafe This PR

Test plan

  • TestInteropFilter_IngressAcceptsValid — valid cross-chain tx passes through filter and reaches cross-safe
  • TestInteropFilter_IngressRejectsInvalid — fabricated access list entry is explicitly rejected (not timeout)
  • TestInteropFilter_FailsafeLifecycle — tx succeeds → enable failsafe → tx rejected → disable failsafe → tx recovers
  • TestInteropFilter_NonInteropUnaffected — regular transfers succeed on both chains during failsafe
  • just lint and just fmt-fix pass in rust/
  • go vet and go fmt pass for changed Go packages

Closes #19411

wwared and others added 9 commits April 9, 2026 20:05
Change TRANSACTION_VALIDITY_WINDOW_SECS from 3600 (1 hour) to 86400
(24 hours) to match op-geth's ingressFilterTxValidityWindow spec.

This constant is used for both the interop deadline set on validated
transactions and the timeout passed to supervisor_checkAccessList.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The revalidation window (600s) is intentionally shorter than the ingress
timeout (86400s). Phase 3 evaluation confirmed the block-event-driven
revalidation loop meets all 6 evaluation criteria:
- Fires on every canonical block commit (~2s on OP chains)
- Covers all pooled interop txs with deadlines
- Evicts expired txs and revalidates stale ones
- Failsafe polling (Phase 4) covers the block-gap edge case

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add background failsafe detection to the interop tx pool:

- Add AtomicBool failsafe_enabled to SupervisorClientInner, shared via
  Arc across all clones
- Add query_failsafe() method that calls admin_getFailsafeEnabled RPC
  and caches the result
- Add is_failsafe_enabled() accessor for the cached state
- Add poll_failsafe() background task that polls every 1s and evicts
  all interop txs from the pool on failsafe transition
- Wire failsafe polling task in node.rs alongside the existing
  maintenance task, using ref+clone pattern to share the supervisor
  client
- Narrow SupervisorClientInner visibility to pub(crate) — no external
  references exist
- Remove Clone derive from SupervisorClientInner (AtomicBool is !Clone,
  inner is always behind Arc)

The ingress filter already rejects new interop txs during failsafe via
CheckAccessList at the source of truth. The polling task provides the
eviction mechanism for already-pooled txs. No changes to the
maintenance loop or payload builder are needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add in-process op-interop-filter service to devstack for supernode
interop presets:

- Refactor startMixedOpRethNode into build+start pattern to allow
  injecting --rollup.supervisor-http before starting
- Add startSupernodeELWithSupervisorURL for both op-geth and op-reth
- Create InteropFilter wrapper (sysgo/interop_filter.go) running the
  filter service in-process with direct Go struct access
- Add proxy pattern in supernode runtime to break circular dependency
  between EL nodes and filter service
- Add WithInteropFilter() preset option (supernode presets only)
- Add UseInteropFilter to PresetConfig
- Expose InteropFilter on MultiChainRuntime and TwoL2SupernodeInterop
  for direct test access (SetFailsafeEnabled, Ready, FailsafeEnabled)
- Add Ready(), SetFailsafeEnabled(), FailsafeEnabled() pass-through
  methods to filter.Service

No changes to supervisor-based presets or multichain_supervisor_runtime.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add acceptance tests exercising the op-reth + op-interop-filter
topology with the supernode interop preset:

- TestInteropFilter_IngressAcceptsValid: valid interop tx with correct
  cross-chain references passes through the filter
- TestInteropFilter_IngressRejectsInvalid: fabricated CrossL2Inbox
  access list entries are rejected by the filter
- TestInteropFilter_FailsafeBlocksInterop: failsafe toggle blocks new
  interop txs, recovery works after disabling
- TestInteropFilter_NonInteropUnaffected: regular transfers succeed
  regardless of failsafe state
- TestInteropFilter_FailsafeEvictsPooled: failsafe transition evicts
  existing interop txs and rejects new ones

All tests use presets.WithInteropFilter() and direct
InteropFilter.SetFailsafeEnabled() for in-process failsafe control.
Tests are skipped by default (standard interop test pattern).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ernode starts

The interop filter's chain ingesters need blocks to backfill, but the
supernode (which drives block production) starts after the filter.
Split the readiness wait out of startInteropFilter and call
WaitForReady after the supernode and batchers are running.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 5 tests pass with RUST_JIT_BUILD=1 DEVSTACK_L2EL_KIND=op-reth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convention fixes:
- Sort imports alphabetically in multichain_supernode_runtime.go
  (rollup with op-node group, tcpproxy after sources)
- Sort imports alphabetically in interop_filter_test.go
  (op-core/predeploys before op-devstack)
- Remove dead execTrigger variable in FailsafeBlocksInterop test
- Revert struct literal alignment to match surrounding style

Test robustness:
- Replace all time.Sleep waits with waitForFailsafeState() helper that
  confirms filter-side state then waits for 2 L2 blocks (guarantees
  op-reth's 1s polling task has had at least 2 poll cycles)
- Make initMsg.Tx.Result.Eval() failures hard errors with
  require.NoError instead of silently skipping the failsafe assertion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ntenance eviction

Add InvalidCrossTx::FailsafeEnabled variant for fast-path ingress
rejection without RPC round-trip. Add failsafe eviction to the
block-event maintenance loop as belt-and-suspenders with poll_failsafe.

Fix gofmt import ordering in multichain_supernode_runtime.go.

Tighten IngressRejectsInvalid test (10s timeout + explicit rejection
assert). Merge FailsafeBlocksInterop and FailsafeEvictsPooled into
FailsafeLifecycle. Extend NonInteropUnaffected to test both chains.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wwared wwared requested a review from a team as a code owner April 9, 2026 20:05
Move is_interop_tx to interop.rs as the single pub(crate) helper for
identifying interop transactions by access list. Use it in maintain and
poll_failsafe eviction paths instead of interop_deadline().is_some(),
ensuring all codepaths use the same structural check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 0% with 81 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.5%. Comparing base (ddfa07a) to head (b686ae4).
⚠️ Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
rust/op-reth/crates/txpool/src/maintain.rs 0.0% 52 Missing ⚠️
...ust/op-reth/crates/txpool/src/supervisor/client.rs 0.0% 18 Missing ⚠️
rust/op-reth/crates/txpool/src/interop.rs 0.0% 10 Missing ⚠️
rust/op-reth/crates/txpool/src/error.rs 0.0% 1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (ddfa07a) and HEAD (b686ae4). Click for more details.

HEAD has 19 uploads less than BASE
Flag BASE (ddfa07a) HEAD (b686ae4)
contracts-bedrock-tests 18 0
cannon-go-tests-64 1 0
Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #20001      +/-   ##
===========================================
- Coverage     74.5%     0.5%   -74.1%     
===========================================
  Files          193      483     +290     
  Lines        10734    60333   +49599     
===========================================
- Hits          8002      320    -7682     
- Misses        2588    60013   +57425     
+ Partials       144        0     -144     
Flag Coverage Δ
cannon-go-tests-64 ?
contracts-bedrock-tests ?
unit 0.5% <0.0%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
rust/op-reth/crates/txpool/src/pool.rs 0.0% <ø> (ø)
rust/op-reth/crates/txpool/src/validator.rs 0.0% <ø> (ø)
rust/op-reth/crates/txpool/src/error.rs 0.0% <0.0%> (ø)
rust/op-reth/crates/txpool/src/interop.rs 0.0% <0.0%> (ø)
...ust/op-reth/crates/txpool/src/supervisor/client.rs 0.0% <0.0%> (ø)
rust/op-reth/crates/txpool/src/maintain.rs 0.0% <0.0%> (ø)

... and 670 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…Invalid

The test used bob.Plan() which includes WithRetrySubmission (5 retries,
exponential backoff). Under CI load, retrying the rejected tx exhausted
the 10s context timeout, causing the test to see DeadlineExceeded instead
of the filter's explicit rejection error.

Fix: override with WithTransactionSubmitter (no retries) and evaluate
tx.Submitted instead of tx.Included, so the filter rejection propagates
on the first attempt.

Also unifies interop tx identification under is_interop_tx in interop.rs
so all codepaths (ingress, maintenance, failsafe eviction, reorg) use
the same structural access-list check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wwared wwared force-pushed the feat/op-reth-interop-txpool-filter branch from f6db19b to b686ae4 Compare April 9, 2026 21:02
Copy link
Copy Markdown
Contributor

@axelKingsley axelKingsley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling that the acceptance testing can be factored from other existing tests, and generally there might be room to move more to a DSL.

The logic of the tests does look good, and the devstack integration looks fine too, especially if it's working.

I did not review the rust changes.

t.Require().Equal(expected, sys.InteropFilter.FailsafeEnabled(),
"interop filter failsafe state should already be %v after SetFailsafeEnabled", expected)
// Op-reth polls admin_getFailsafeEnabled every 1s. With 2s L2 block times,
// waiting for 2 blocks guarantees at least 2 poll cycles have elapsed.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does it matter that 2 poll cycles elapse?

// waitForFailsafeState confirms the interop filter's state matches expected,
// then waits for two L2 blocks to ensure op-reth's 1s polling task has had
// time to pick up the change. This replaces time.Sleep-based waits.
func waitForFailsafeState(t devtest.T, sys *presets.TwoL2SupernodeInterop, expected bool) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function doesn't seem to wait for a given state, it seems to expect that state up-front, then do some waiting.

Comment on lines +58 to +62
// Wait for at least one block between init and exec
sys.L2B.WaitForBlock()

// Send exec message on chain B — the interop filter validates the access list
execMsg := bob.SendExecMessage(initMsg)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the interop filter check is only implied, right? this function TestInteropFilter_IngressAcceptsValid doesn't actually know or confirm that the filter is enabled?

I'm guessing the DSL already has something for sending an init/exec message pair and waiting their inclusions.

Comment on lines +111 to +113
require.False(errors.Is(err, context.DeadlineExceeded),
"expected explicit rejection, not timeout: %v", err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is a good check to confirm that the message isn't even making it in.

sys.InteropFilter.SetFailsafeEnabled(false)
waitForFailsafeState(t, sys, false)

// Step 5: Verify interop txs recover — retry to tolerate propagation lag
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm a little concerned about how flakey this might be in practice . could the DSL support some sort of AwaitFailsafeDisabled?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling some of these tests might already exist through other presets/packages, and we could hopefully leverage that.

like TestInteropFilter_IngressAcceptsValid is equal to all other "Valid Message" tests, just with the filter turned on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(op-reth): implement interop tx pool filter and reorg eviction

2 participants