feat(op-reth): implement block builder failsafe and interop filter integration#20001
feat(op-reth): implement block builder failsafe and interop filter integration#20001
Conversation
Change TRANSACTION_VALIDITY_WINDOW_SECS from 3600 (1 hour) to 86400 (24 hours) to match op-geth's ingressFilterTxValidityWindow spec. This constant is used for both the interop deadline set on validated transactions and the timeout passed to supervisor_checkAccessList. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The revalidation window (600s) is intentionally shorter than the ingress timeout (86400s). Phase 3 evaluation confirmed the block-event-driven revalidation loop meets all 6 evaluation criteria: - Fires on every canonical block commit (~2s on OP chains) - Covers all pooled interop txs with deadlines - Evicts expired txs and revalidates stale ones - Failsafe polling (Phase 4) covers the block-gap edge case Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add background failsafe detection to the interop tx pool: - Add AtomicBool failsafe_enabled to SupervisorClientInner, shared via Arc across all clones - Add query_failsafe() method that calls admin_getFailsafeEnabled RPC and caches the result - Add is_failsafe_enabled() accessor for the cached state - Add poll_failsafe() background task that polls every 1s and evicts all interop txs from the pool on failsafe transition - Wire failsafe polling task in node.rs alongside the existing maintenance task, using ref+clone pattern to share the supervisor client - Narrow SupervisorClientInner visibility to pub(crate) — no external references exist - Remove Clone derive from SupervisorClientInner (AtomicBool is !Clone, inner is always behind Arc) The ingress filter already rejects new interop txs during failsafe via CheckAccessList at the source of truth. The polling task provides the eviction mechanism for already-pooled txs. No changes to the maintenance loop or payload builder are needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add in-process op-interop-filter service to devstack for supernode interop presets: - Refactor startMixedOpRethNode into build+start pattern to allow injecting --rollup.supervisor-http before starting - Add startSupernodeELWithSupervisorURL for both op-geth and op-reth - Create InteropFilter wrapper (sysgo/interop_filter.go) running the filter service in-process with direct Go struct access - Add proxy pattern in supernode runtime to break circular dependency between EL nodes and filter service - Add WithInteropFilter() preset option (supernode presets only) - Add UseInteropFilter to PresetConfig - Expose InteropFilter on MultiChainRuntime and TwoL2SupernodeInterop for direct test access (SetFailsafeEnabled, Ready, FailsafeEnabled) - Add Ready(), SetFailsafeEnabled(), FailsafeEnabled() pass-through methods to filter.Service No changes to supervisor-based presets or multichain_supervisor_runtime. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add acceptance tests exercising the op-reth + op-interop-filter topology with the supernode interop preset: - TestInteropFilter_IngressAcceptsValid: valid interop tx with correct cross-chain references passes through the filter - TestInteropFilter_IngressRejectsInvalid: fabricated CrossL2Inbox access list entries are rejected by the filter - TestInteropFilter_FailsafeBlocksInterop: failsafe toggle blocks new interop txs, recovery works after disabling - TestInteropFilter_NonInteropUnaffected: regular transfers succeed regardless of failsafe state - TestInteropFilter_FailsafeEvictsPooled: failsafe transition evicts existing interop txs and rejects new ones All tests use presets.WithInteropFilter() and direct InteropFilter.SetFailsafeEnabled() for in-process failsafe control. Tests are skipped by default (standard interop test pattern). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ernode starts The interop filter's chain ingesters need blocks to backfill, but the supernode (which drives block production) starts after the filter. Split the readiness wait out of startInteropFilter and call WaitForReady after the supernode and batchers are running. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 5 tests pass with RUST_JIT_BUILD=1 DEVSTACK_L2EL_KIND=op-reth. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convention fixes: - Sort imports alphabetically in multichain_supernode_runtime.go (rollup with op-node group, tcpproxy after sources) - Sort imports alphabetically in interop_filter_test.go (op-core/predeploys before op-devstack) - Remove dead execTrigger variable in FailsafeBlocksInterop test - Revert struct literal alignment to match surrounding style Test robustness: - Replace all time.Sleep waits with waitForFailsafeState() helper that confirms filter-side state then waits for 2 L2 blocks (guarantees op-reth's 1s polling task has had at least 2 poll cycles) - Make initMsg.Tx.Result.Eval() failures hard errors with require.NoError instead of silently skipping the failsafe assertion Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ntenance eviction Add InvalidCrossTx::FailsafeEnabled variant for fast-path ingress rejection without RPC round-trip. Add failsafe eviction to the block-event maintenance loop as belt-and-suspenders with poll_failsafe. Fix gofmt import ordering in multichain_supernode_runtime.go. Tighten IngressRejectsInvalid test (10s timeout + explicit rejection assert). Merge FailsafeBlocksInterop and FailsafeEvictsPooled into FailsafeLifecycle. Extend NonInteropUnaffected to test both chains. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move is_interop_tx to interop.rs as the single pub(crate) helper for identifying interop transactions by access list. Use it in maintain and poll_failsafe eviction paths instead of interop_deadline().is_some(), ensuring all codepaths use the same structural check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #20001 +/- ##
===========================================
- Coverage 74.5% 0.5% -74.1%
===========================================
Files 193 483 +290
Lines 10734 60333 +49599
===========================================
- Hits 8002 320 -7682
- Misses 2588 60013 +57425
+ Partials 144 0 -144
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
…Invalid The test used bob.Plan() which includes WithRetrySubmission (5 retries, exponential backoff). Under CI load, retrying the rejected tx exhausted the 10s context timeout, causing the test to see DeadlineExceeded instead of the filter's explicit rejection error. Fix: override with WithTransactionSubmitter (no retries) and evaluate tx.Submitted instead of tx.Included, so the filter rejection propagates on the first attempt. Also unifies interop tx identification under is_interop_tx in interop.rs so all codepaths (ingress, maintenance, failsafe eviction, reorg) use the same structural access-list check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f6db19b to
b686ae4
Compare
axelKingsley
left a comment
There was a problem hiding this comment.
I have a feeling that the acceptance testing can be factored from other existing tests, and generally there might be room to move more to a DSL.
The logic of the tests does look good, and the devstack integration looks fine too, especially if it's working.
I did not review the rust changes.
| t.Require().Equal(expected, sys.InteropFilter.FailsafeEnabled(), | ||
| "interop filter failsafe state should already be %v after SetFailsafeEnabled", expected) | ||
| // Op-reth polls admin_getFailsafeEnabled every 1s. With 2s L2 block times, | ||
| // waiting for 2 blocks guarantees at least 2 poll cycles have elapsed. |
There was a problem hiding this comment.
why does it matter that 2 poll cycles elapse?
| // waitForFailsafeState confirms the interop filter's state matches expected, | ||
| // then waits for two L2 blocks to ensure op-reth's 1s polling task has had | ||
| // time to pick up the change. This replaces time.Sleep-based waits. | ||
| func waitForFailsafeState(t devtest.T, sys *presets.TwoL2SupernodeInterop, expected bool) { |
There was a problem hiding this comment.
This function doesn't seem to wait for a given state, it seems to expect that state up-front, then do some waiting.
| // Wait for at least one block between init and exec | ||
| sys.L2B.WaitForBlock() | ||
|
|
||
| // Send exec message on chain B — the interop filter validates the access list | ||
| execMsg := bob.SendExecMessage(initMsg) |
There was a problem hiding this comment.
the interop filter check is only implied, right? this function TestInteropFilter_IngressAcceptsValid doesn't actually know or confirm that the filter is enabled?
I'm guessing the DSL already has something for sending an init/exec message pair and waiting their inclusions.
| require.False(errors.Is(err, context.DeadlineExceeded), | ||
| "expected explicit rejection, not timeout: %v", err) | ||
| } |
There was a problem hiding this comment.
Nice, this is a good check to confirm that the message isn't even making it in.
| sys.InteropFilter.SetFailsafeEnabled(false) | ||
| waitForFailsafeState(t, sys, false) | ||
|
|
||
| // Step 5: Verify interop txs recover — retry to tolerate propagation lag |
There was a problem hiding this comment.
i'm a little concerned about how flakey this might be in practice . could the DSL support some sort of AwaitFailsafeDisabled?
There was a problem hiding this comment.
I have a feeling some of these tests might already exist through other presets/packages, and we could hopefully leverage that.
like TestInteropFilter_IngressAcceptsValid is equal to all other "Valid Message" tests, just with the filter turned on.
Summary
Implements the block builder failsafe (item 5 from #19411) and integrates op-reth with the op-interop-filter for end-to-end acceptance testing.
admin_getFailsafeEnabledevery 1s. On transition to enabled, immediately evicts all interop txs from the pool. Matches op-geth'sstartBackgroundInteropFailsafeDetection.AtomicBoolonSupervisorClientprovides a fast-path reject at ingress without an RPC round-trip.TRANSACTION_VALIDITY_WINDOW_SECSchanged from 3600 to 86400 to match op-geth'singressFilterTxValidityWindow. The 600s revalidation window in the maintenance loop remains the real control.WithInteropFilter()option onTwoL2SupernodeInteroppresets enables devstack sysgo tests to run both op-reth and the op-interop-filter together inside acceptance tests, with direct access to toggle failsafe state without admin RPC/JWT.Issue progress
developdevelop; this PR aligns validity window to 86400sdevelopdevelopTest plan
TestInteropFilter_IngressAcceptsValid— valid cross-chain tx passes through filter and reaches cross-safeTestInteropFilter_IngressRejectsInvalid— fabricated access list entry is explicitly rejected (not timeout)TestInteropFilter_FailsafeLifecycle— tx succeeds → enable failsafe → tx rejected → disable failsafe → tx recoversTestInteropFilter_NonInteropUnaffected— regular transfers succeed on both chains during failsafejust lintandjust fmt-fixpass inrust/go vetandgo fmtpass for changed Go packagesCloses #19411