fix(taiko-client-rs): report execution head in /status when unsafe counter lags it#21777

Open
davidtaikocha wants to merge 1 commit into main from fix-whitelist-status-head-lag

@davidtaikocha

Collaborator

The bug

The whitelist driver's highest_unsafe counter only moves on three events: a local POST /preconfBlocks build (set), a P2P envelope import (raise), and the startup re-seed from the EE head. Canonical L1 derivation advances the execution head without touching the counter, and the /status reconciliation only handled the counter-above-head direction (the L1-reorg case).

So whenever the head advances via derivation with no gossip flowing — post-checkpoint-sync backfill, or the first operator coming online after a sequencing outage — /status.highestUnsafeL2PayloadBlockId permanently lags the EE head.
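As a rough illustration of the drift described above, here is a minimal sketch. `UnsafeCounter` and `reconcile_before_fix` are hypothetical stand-ins, not the driver's actual types or signatures; they only model the three update events and the one-directional clamp.

```rust
/// Hypothetical stand-in for the driver's tracked counter (not the real type).
#[derive(Debug, Default)]
struct UnsafeCounter {
    highest_unsafe: u64,
}

impl UnsafeCounter {
    /// Local `POST /preconfBlocks` build: the counter is set to the built block.
    fn on_local_build(&mut self, block_id: u64) {
        self.highest_unsafe = block_id;
    }

    /// P2P envelope import: the counter is only ever raised.
    fn on_p2p_import(&mut self, block_id: u64) {
        self.highest_unsafe = self.highest_unsafe.max(block_id);
    }

    /// Startup re-seed from the execution engine (EE) head.
    fn reseed_from_head(&mut self, head: u64) {
        self.highest_unsafe = head;
    }
}

/// Assumed shape of the pre-fix reconciliation: clamp only when the counter
/// is above the head (the L1-reorg case); a counter below the head is
/// reported unchanged.
fn reconcile_before_fix(tracked: u64, head: Option<u64>) -> u64 {
    match head {
        Some(h) if tracked > h => h,
        _ => tracked,
    }
}
```

With the counter re-seeded at 100 and derivation then moving the head to 150 with no gossip, `reconcile_before_fix(100, Some(150))` keeps returning 100, which is the lag in question.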

Catalyst's sync gate (is_block_height_synced_between_taiko_geth_and_the_driver) requires the reported value to exactly equal the EE latest head (or be 0). With a lagging value, Catalyst never starts preconfirming, its cancel counter trips after ~96 heartbeats (~3.2 min), and it self-restarts in a loop. The loop never resolves: restarting Catalyst doesn't re-seed the driver's counter, and with no preconfirmations flowing nothing ever raises it. Only a driver restart clears the wedge.
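The gate behavior can be modeled roughly as below. This is a sketch under assumptions, not Catalyst's code: `is_height_synced`, `SyncGate`, and `CANCEL_THRESHOLD` are hypothetical names, and resetting the counter on a synced heartbeat is assumed. The ~96 heartbeats over ~3.2 min implies about 2 s per heartbeat (96 × 2 s = 192 s).

```rust
/// Assumed threshold, taken from the "~96 heartbeats" figure above.
const CANCEL_THRESHOLD: u32 = 96;

/// Exact-equality gate: the driver's reported value must equal the EE head,
/// or be 0.
fn is_height_synced(reported: u64, ee_head: u64) -> bool {
    reported == 0 || reported == ee_head
}

/// Hypothetical model of the cancel counter.
#[derive(Debug, Default)]
struct SyncGate {
    unsynced_heartbeats: u32,
}

impl SyncGate {
    /// Processes one heartbeat; returns true once the threshold is reached
    /// and a self-restart would be triggered.
    fn heartbeat(&mut self, reported: u64, ee_head: u64) -> bool {
        if is_height_synced(reported, ee_head) {
            // Assumption: a synced heartbeat resets the counter.
            self.unsynced_heartbeats = 0;
            false
        } else {
            self.unsynced_heartbeats += 1;
            self.unsynced_heartbeats >= CANCEL_THRESHOLD
        }
    }
}
```

In this model a reported value stuck at 100 against a head of 150 trips the restart on the 96th heartbeat, and a fresh `SyncGate` after the restart sees the same lagging value again.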

The Go driver doesn't have this problem: it advances its counter to the canonical tip on every derived proposal (recordLatestSeenProposal).

The fix

reconcile_highest_unsafe now reports the execution head whenever it is readable, in both drift directions. This is always an honest answer — every canonical block was inserted by this driver too. The reorg (above-head) direction keeps its warn; the catch-up (below-head) direction logs at debug since it's a normal transient state polled every ~2s.
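A minimal sketch of the post-fix behavior, for illustration only. `reconcile_sketch` and `DriftLog` are hypothetical; the real reconcile_highest_unsafe signature and logging calls are not shown in this description, so the log level is returned as a value here instead of being emitted.

```rust
/// Hypothetical marker for which log level the drift would produce.
#[derive(Debug, PartialEq, Eq)]
enum DriftLog {
    /// No drift, or the head was unreadable: nothing to log.
    Quiet,
    /// Counter above head (reorg direction): keeps its warn.
    Warn,
    /// Counter below head (catch-up direction): debug only.
    Debug,
}

/// Reports the execution head whenever it is readable, in both drift
/// directions; falls back to the tracked counter only when it is not.
/// The tracked counter itself is never written back.
fn reconcile_sketch(tracked: u64, head: Option<u64>) -> (u64, DriftLog) {
    match head {
        Some(h) if tracked > h => (h, DriftLog::Warn),
        Some(h) if tracked < h => (h, DriftLog::Debug),
        Some(h) => (h, DriftLog::Quiet),
        None => (tracked, DriftLog::Quiet),
    }
}
```

The reported value in every branch is equivalent to `head.unwrap_or(tracked)`; the comparison only selects the log level.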

The report-only design is unchanged: the stored counter is never written back (the head is read without the state lock and may be stale; the next poll self-heals), and import/build semantics (raise/set) are untouched — /status is the only reader of the counter.

Notes

  • Found during the Go-driver-retirement readiness review (driver-sync-gate behavior through catch-up/recovery); this turns that verification item into a fix.
  • Unit tests updated: the previous reconcile_keeps_counter_when_below_reth_head test pinned the wedge behavior as intended; it now asserts the head is reported.
  • cargo test -p whitelist-preconfirmation-driver --lib (105 passed), crate-scoped clippy with CI flags, and just fmt-check are green.

…unter lags it

The highest-unsafe counter only moves on preconfirmation imports, local
builds, and the startup re-seed; canonical L1 derivation advances the
execution head without touching it. The /status reconciliation only
clamped the counter-above-head direction, so during no-gossip catch-up
(post-checkpoint backfill, first operator coming online after an
outage) the reported value permanently lags the head. Catalyst's sync
gate requires exact equality with the EE head, so a lagging report
blocks preconfirmation and puts Catalyst in a self-restart loop that
only a driver restart clears.

Report the execution head in both drift directions instead, keeping
the report-only/self-healing design and matching the Go driver, which
advances its counter on every canonical proposal.
@davidtaikocha davidtaikocha marked this pull request as ready for review June 13, 2026 09:10
@davidtaikocha davidtaikocha enabled auto-merge June 13, 2026 09:10
@github-actions

Contributor

🐋 DeepSeek Code Review

🔵 Suggestions

  • Function name clarity: reconcile_highest_unsafe now unconditionally returns the execution head when it is available, never clamping or comparing. Consider renaming to something like effective_highest_unsafe or resolve_unsafe_head to reflect the simpler semantics. Alternatively, the one-line helper could be inlined in get_status_snapshot since its entire logic is head.unwrap_or(tracked).

  • Fallback semantics: The tracked parameter is only used as a fallback for None. It’s worth documenting explicitly that it is never compared to head outside of the fallback case, so future readers don’t assume any reconciliation logic remains.

🟢 What Looks Good

  • Root cause isolation: The PR clearly explains the two-direction drift problem and the exact Catalyst sync-gate failure mode.
  • Behavioural alignment: Mirroring the Go driver’s approach of using the canonical L2 head as the reported value.
  • Minimal, targeted change: Only the reported status value changes; the stored highest_unsafe counter is never written back, preserving import/build semantics.
  • Well-scoped test update: The previous test that pinned the wedge behaviour is correctly replaced with a test asserting the head is now reported.
  • Log differentiation: Using warn for reorgs (head below counter) and debug for catch-up (head above counter) is appropriate and aids observability.
  • No new concurrency or safety issues: The head is read without a lock, but the design acknowledges staleness and self-healing on the next poll.

Automatically triggered on PR update • model: deepseek-v4-pro
