
feat(taiko-client-rs): migrate prover from taiko-client #21788

Draft
davidtaikocha wants to merge 31 commits into main from feat/taiko-client-rs-prover

Conversation

@davidtaikocha

Collaborator

What

Ports the prover — the last Go-only component — from packages/taiko-client to a new crates/prover in packages/taiko-client-rs, plus a taiko-client prover subcommand. This unblocks the eventual sunset of the Go client (the driver and proposer are already at parity; the driver retirement plan covers the rest).

This PR is implementation + tests + docs (phases 1–4). The gated operational rollout (devnet → testnet → mainnet shadow → cutover → Go-client teardown) is tracked separately and not part of this diff.

Approach

Rust-idiomatic rebuild with the same external behavior as the Go prover (raiko API, Inbox.prove calldata, scheduling/aggregation semantics), reusing the workspace's existing primitives:

  • L1 events (Proposed/Proved) via event-scanner (HTTP/WS, reorg-aware) — same pattern as the driver.
  • Proof submission via base-tx-manager — same pattern as the proposer.
  • Contract I/O via the already-generated crates/bindings (prove, encodeProveInput, getCoreState, getProposalHash).

The genuinely new code is the raiko HTTP client, the per-type proof buffer/cache, the compose producers (sgxgeth + sgx/zk), and the request → buffer → aggregate → submit pipeline. The RPC-free routing core (submitter::Pipeline) is split out so it's unit-testable without a node.

Parity highlights

  • Shasta-only; compose model (two sub-proofs per batch, sgxgeth first); verifier IDs 1/4/5/6.
  • Sub-proof ABI encoding pinned byte-for-byte to a cast abi-encode golden fixture.
  • Buffer idempotency/overflow/clear, out-of-order cache flush, forced-aggregation interval, strict parent-transition ordering, ZK-first→SGX fallback, proving-window + 72s unassigned delay — all ported with Go file:line references in comments.
  • CLI flags keep the Go env-var names so operator configs port unchanged. Two deliberate changes: WS is no longer required (HTTP polling works), and --l1.private private-mempool submission is deferred until after cutover.
  • New --prover.shadowMode (full pipeline, no L1 submit) for the mainnet shadow gate.
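As a rough illustration of the "proving-window + 72s unassigned delay" item above, the routing decision might look like the sketch below. The names `Route`, `route_proposal`, and `UNASSIGNED_DELAY` are hypothetical and not taken from the diff, and the rule that an already-expired proposal is proved immediately is an assumption; the real scheduler in crates/prover likely considers more inputs.

```rust
use std::time::Duration;

/// Extra delay for proposals not assigned to this prover (72s per the PR text).
/// The constant name is hypothetical.
pub const UNASSIGNED_DELAY: Duration = Duration::from_secs(72);

/// Hypothetical routing outcome for a Proposed event.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Request and submit the proof right away.
    SubmitNow,
    /// Wait this long before proving someone else's proposal.
    WaitForExpiry(Duration),
}

/// Decide how to handle a proposal.
///
/// `elapsed` is the time since the proposal landed on L1. Proposals from a
/// locally registered proposer are treated as assigned and proved at once;
/// anything else waits until `proving_window + UNASSIGNED_DELAY` has passed.
pub fn route_proposal(
    proposer: [u8; 20],
    local_proposers: &[[u8; 20]],
    proving_window: Duration,
    elapsed: Duration,
) -> Route {
    if local_proposers.contains(&proposer) {
        return Route::SubmitNow;
    }
    let deadline = proving_window + UNASSIGNED_DELAY;
    match deadline.checked_sub(elapsed) {
        // Still inside the window (plus delay): wait out the remainder.
        Some(remaining) if !remaining.is_zero() => Route::WaitForExpiry(remaining),
        // Window already expired (assumed behavior): prove immediately.
        _ => Route::SubmitNow,
    }
}
```

With a 4-hour window and an empty local-proposer list, a fresh proposal waits 14,472 seconds under this sketch, which is consistent with the e2e timeout described in the later test-fix commit.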

Scope decisions (intentional)

  • No --l1.private private-mempool tx manager yet (deferred).
  • op/native proof types, anchor_tx_validator, and ontake/pacaya bindings not ported (dead/unused in the Go prover).

Tests

  • 54 unit tests in crates/prover (raiko status mapping, buffer/cache semantics, producers via stub raiko, calldata golden fixture, submitter pipeline routing/aggregation, scheduling matrix, startup-cursor clamp) + 13 client-bin tests, all green; clippy -D warnings -D missing_docs clean.
  • E2E (crates/prover/tests/prove_e2e.rs): driver + proposer + prover in dummy mode (single-proposal proof, two-proposal aggregation). Requires the Docker harness — runs in CI via taiko-client-rs--test.yml; not exercised locally.

A final adversarial review (Go-vs-Rust, consensus-adjacent paths) found and fixed 3 correctness issues before this PR: transient proof-request failures now retry instead of dropping the proposal; genesis anchor state reads at block number 0 (not block hash 0x0); the dedup cursor commits only after fallible reads succeed.
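The third fix (commit the dedup cursor only after fallible reads succeed) is an ordering rule that is easy to get wrong, so here is a minimal sketch of the idea. `DedupCursor` and `handle` are hypothetical names, and the real handler is presumably async and reads from RPC; only the ordering is the point.

```rust
/// Hypothetical dedup cursor: remembers the highest proposal ID already handled.
pub struct DedupCursor {
    last_handled: u64,
}

impl DedupCursor {
    pub fn new(last_handled: u64) -> Self {
        Self { last_handled }
    }

    pub fn last_handled(&self) -> u64 {
        self.last_handled
    }

    /// Run `read` for a proposal unless it was already handled.
    ///
    /// Returns `Ok(false)` for a duplicate, `Ok(true)` once the proposal is
    /// handled. The cursor advances only after `read` succeeds, so a failed
    /// read leaves the proposal eligible for a later retry.
    pub fn handle<E>(
        &mut self,
        proposal_id: u64,
        read: impl FnOnce(u64) -> Result<(), E>,
    ) -> Result<bool, E> {
        if proposal_id <= self.last_handled {
            return Ok(false);
        }
        read(proposal_id)?; // on error the cursor is left untouched
        self.last_handled = proposal_id;
        Ok(true)
    }
}
```

Advancing the cursor before the read would mark the proposal as handled even when the read fails, which silently drops it.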

See docs/prover-migration-runbook.md for the flag mapping and rollout/rollback procedure.

🤖 Generated with Claude Code

davidtaikocha and others added 16 commits June 12, 2026 22:17
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ntions

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…encoding

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…genesis anchor read

Address final review findings:
- request_proof retries on any transient raiko/RPC/enrichment error instead of
  dropping the proposal, matching the Go prover's inner constant backoff.
- read genesis anchor state at block number 0 (not block hash 0x0) and propagate
  errors; short-circuit the batch-0 engine lookup like Go ProposalLastBlockID.
- commit the dedup cursor only after the fallible Proposed-handler reads succeed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
davidtaikocha and others added 12 commits June 13, 2026 11:30
Port #21782: skip the ZK proof path and fall back to the
base proof when a proposal is more than maxZKProofProposalDistance ahead of the
last finalized proposal (default 30), so a slow ZK proof does not block catch-up.
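
A minimal sketch of the distance gate described in this commit, assuming proposal IDs are plain `u64` values. The function and constant names are hypothetical; only the default of 30 and the "more than ... ahead" comparison come from the commit message.

```rust
/// Default distance taken from the commit message; the constant name is hypothetical.
pub const DEFAULT_MAX_ZK_PROOF_PROPOSAL_DISTANCE: u64 = 30;

/// Returns true when the ZK path should be skipped in favor of the base proof.
///
/// The gate trips only when the proposal is strictly more than `max_distance`
/// ahead of the last finalized proposal. `saturating_sub` keeps proposals at
/// or behind the finalized ID from underflowing.
pub fn should_skip_zk(proposal_id: u64, last_finalized_id: u64, max_distance: u64) -> bool {
    proposal_id.saturating_sub(last_finalized_id) > max_distance
}
```

With the default, proposal 130 against finalized proposal 100 still takes the ZK path, while proposal 131 falls back.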

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Drop the two dead --backoff.* flags (never read; request retries use
--prover.proofPollingInterval) and --prove.gasLimit (prove gas is always
estimated by the tx-manager). No behavior change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Behavior-preserving cleanups (no public interface or behavior changes):

- state: fetch_update replaces the hand-rolled compare_exchange loop
- cache: extract shared `cloned` helper for the duplicated map fetch
- producer/compose: extract `join_base_and_geth` for the base/sgxgeth join
- submitter: add ProofRequestMeta::from_request + unexpected_proof_type
  helpers and a shared test-harness builder
- buffer: simplify the overflow guard
- prover, tx_manager_adapter, commands/prover, raiko/prove_e2e tests:
  drop redundant bindings, reuse helpers, inline a trivial wrapper
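
For readers unfamiliar with the first item, the sketch below contrasts a hand-rolled `compare_exchange` loop with `AtomicU64::fetch_update` from the standard library. The monotonic "advance only forward" rule and both function names are assumptions chosen for illustration; the actual state field and update rule in the diff may differ.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hand-rolled version: retry the compare-exchange until it lands or the
/// candidate is no longer an improvement.
pub fn advance_with_loop(cell: &AtomicU64, candidate: u64) -> bool {
    let mut current = cell.load(Ordering::Acquire);
    loop {
        if candidate <= current {
            return false;
        }
        match cell.compare_exchange_weak(
            current,
            candidate,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => return true,
            Err(observed) => current = observed,
        }
    }
}

/// Equivalent version: `fetch_update` runs the same retry loop internally and
/// stops when the closure returns `None`.
pub fn advance_with_fetch_update(cell: &AtomicU64, candidate: u64) -> bool {
    cell.fetch_update(Ordering::AcqRel, Ordering::Acquire, |current| {
        (candidate > current).then_some(candidate)
    })
    .is_ok()
}
```

Both functions return whether the stored value was advanced, so swapping one for the other preserves behavior.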

fmt + clippy (incl. missing_docs) clean. `just test` passes except the
pre-existing prover::prove_e2e::single_proposal_is_proven timeout, which
fails identically on the unmodified baseline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The prover e2e tests proposed from a different account than the prover with an empty localProposerAddresses. With the deployed 4-hour proving window, the prover routed each proposal to WaitForExpiry(window + 72s) and never proved within the 90s timeout, so single_proposal_is_proven timed out. Register the proposer as a local proposer so the prover treats its proposals as assigned and proves immediately, mirroring the Go prover test's LocalProposerAddresses setup.

Also restore the per-branch routing INFO logs the Rust port had dropped (received event / submit-now / wait-for-expiry) to match the Go prover's observability.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…oofs

aggregate_proofs_by_type cleared the buffer and returned on any
non-Pending error, permanently dropping already-generated proofs whose
Proposed events were already marked handled (no scanner replay). Go
retries all aggregation errors with constant backoff and only clears on
shutdown. Mirror that: retry non-terminal errors, keeping the buffer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
davidtaikocha force-pushed the feat/taiko-client-rs-prover branch from 229f05f to 90fa970 on June 14, 2026 01:53
davidtaikocha and others added 3 commits June 14, 2026 10:20
base-tx-manager returns Ok(receipt) even when a prove transaction reached
confirmation depth but reverted on-chain, so batch_submit_proofs counted a
revert as a successful submission. Check receipt.status(): on a revert,
record taiko_prover_submission_reverted, log, and return
ProverError::SubmissionReverted so the batch consumer resends the proof
requests (matching Go prover.go:302-314) instead of treating them as sent.
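
The shape of the check can be sketched as follows. `Receipt` here is a hypothetical stand-in for the real receipt type, and the plain `u64` counter stands in for the taiko_prover_submission_reverted metric; only `ProverError::SubmissionReverted` and the `status()` check are named in the commit message.

```rust
/// Error variant named in the commit message; the enum is reduced to one case.
#[derive(Debug, PartialEq, Eq)]
pub enum ProverError {
    SubmissionReverted,
}

/// Hypothetical stand-in for a confirmed transaction receipt.
pub struct Receipt {
    pub success: bool,
}

impl Receipt {
    /// True when the transaction executed successfully on-chain.
    pub fn status(&self) -> bool {
        self.success
    }
}

/// Treat a confirmed-but-reverted receipt as a failed submission.
///
/// `reverted_total` stands in for the revert metric. Returning an error lets
/// the caller resend the proof requests instead of counting them as sent.
pub fn check_receipt(receipt: &Receipt, reverted_total: &mut u64) -> Result<(), ProverError> {
    if receipt.status() {
        return Ok(());
    }
    *reverted_total += 1;
    Err(ProverError::SubmissionReverted)
}
```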

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mirror Go's withRetry submit contract: bounded in-place retry of the
validate/wait/build read phase with proofs buffered, falling back to a
re-request only on revert, unretryable send, or exhaustion. Treat a
not-found L1 block during validation as transient (retry) instead of a
permanent skip, bound wait_transition_verified to 60s, and cache a
contiguous proof on buffer overflow instead of dropping it.
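
A simplified, synchronous sketch of the bounded in-place retry, under the assumption that exhaustion is signalled by `None` and the caller then falls back to re-requesting. `retry_read_phase` and its parameters are hypothetical; the real code is presumably async and distinguishes revert and unretryable-send cases that are omitted here.

```rust
use std::thread;
use std::time::Duration;

/// Retry a fallible read phase up to `max_attempts` times, sleeping `backoff`
/// between attempts. The closure receives the 1-based attempt number.
///
/// Returns `Some(value)` on the first success and `None` once the attempts
/// are exhausted, at which point the caller would re-request the proofs.
pub fn retry_read_phase<T, E>(
    max_attempts: u32,
    backoff: Duration,
    mut read: impl FnMut(u32) -> Result<T, E>,
) -> Option<T> {
    for attempt in 1..=max_attempts {
        match read(attempt) {
            Ok(value) => return Some(value),
            // Transient failure (e.g. L1 block not found yet): wait and retry,
            // but skip the sleep after the final attempt.
            Err(_) if attempt < max_attempts => thread::sleep(backoff),
            Err(_) => {}
        }
    }
    None
}
```

Because the proofs stay buffered while this loop runs, a transient read failure costs only a short delay rather than a full proof regeneration.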

Also:
- Share the getCoreState read via rpc::Client::core_state and hoist a
  rpc::base_tx_manager_config builder used by prover and proposer.
- Pin fee_limit_multiplier=10 to match Go's --tx.feeLimitMultiplier
  (the base tx-manager default was 5).
- Report latest_verified_id as the finalized L2 block number (Go parity)
  and log the Proved checkpoint.
- Unify the Proposed/Proved scanner reconnect loops; drop dead code and
  unused raiko payload fields.

Verified: prover/proposer/rpc lib tests pass; clippy and fmt clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Record Go's updateProvingMetrics per-proof-type latency/count series in the
Rust prover: thread a request_at start instant through the ProofProducer trait
and request_validated, and emit generation_time / generation_time_sum /
generated for sgx_geth, sgx, r0, and sp1 in both single and aggregation modes.
Add a float_counter helper for cumulative fractional-second sums.
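
One way a cumulative fractional-second counter could be built is sketched below, storing the `f64` bit pattern in an `AtomicU64`. This is an assumption for illustration only: `FloatCounter` is a hypothetical type, and the `float_counter` helper in the diff may simply wrap the metrics backend instead.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical lock-free accumulator for fractional seconds.
/// The `f64` sum is stored as its raw bit pattern inside an `AtomicU64`.
pub struct FloatCounter(AtomicU64);

impl FloatCounter {
    /// Start at 0.0 (whose bit pattern is all zeros).
    pub const fn new() -> Self {
        Self(AtomicU64::new(0))
    }

    /// Add an elapsed duration, in seconds, to the running sum.
    pub fn add(&self, elapsed: Duration) {
        let delta = elapsed.as_secs_f64();
        // The closure always returns Some, so fetch_update cannot fail here.
        let _ = self
            .0
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |bits| {
                Some((f64::from_bits(bits) + delta).to_bits())
            });
    }

    /// Current cumulative sum in seconds.
    pub fn get(&self) -> f64 {
        f64::from_bits(self.0.load(Ordering::Relaxed))
    }
}
```

An integer counter would truncate sub-second proof latencies, which is why a float-valued sum is useful for a generation_time_sum style series.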

Also clarify in docs/comments that the ZK proposal-distance gate is a Rust-only
catch-up optimization with no Go equivalent, and document the absence of a
periodic proposal re-scan.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>