@noel2004 noel2004 commented Nov 25, 2025

We faced a dilemma in the collection_time setting (i.e. the time limit after which an assigned prove task is considered timed out): a value that is too small means a possibly time-consuming task can never be completed, since every submission would be rejected as timed out; on the other hand, a value that is too large makes it take too long to reassign a task if the connection with the assignee (prover) is lost.

This PR proposes accepting a proof submission even if it has timed out: there is no good reason to reject a result that can be verified. With this fix we can reduce the reassignment interval without worrying about the permanent failure of an occasional long-running task. Timeout failures are still counted.

Summary by CodeRabbit

  • Bug Fixes

    • Improved timeout handling: timed-out submissions are now processed through validation, tracked, logged, and their status synchronized instead of being rejected.
    • Added last-chance recovery to ensure assigned prover tasks are retried or reconciled on error.
    • Synchronized prover task status updates to prevent mismatches and lost state during processing.
  • New Files / Configurations

    • Added testnet/prover end-to-end configs, genesis/state files, and new prover runtime/configuration manifest files.


@noel2004 noel2004 requested a review from Thegaram November 25, 2025 13:01

coderabbitai bot commented Nov 25, 2025

Walkthrough

HandleZkProof's signature changes to a named error return with a defer-based recovery; timeout handling in validation is converted from an early-reject path to an overridden timeout flow that flips task status and records metrics; updateProofStatus results are synced back into the local proverTask. Other added files are test/config artifacts.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Proof receiver logic: coordinator/internal/logic/submitproof/proof_receiver.go | Changed HandleZkProof to return a named error (rerr error) and added a deferred recovery (proofRecover) triggered when rerr is non-nil and task remains Assigned. Reworked timeout handling: removed early timeout-return path, now overrides timeout to mark task as Assigned, increments timeout metric, logs timing, and continues validation flow. updateProofStatus call now propagates the prover task’s current FailureType, and local proverTask.ProvingStatus/FailureType are synchronized from update results. Consolidated proof-time calculations and removed duplicate/early-exit timeout branches. |
| Prover e2e, mainnet feynman (tests/configs): tests/prover-e2e/mainnet-feynman/.make.env, tests/prover-e2e/mainnet-feynman/config.json, tests/prover-e2e/mainnet-feynman/config.template.json, tests/prover-e2e/mainnet-feynman/genesis.json | Added environment file and multiple configuration files for a mainnet-feynman prover e2e setup: .make.env (BEGIN/END_BLOCK, SCROLL_FORK_NAME), config.json (db/fetch flags), config.template.json (prover_manager, db, l2, auth, sequencer templates), and a comprehensive genesis.json with allocations, codes, and storage entries for testnet initialization. |
| zkvm-prover configs: zkvm-prover/config_remote.json, zkvm-prover/target.json | Added config_remote.json (sdk_config, coordinator/prover settings, circuit entries with asset detours) and target.json (chunk hash and empty batches/bundles) for zkvm-prover runtime configuration. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus areas:
    • HandleZkProof defer/recovery: ensure named rerr semantics don't mask errors and recovery side-effects are correct.
    • Timeout override: verify the decision to flip status to Assigned, metric increment, and log/timing correctness.
    • updateProofStatus synchronization: check concurrency/atomicity and that persisted vs in-memory state remain consistent.
    • Large data files (genesis.json): validate bulk additions for accidental corruptions or mis-encoded fields.

Suggested reviewers

  • georgehao
  • Thegaram
  • lispc

Poem

🐇 I hopped through code at break of dawn,

A timeout met — I stretched and yawned.
I nudged the task, kept status bright,
Tuned logs and metrics through the night.
A little hop, and proofs move on.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The PR description explains the problem, rationale, and solution, but lacks the conventional commits format checklist and deployment/breaking change sections required by the template. | Add the required template sections including PR title format confirmation (fix: prefix is correct), deployment tag versioning checkbox, and breaking change label checkbox to fully comply with the repository template. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title '[Fix] Accept proof submission even it has been timeout' clearly describes the main change: accepting proof submissions that occur after timeout, which aligns with the core logic modification in proof_receiver.go. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@noel2004 noel2004 requested a review from georgehao November 25, 2025 13:01

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between edb5123 and e1e5278.

📒 Files selected for processing (1)
  • coordinator/internal/logic/submitproof/proof_receiver.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
coordinator/internal/logic/submitproof/proof_receiver.go (1)
common/types/db.go (2)
  • ProverTaskFailureType (99-99)
  • ProverTaskFailureTypeTimeout (105-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: check
  • GitHub Check: tests
  • GitHub Check: tests


codecov-commenter commented Nov 25, 2025

Codecov Report

❌ Patch coverage is 20.83333% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.47%. Comparing base (27dd62e) to head (1347762).

Files with missing lines Patch % Lines
...nator/internal/logic/submitproof/proof_receiver.go 20.83% 16 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1764      +/-   ##
===========================================
- Coverage    36.48%   36.47%   -0.02%     
===========================================
  Files          248      248              
  Lines        21314    21328      +14     
===========================================
+ Hits          7777     7779       +2     
- Misses       12709    12720      +11     
- Partials       828      829       +1     
Flag Coverage Δ
coordinator 32.69% <20.83%> (-0.07%) ⬇️


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
coordinator/internal/logic/submitproof/proof_receiver.go (1)

326-339: Timeout override now correctly precedes the submit‑twice guard; add tests and tidy comment/log text

Placing this timeout override before the ProverProofValid/ProverProofInvalid guard fixes the previously reported issue: late proofs for tasks marked ProverProofInvalid + ProverTaskFailureTypeTimeout are now allowed through validator instead of being rejected as “cannot submit twice”. The in‑memory flip back to ProverAssigned is sufficient because updateProofStatus later persists the final Valid/Undefined state.

A few follow‑ups:

  • This behavior is important but currently untested (Codecov shows 0% patch coverage here). It’s worth adding at least one unit/integration test that:
    • Creates a ProverTask with ProvingStatus=ProverProofInvalid and FailureType=ProverTaskFailureTypeTimeout.
    • Calls validator with StatusOk.
    • Asserts err == nil, validateFailureProverTaskTimeout increments, and proverTask.ProvingStatus becomes ProverAssigned.
  • The comments/logs still mention SessionInfoFailureTimeout and read a bit awkwardly. You could tighten them up as:
-	// Internally we overide the timeout failure:
-	// if prover task FailureType is SessionInfoFailureTimeout, the submit proof is timeout, but we still accept it
+	// Internally we override timeout failures:
+	// if the prover task FailureType is ProverTaskFailureTypeTimeout, the submission has timed out but we still accept it.
@@
-		log.Warn("proof submit proof have timeout", "hash", proofParameter.TaskID, "taskType", proverTask.TaskType,
+		log.Warn("proof submission timed out but will be verified", "hash", proofParameter.TaskID, "taskType", proverTask.TaskType,
 			"proverName", proverTask.ProverName, "proverPublicKey", pk, "proofTime", proofTimeSec)

Note that validateFailureProverTaskTimeout now counts timed‑out tasks whose proofs you still accept; if dashboards assumed it was a strict failure counter, you may want to adjust their interpretation.
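The test outlined above might look something like the following sketch. It models the ordering with hypothetical local types (`proverTask`, `receiver`, `validate`, and the constants are all invented for this example) rather than the real ProverTask and validator, so it only illustrates the three assertions, not the project's test harness.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical mirrors of the enums referenced in the review; values are illustrative.
const (
	proverAssigned = iota + 1
	proverProofValid
	proverProofInvalid
)

const (
	failureUndefined = iota
	failureTimeout
)

type proverTask struct {
	ProvingStatus int16
	FailureType   int16
}

// receiver holds a plain counter standing in for validateFailureProverTaskTimeout.
type receiver struct{ timeoutTotal int }

// validate is a simplified model of the ordering discussed above:
// the timeout override runs before the submit-twice guard.
func (r *receiver) validate(t *proverTask) error {
	if t.ProvingStatus == proverProofInvalid && t.FailureType == failureTimeout {
		r.timeoutTotal++
		t.ProvingStatus = proverAssigned // in-memory flip only
	}
	if t.ProvingStatus == proverProofValid || t.ProvingStatus == proverProofInvalid {
		return errors.New("cannot submit twice")
	}
	return nil
}

// checkLateProofAccepted returns an error describing the first failed assertion.
func checkLateProofAccepted() error {
	r := &receiver{}
	t := &proverTask{ProvingStatus: proverProofInvalid, FailureType: failureTimeout}

	if err := r.validate(t); err != nil {
		return fmt.Errorf("expected late proof to pass validation, got: %w", err)
	}
	if r.timeoutTotal != 1 {
		return fmt.Errorf("expected timeout counter 1, got %d", r.timeoutTotal)
	}
	if t.ProvingStatus != proverAssigned {
		return fmt.Errorf("expected status reset to assigned, got %d", t.ProvingStatus)
	}
	return nil
}

func main() {
	fmt.Println(checkLateProofAccepted())
}
```

In the real code base the same assertions would presumably live in a `_test.go` file and use the `testing` package; the plain function here just keeps the sketch runnable on its own.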

🧹 Nitpick comments (2)
coordinator/internal/logic/submitproof/proof_receiver.go (2)

158-185: Defer-based last‑chance recovery looks sound; only minor polishing nits

The named return + deferred proofRecover nicely closes gaps where we’d previously exit with an error while the task was still assigned (e.g. hardForkName/unmarshal failures), without double‑recovering once updateProofStatus has run.

Two small nits you may want to address:

  • The log message has a typo: "last chance proof recover triggerred" → "last chance proof recover triggered".
  • Consider using a more specific failureType than ProverTaskFailureTypeUndefined if you want to distinguish these “unexpected error” recoveries (e.g. a dedicated ServerError/“unexpected” code), but this is optional.

Overall the control flow and conditions (rerr != nil and ProverAssigned) look correct and non‑intrusive.


464-466: Status/failureType sync into proverTask is necessary and correct

Syncing status and failureType back into proverTask after a successful transaction is important so the deferred last‑chance recovery in HandleZkProof can reliably detect whether the task is still ProverAssigned or already transitioned to Valid/Invalid. This prevents double calls to proofRecover/closeProofTask while still allowing recovery when updateProofStatus fails.

Tiny nit: the comment has a typo (failture → failure), but functionally this change is good.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between e1e5278 and 055f76e.

📒 Files selected for processing (1)
  • coordinator/internal/logic/submitproof/proof_receiver.go (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
coordinator/internal/logic/submitproof/proof_receiver.go (2)
coordinator/internal/types/submit_proof.go (1)
  • SubmitProofParameter (4-13)
common/types/db.go (7)
  • ProverProveStatus (72-72)
  • ProvingStatus (140-140)
  • ProverAssigned (78-78)
  • ProverTaskFailureTypeUndefined (103-103)
  • ProverProofInvalid (82-82)
  • ProverTaskFailureType (99-99)
  • ProverTaskFailureTypeTimeout (105-105)


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
coordinator/internal/logic/submitproof/proof_receiver.go (1)

319-355: Timeout override logic achieves late‑proof acceptance; minor text/metrics nits

The timeout override in validator:

  • Correctly detects the “timed‑out task” case (ProverProofInvalid + ProverTaskFailureTypeTimeout), increments the timeout metric, and resets ProvingStatus to ProverAssigned before the double‑submit guard. This ensures late proofs for timed‑out tasks are no longer rejected by the “cannot submit twice” check and can flow through verification as intended.
  • Leaves FailureType untouched so that, on success, closeProofTask can propagate the existing timeout marker into storage, which aligns with the “still count timeouts” requirement.

Two optional cleanups (one of which is already called out in a previous review):

  • The comment still refers to SessionInfoFailureTimeout, but the actual enum is ProverTaskFailureTypeTimeout, and the log message "proof submit proof have timeout" is hard to read. The earlier suggested wording would improve clarity:

-	// Internally we overide the timeout failure:
-	// if prover task FailureType is SessionInfoFailureTimeout, the submit proof is timeout, but we still accept it
+	// Internally override timeout failures:
+	// if prover task FailureType is ProverTaskFailureTypeTimeout, the submission has timed out, but we still accept it.
@@
-		log.Warn("proof submit proof have timeout", "hash", proofParameter.TaskID, "taskType", proverTask.TaskType,
+		log.Warn("proof submission timed out but will be verified", "hash", proofParameter.TaskID, "taskType", proverTask.TaskType,
 			"proverName", proverTask.ProverName, "proverPublicKey", pk, "proofTime", proofTimeSec)

  • Semantically, validateFailureProverTaskTimeout now counts “submissions whose task had previously timed out”, regardless of whether this particular submission ultimately succeeds. If your dashboards previously interpreted it strictly as a validation failure metric, it’s worth sanity‑checking that this new meaning is acceptable.
🧹 Nitpick comments (3)
coordinator/internal/logic/submitproof/proof_receiver.go (1)

155-299: Last‑chance proofRecover behavior and scope

The new named return + defer correctly ensures that any non‑nil error path after loading a proverTask will trigger a final proofRecover when the task is still ProverAssigned, which helps close tasks that previously leaked (e.g. hard‑fork lookup errors, unwrap/marshal failures). It also correctly avoids double‑recovering once updateProofStatus has run, thanks to the ProverProveStatus(proverTask.ProvingStatus) check.

Two follow‑ups you might want to consider:

  • The doc comment above HandleZkProof (“only proving/verifying error will lead to setting status as skipped. db/unmarshal errors will not…”) is now outdated, since these errors will also end up calling proofRecover via the defer. Updating the comment to reflect the new behavior would avoid confusion.
  • If you still want to cover actual panics (as mentioned in the earlier discussion), this defer currently doesn’t help because rerr remains nil on panic. You’d need a recover() branch inside the same defer to mark the task failed on panic as well, if that’s desired.

Functionally the current behavior looks consistent with the PR goal; above are polish/semantics clarifications.
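For the panic case mentioned in the second bullet, one possible shape is a `recover()` branch inside the same deferred function. This is only a sketch under assumed names (`proverTask`, `failTask`, and `handle` are hypothetical), and converting the panic into an error is a design choice made for the example, not something the PR does.

```go
package main

import "fmt"

type proverTask struct{ Assigned bool }

// failTask is a hypothetical stand-in for proofRecover.
func failTask(t *proverTask) { t.Assigned = false }

// handle runs step and guarantees the task is closed on error or panic.
func handle(t *proverTask, step func() error) (rerr error) {
	defer func() {
		// recover() returns non-nil only while a panic is unwinding this goroutine
		// and only when called directly from the deferred function.
		if p := recover(); p != nil {
			rerr = fmt.Errorf("panic while handling proof: %v", p)
		}
		if rerr != nil && t.Assigned {
			failTask(t)
		}
	}()
	return step()
}

func main() {
	t := &proverTask{Assigned: true}
	err := handle(t, func() error { panic("unexpected nil pointer") })
	fmt.Println(err, t.Assigned)
}
```

If the caller should still observe the panic, the deferred function could call `panic(p)` again after marking the task failed instead of assigning to `rerr`.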

tests/prover-e2e/mainnet-feynman/config.json (1)

1-14: Consider replacing hard‑coded DB credentials with placeholders or env‑driven values

This e2e config bakes postgres://dev:dev@localhost:5432/scroll?sslmode=disable directly into dsn. Even if this is only for local/dev use, it will keep static‑analysis tools flagging “credentials in repo” and may encourage copy‑paste into other environments.

If feasible, consider either:

  • Switching to a placeholder DSN (e.g. "postgres://<user>:<password>@localhost:5432/scroll?sslmode=disable"), or
  • Documenting that this file is an example and is normally overridden by env‑specific config.
tests/prover-e2e/mainnet-feynman/config.template.json (1)

1-41: Template placeholders & typo

This template is clear and matches the expected structure for prover manager, DB, L2, auth, and sequencer configs. Two small suggestions:

  • To avoid secret‑scanner noise and make it obvious these are placeholders, consider changing values like the DSN, "prover secret key", and "not need" to explicit placeholder tokens (e.g. "postgres://<user>:<password>@localhost/scroll?sslmode=disable", "<prover_auth_secret>", "<sequencer_decryption_key_if_needed>").
  • Minor typo: "endpoint": "<serach a public rpc endpoint like alchemy>" → "endpoint": "<search a public RPC endpoint like Alchemy>".
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 055f76e and 1e85070.

📒 Files selected for processing (7)
  • coordinator/internal/logic/submitproof/proof_receiver.go (6 hunks)
  • tests/prover-e2e/mainnet-feynman/.make.env (1 hunks)
  • tests/prover-e2e/mainnet-feynman/config.json (1 hunks)
  • tests/prover-e2e/mainnet-feynman/config.template.json (1 hunks)
  • tests/prover-e2e/mainnet-feynman/genesis.json (1 hunks)
  • zkvm-prover/config_remote.json (1 hunks)
  • zkvm-prover/target.json (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • tests/prover-e2e/mainnet-feynman/.make.env
  • zkvm-prover/target.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-15T08:52:44.176Z
Learnt from: colinlyguo
Repo: scroll-tech/scroll PR: 1645
File: rollup/proposer-tool-config.json:34-40
Timestamp: 2025-04-15T08:52:44.176Z
Learning: In configuration files like `rollup/proposer-tool-config.json`, placeholders such as `<mainnet read db config>` are intentionally left as-is to be replaced by users with their own configuration values when deploying the tool.

Applied to files:

  • tests/prover-e2e/mainnet-feynman/config.json
  • tests/prover-e2e/mainnet-feynman/config.template.json
🧬 Code graph analysis (1)
coordinator/internal/logic/submitproof/proof_receiver.go (3)
coordinator/internal/types/submit_proof.go (1)
  • SubmitProofParameter (4-13)
common/types/db.go (8)
  • ProverProveStatus (72-72)
  • ProvingStatus (140-140)
  • ProverAssigned (78-78)
  • ProverTaskFailureTypeUndefined (103-103)
  • ProverProofInvalid (82-82)
  • ProverTaskFailureType (99-99)
  • ProverTaskFailureTypeTimeout (105-105)
  • ProverProofValid (80-80)
coordinator/internal/types/auth.go (1)
  • ProverName (18-18)
🪛 Checkov (3.2.334)
tests/prover-e2e/mainnet-feynman/config.json

[medium] 4-5: Basic Auth Credentials

(CKV_SECRET_4)

tests/prover-e2e/mainnet-feynman/config.template.json

[medium] 22-23: Basic Auth Credentials

(CKV_SECRET_4)

🔇 Additional comments (3)
coordinator/internal/logic/submitproof/proof_receiver.go (1)

399-476: Syncing proverTask status after updateProofStatus is a good safety net

Propagating the status and failureType arguments back into the in‑memory proverTask after the transaction:

proverTask.ProvingStatus = int16(status)
proverTask.FailureType = int16(failureType)

nicely ensures that any subsequent logic (including the HandleZkProof defer) sees the up‑to‑date state and won’t accidentally treat a closed task as still Assigned. Likewise, having closeProofTask pass through the current proverTask.FailureType instead of hard‑coding Undefined lets you preserve “timeout” markers on eventually‑successful proofs, which matches the PR’s intent of still recording timeouts.

No functional issues spotted here.

zkvm-prover/config_remote.json (1)

1-36: Remote prover configuration looks structurally sound

The sdk_config and circuits sections are well‑structured, with clear separation of coordinator, prover, and per‑hard‑fork circuit URLs/paths. No hard‑coded secrets are present; everything here looks appropriate for a committed remote config.

tests/prover-e2e/mainnet-feynman/genesis.json (1)

1-109: Genesis state file appears consistent with a mainnet‑like test setup

The chain config, scroll section, and pre‑allocs (including the 0x5300… system contracts with code/storage) all look coherent for a mainnet‑like test environment. There are no embedded private keys—only public addresses and bytecode—so committing this genesis file is reasonable assuming it intentionally mirrors the target network’s public genesis.

georgehao previously approved these changes Dec 11, 2025
@noel2004 noel2004 requested a review from lispc December 11, 2025 08:12
lispc previously approved these changes Dec 11, 2025
@noel2004 noel2004 dismissed stale reviews from lispc and georgehao via 1347762 December 11, 2025 10:07
@noel2004 noel2004 requested review from georgehao and lispc December 11, 2025 10:26
@noel2004 noel2004 merged commit 7de388e into develop Dec 12, 2025
6 checks passed
@noel2004 noel2004 deleted the fix/accept_timeout_submission branch December 12, 2025 03:18