-
Notifications
You must be signed in to change notification settings - Fork 627
[Fix] Accept proof submission even it has been timeout #1764
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughHandleZkProof's signature changes to a named error return with a defer-based recovery; timeout handling in validation is converted from an early-reject path to an overridden timeout flow that flips task status and records metrics; updateProofStatus results are synced back into the local proverTask. Other added files are test/config artifacts. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
coordinator/internal/logic/submitproof/proof_receiver.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
coordinator/internal/logic/submitproof/proof_receiver.go (1)
common/types/db.go (2)
ProverTaskFailureType(99-99)ProverTaskFailureTypeTimeout(105-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: check
- GitHub Check: tests
- GitHub Check: tests
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1764 +/- ##
===========================================
- Coverage 36.48% 36.47% -0.02%
===========================================
Files 248 248
Lines 21314 21328 +14
===========================================
+ Hits 7777 7779 +2
- Misses 12709 12720 +11
- Partials 828 829 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
coordinator/internal/logic/submitproof/proof_receiver.go (1)
326-339: Timeout override now correctly precedes the submit‑twice guard; add tests and tidy comment/log textPlacing this timeout override before the
ProverProofValid/ProverProofInvalidguard fixes the previously reported issue: late proofs for tasks markedProverProofInvalid + ProverTaskFailureTypeTimeoutare now allowed through validator instead of being rejected as “cannot submit twice”. The in‑memory flip back toProverAssignedis sufficient becauseupdateProofStatuslater persists the finalValid/Undefinedstate.A few follow‑ups:
- This behavior is important but currently untested (Codecov shows 0% patch coverage here). It’s worth adding at least one unit/integration test that:
- Creates a
ProverTaskwithProvingStatus=ProverProofInvalidandFailureType=ProverTaskFailureTypeTimeout.- Calls
validatorwithStatusOk.- Asserts
err == nil,validateFailureProverTaskTimeoutincrements, andproverTask.ProvingStatusbecomesProverAssigned.- The comments/logs still mention
SessionInfoFailureTimeoutand read a bit awkwardly. You could tighten them up as:- // Internally we overide the timeout failure: - // if prover task FailureType is SessionInfoFailureTimeout, the submit proof is timeout, but we still accept it + // Internally we override timeout failures: + // if the prover task FailureType is ProverTaskFailureTypeTimeout, the submission has timed out but we still accept it. @@ - log.Warn("proof submit proof have timeout", "hash", proofParameter.TaskID, "taskType", proverTask.TaskType, + log.Warn("proof submission timed out but will be verified", "hash", proofParameter.TaskID, "taskType", proverTask.TaskType, "proverName", proverTask.ProverName, "proverPublicKey", pk, "proofTime", proofTimeSec)Note that
validateFailureProverTaskTimeoutnow counts timed‑out tasks whose proofs you still accept; if dashboards assumed it was a strict failure counter, you may want to adjust their interpretation.
🧹 Nitpick comments (2)
coordinator/internal/logic/submitproof/proof_receiver.go (2)
158-185: Defer-based last‑chance recovery looks sound; only minor polishing nitsThe named return + deferred
proofRecovernicely closes gaps where we’d previously exit with an error while the task was still assigned (e.g.hardForkName/unmarshal failures), without double‑recovering onceupdateProofStatushas run.Two small nits you may want to address:
- The log message has a typo:
"last chance proof recover triggerred"→"last chance proof recover triggered".- Consider using a more specific
failureTypethanProverTaskFailureTypeUndefinedif you want to distinguish these “unexpected error” recoveries (e.g. a dedicatedServerError/“unexpected” code), but this is optional.Overall the control flow and conditions (
rerr != nilandProverAssigned) look correct and non‑intrusive.
464-466: Status/failureType sync into proverTask is necessary and correctSyncing
statusandfailureTypeback intoproverTaskafter a successful transaction is important so the deferred last‑chance recovery inHandleZkProofcan reliably detect whether the task is stillProverAssignedor already transitioned toValid/Invalid. This prevents double calls toproofRecover/closeProofTaskwhile still allowing recovery whenupdateProofStatusfails.Tiny nit: the comment has a typo (
failture→failure), but functionally this change is good.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
coordinator/internal/logic/submitproof/proof_receiver.go(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
coordinator/internal/logic/submitproof/proof_receiver.go (2)
coordinator/internal/types/submit_proof.go (1)
SubmitProofParameter(4-13)common/types/db.go (7)
ProverProveStatus(72-72)ProvingStatus(140-140)ProverAssigned(78-78)ProverTaskFailureTypeUndefined(103-103)ProverProofInvalid(82-82)ProverTaskFailureType(99-99)ProverTaskFailureTypeTimeout(105-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: tests
- GitHub Check: check
- GitHub Check: tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
coordinator/internal/logic/submitproof/proof_receiver.go (1)
319-355: Timeout override logic achieves late‑proof acceptance; minor text/metrics nitsThe timeout override in
validator:
- Correctly detects the “timed‑out task” case (
ProverProofInvalid+ProverTaskFailureTypeTimeout), increments the timeout metric, and resetsProvingStatustoProverAssignedbefore the double‑submit guard. This ensures late proofs for timed‑out tasks are no longer rejected by the “cannot submit twice” check and can flow through verification as intended.- Leaves
FailureTypeuntouched so that, on success,closeProofTaskcan propagate the existing timeout marker into storage, which aligns with the “still count timeouts” requirement.Two optional cleanups (one of which is already called out in a previous review):
The comment still refers to
SessionInfoFailureTimeout, but the actual enum isProverTaskFailureTypeTimeout, and the log message"proof submit proof have timeout"is hard to read. The earlier suggested wording would improve clarity:// Internally we overide the timeout failure:
// if prover task FailureType is SessionInfoFailureTimeout, the submit proof is timeout, but we still accept it
- // Internally override timeout failures:
- // if prover task FailureType is ProverTaskFailureTypeTimeout, the submission has timed out, but we still accept it.
@@
log.Warn("proof submit proof have timeout", "hash", proofParameter.TaskID, "taskType", proverTask.TaskType,
log.Warn("proof submission timed out but will be verified", "hash", proofParameter.TaskID, "taskType", proverTask.TaskType, "proverName", proverTask.ProverName, "proverPublicKey", pk, "proofTime", proofTimeSec)
- Semantically,
validateFailureProverTaskTimeoutnow counts “submissions whose task had previously timed out”, regardless of whether this particular submission ultimately succeeds. If your dashboards previously interpreted it strictly as a validation failure metric, it’s worth sanity‑checking that this new meaning is acceptable.
🧹 Nitpick comments (3)
coordinator/internal/logic/submitproof/proof_receiver.go (1)
155-299: Last‑chanceproofRecoverbehavior and scopeThe new named return +
defercorrectly ensures that any non‑nil error path after loading aproverTaskwill trigger a finalproofRecoverwhen the task is stillProverAssigned, which helps close tasks that previously leaked (e.g. hard‑fork lookup errors, unwrap/marshal failures). It also correctly avoids double‑recovering onceupdateProofStatushas run, thanks to theProverProveStatus(proverTask.ProvingStatus)check.Two follow‑ups you might want to consider:
- The doc comment above
HandleZkProof(“only proving/verifying error will lead to setting status as skipped. db/unmarshal errors will not…”) is now outdated, since these errors will also end up callingproofRecovervia the defer. Updating the comment to reflect the new behavior would avoid confusion.- If you still want to cover actual panics (as mentioned in the earlier discussion), this
defercurrently doesn’t help becausererrremainsnilon panic. You’d need arecover()branch inside the samedeferto mark the task failed on panic as well, if that’s desired.Functionally the current behavior looks consistent with the PR goal; above are polish/semantics clarifications.
tests/prover-e2e/mainnet-feynman/config.json (1)
1-14: Consider replacing hard‑coded DB credentials with placeholders or env‑driven valuesThis e2e config bakes
postgres://dev:dev@localhost:5432/scroll?sslmode=disabledirectly intodsn. Even if this is only for local/dev use, it will keep static‑analysis tools flagging “credentials in repo” and may encourage copy‑paste into other environments.If feasible, consider either:
- Switching to a placeholder DSN (e.g.
"postgres://<user>:<password>@localhost:5432/scroll?sslmode=disable"), or- Documenting that this file is an example and is normally overridden by env‑specific config.
tests/prover-e2e/mainnet-feynman/config.template.json (1)
1-41: Template placeholders & typoThis template is clear and matches the expected structure for prover manager, DB, L2, auth, and sequencer configs. Two small suggestions:
- To avoid secret‑scanner noise and make it obvious these are placeholders, consider changing values like the DSN,
"prover secret key", and"not need"to explicit placeholder tokens (e.g."postgres://<user>:<password>@localhost/scroll?sslmode=disable","<prover_auth_secret>","<sequencer_decryption_key_if_needed>").- Minor typo:
"endpoint": "<serach a public rpc endpoint like alchemy>"→"endpoint": "<search a public RPC endpoint like Alchemy>".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
coordinator/internal/logic/submitproof/proof_receiver.go(6 hunks)tests/prover-e2e/mainnet-feynman/.make.env(1 hunks)tests/prover-e2e/mainnet-feynman/config.json(1 hunks)tests/prover-e2e/mainnet-feynman/config.template.json(1 hunks)tests/prover-e2e/mainnet-feynman/genesis.json(1 hunks)zkvm-prover/config_remote.json(1 hunks)zkvm-prover/target.json(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- tests/prover-e2e/mainnet-feynman/.make.env
- zkvm-prover/target.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-15T08:52:44.176Z
Learnt from: colinlyguo
Repo: scroll-tech/scroll PR: 1645
File: rollup/proposer-tool-config.json:34-40
Timestamp: 2025-04-15T08:52:44.176Z
Learning: In configuration files like `rollup/proposer-tool-config.json`, placeholders such as `<mainnet read db config>` are intentionally left as-is to be replaced by users with their own configuration values when deploying the tool.
Applied to files:
tests/prover-e2e/mainnet-feynman/config.jsontests/prover-e2e/mainnet-feynman/config.template.json
🧬 Code graph analysis (1)
coordinator/internal/logic/submitproof/proof_receiver.go (3)
coordinator/internal/types/submit_proof.go (1)
SubmitProofParameter(4-13)common/types/db.go (8)
ProverProveStatus(72-72)ProvingStatus(140-140)ProverAssigned(78-78)ProverTaskFailureTypeUndefined(103-103)ProverProofInvalid(82-82)ProverTaskFailureType(99-99)ProverTaskFailureTypeTimeout(105-105)ProverProofValid(80-80)coordinator/internal/types/auth.go (1)
ProverName(18-18)
🪛 Checkov (3.2.334)
tests/prover-e2e/mainnet-feynman/config.json
[medium] 4-5: Basic Auth Credentials
(CKV_SECRET_4)
tests/prover-e2e/mainnet-feynman/config.template.json
[medium] 22-23: Basic Auth Credentials
(CKV_SECRET_4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: check
- GitHub Check: tests
- GitHub Check: tests
🔇 Additional comments (3)
coordinator/internal/logic/submitproof/proof_receiver.go (1)
399-476: SyncingproverTaskstatus afterupdateProofStatusis a good safety netPropagating the
statusandfailureTypearguments back into the in‑memoryproverTaskafter the transaction:proverTask.ProvingStatus = int16(status) proverTask.FailureType = int16(failureType)nicely ensures that any subsequent logic (including the
HandleZkProofdefer) sees the up‑to‑date state and won’t accidentally treat a closed task as stillAssigned. Likewise, havingcloseProofTaskpass through the currentproverTask.FailureTypeinstead of hard‑codingUndefinedlets you preserve “timeout” markers on eventually‑successful proofs, which matches the PR’s intent of still recording timeouts.No functional issues spotted here.
zkvm-prover/config_remote.json (1)
1-36: Remote prover configuration looks structurally soundThe
sdk_configandcircuitssections are well‑structured, with clear separation of coordinator, prover, and per‑hard‑fork circuit URLs/paths. No hard‑coded secrets are present; everything here looks appropriate for a committed remote config.tests/prover-e2e/mainnet-feynman/genesis.json (1)
1-109: Genesis state file appears consistent with a mainnet‑like test setupThe chain config, scroll section, and pre‑allocs (including the 0x5300… system contracts with code/storage) all look coherent for a mainnet‑like test environment. There are no embedded private keys—only public addresses and bytecode—so committing this genesis file is reasonable assuming it intentionally mirrors the target network’s public genesis.
We faced a delimma in
collection_timesetting (i.e. the time limit to set an assigned prove task being timeout): value too small would cause a possible time consuming task can not be completed since all submission would be rejected by timeout; in the other hand, a too big timeout would take too long to re-assign a task if the connection with assignment (prover) lost.This PR advise to accept the proof submission even it has been timeout: there is no proper reason to reject the result if it can be verified. With the fixing we can reduce the interval of reassignment without worring about a permanent failure of occasional long-running task. The counter of timeout failure would still be counted.
Summary by CodeRabbit
Bug Fixes
New Files / Configurations
✏️ Tip: You can customize this high-level summary in your review settings.