feat: try to build cursor multiple times before giving up (#7225)
Conversation
Force-pushed from df96677 to e03f9bd
Codecov Report
✅ All modified and coverable lines are covered by tests.
📝 Walkthrough
The scraper now uses a retry mechanism instead of a single scrape attempt. A new helper function, `try_n_times_to_scrape`, wraps the scrape call and retries on failure.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Main as Agent Main Loop
    participant Helper as try_n_times_to_scrape
    participant Scraper as Scraper
    participant Sleep as Sleep/Retry
    Main->>Helper: attempt scraping (max 10 attempts)
    rect rgb(200, 220, 255)
        Note over Helper: Retry Loop
        loop attempts remaining
            Helper->>Scraper: scrape()
            alt Success
                Scraper-->>Helper: JoinHandle
                Helper->>Main: return JoinHandle
            else Failure
                Scraper-->>Helper: error
                Helper->>Sleep: sleep(RPC_RETRY_SLEEP_DURATION)
                Sleep-->>Helper: done
                Helper->>Helper: log error & toggle critical_error
            end
        end
    end
    alt After All Attempts
        rect rgb(255, 200, 200)
            Note over Helper: Exhausted Retries
            Helper->>Main: return Err
            Main->>Main: log "Failed to scrape chain"
        end
    end
```
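The flow in the diagram can be sketched as a plain loop. This is a minimal synchronous sketch, not the PR's code: the real helper is async (tokio sleep, returning a `JoinHandle`), and `scrape`, `ATTEMPTS`, and the zero sleep duration here are illustrative stand-ins.

```rust
use std::time::Duration;

const ATTEMPTS: usize = 10; // mirrors CURSOR_INSTANTIATION_ATTEMPTS
const RETRY_SLEEP: Duration = Duration::from_millis(0); // stand-in for RPC_RETRY_SLEEP_DURATION

// Stand-in for Scraper::scrape: fails twice, then succeeds.
fn scrape(calls: &mut usize) -> Result<&'static str, &'static str> {
    *calls += 1;
    if *calls < 3 {
        Err("rpc error")
    } else {
        Ok("join handle")
    }
}

fn run() -> Result<&'static str, &'static str> {
    let mut calls = 0;
    for attempt in 1..=ATTEMPTS {
        match scrape(&mut calls) {
            // Success: clear critical_error and hand the task back to the main loop.
            Ok(handle) => return Ok(handle),
            // Failure: log, set critical_error, sleep, and try again.
            Err(err) => {
                eprintln!("attempt {attempt}/{ATTEMPTS} failed: {err}");
                std::thread::sleep(RETRY_SLEEP);
            }
        }
    }
    // Exhausted retries: the main loop logs "Failed to scrape chain" and skips the chain.
    Err("Failed to scrape chain")
}
```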
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (2 passed)
⚠️ Review ran into problems — Git: failed to clone the repository.
Actionable comments posted: 1
🧹 Nitpick comments (3)
rust/main/agents/scraper/src/agent.rs (3)
133-145: Keep metrics flowing even if scraping never starts
If all retries fail, we currently skip spawning the chain's metrics updater. That makes the gauges go quiet, which isn't great for visibility. Spawn the updater even on failure so the dashboard reflects the critical state.
Apply this minimal diff:
```diff
         let scraper_task = match self
             .try_n_times_to_scrape(scraper, CURSOR_INSTANTIATION_ATTEMPTS)
             .await
         {
             Ok(s) => s,
             Err(err) => {
                 tracing::error!(?err, ?scraper.domain, "Failed to scrape chain");
+                // Still expose metrics so the chain's error state is visible.
+                tasks.push(metrics_updater.spawn());
                 continue;
             }
         };
         tasks.push(scraper_task);
         tasks.push(metrics_updater.spawn());
```
154-178: Tighten retry loop: no extra sleep on last attempt, clearer logs, guard zero attempts, richer error
Right now we nap after the final failed attempt, which just delays giving up. The logs are 0-based and omit the max attempt count, zero attempts yields an immediate generic error, and the final error lacks domain context.
Apply this diff:
```diff
-    /// Try to scrape attempts times before giving up.
+    /// Try to scrape `attempts` times before giving up.
     async fn try_n_times_to_scrape(
         &self,
         scraper: &ChainScraper,
         attempts: usize,
     ) -> eyre::Result<JoinHandle<()>> {
-        for i in 0..attempts {
-            let scraper_task = match self.scrape(scraper).await {
-                Ok(s) => s,
-                Err(err) => {
-                    tracing::error!(?err, ?scraper.domain, attempt_count=i, "Failed to scrape chain");
-                    self.chain_metrics
-                        .set_critical_error(scraper.domain.name(), true);
-                    sleep(RPC_RETRY_SLEEP_DURATION).await;
-                    continue;
-                }
-            };
-
-            self.chain_metrics
-                .set_critical_error(scraper.domain.name(), false);
-            return Ok(scraper_task);
-        }
-        Err(eyre::eyre!("Failed to scrape chain"))
+        if attempts == 0 {
+            return Err(eyre::eyre!(
+                "attempts must be > 0 for {}",
+                scraper.domain.name()
+            ));
+        }
+        for attempt in 1..=attempts {
+            match self.scrape(scraper).await {
+                Ok(s) => {
+                    self.chain_metrics
+                        .set_critical_error(scraper.domain.name(), false);
+                    return Ok(s);
+                }
+                Err(err) => {
+                    tracing::error!(
+                        ?err,
+                        ?scraper.domain,
+                        attempt,
+                        max_attempts = attempts,
+                        "Failed to scrape chain"
+                    );
+                    self.chain_metrics
+                        .set_critical_error(scraper.domain.name(), true);
+                    if attempt < attempts {
+                        sleep(RPC_RETRY_SLEEP_DURATION).await;
+                    }
+                }
+            }
+        }
+        Err(eyre::eyre!("Failed to scrape chain {}", scraper.domain.name()))
     }
```

Optional: if the frequent flip of critical_error is noisy, set it to true only after exhausting attempts. That said, toggling per attempt is fine if dashboards smooth it.
21-21: Consolidate `CURSOR_INSTANTIATION_ATTEMPTS` to avoid duplication across agents
Ye were right to flag this one—this constant's defined identically across three agent files (scraper, validator, and relayer), all set to 10. That's a recipe fer things to drift apart.
Suggested fix: create a shared constant in a common module (like `agents/common/src/constants.rs` or similar), or make it configurable through `ScraperSettings` if each agent should have its own knob. Just keep 'em from wandering off in different directions down the road.
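A hedged sketch of the shared-constant option (the module path and names here are illustrative, not actual files in the repo):

```rust
// Hypothetical shared module, e.g. agents/common/src/constants.rs,
// so scraper, validator, and relayer stop redefining their own copies.
pub mod agent_constants {
    /// Retry budget shared by scraper, validator, and relayer.
    pub const CURSOR_INSTANTIATION_ATTEMPTS: usize = 10;
}

// Each agent then imports the one definition instead of redefining it:
pub use agent_constants::CURSOR_INSTANTIATION_ATTEMPTS;
```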
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 1df0f95 and fc0dec67695adc6e2693b2f91cd5da96311b9583.
📒 Files selected for processing (2)
- rust/main/agents/scraper/src/agent.rs (4 hunks)
- rust/main/agents/validator/src/validator.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
rust/main/agents/scraper/**/src/**/*.rs
📄 CodeRabbit inference engine (CLAUDE.md)
Maintain scraper agent Rust sources under rust/main/agents/scraper
Files:
rust/main/agents/scraper/src/agent.rs
rust/main/**/src/**/*.rs
📄 CodeRabbit inference engine (CLAUDE.md)
Run cargo clippy for Rust code linting
Files:
rust/main/agents/scraper/src/agent.rs
rust/main/agents/validator/src/validator.rs
rust/main/agents/validator/**/src/**/*.rs
📄 CodeRabbit inference engine (CLAUDE.md)
Maintain validator agent Rust sources under rust/main/agents/validator
Files:
rust/main/agents/validator/src/validator.rs
🧠 Learnings (1)
📚 Learning: 2025-09-02T18:44:06.598Z
Learnt from: CR
PR: hyperlane-xyz/hyperlane-monorepo#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-02T18:44:06.598Z
Learning: Applies to rust/main/agents/scraper/**/src/**/*.rs : Maintain scraper agent Rust sources under rust/main/agents/scraper
Applied to files:
rust/main/agents/scraper/src/agent.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: infra-test
- GitHub Check: build-and-push-to-gcr
- GitHub Check: e2e-matrix (cosmosnative)
- GitHub Check: e2e-matrix (radix)
- GitHub Check: e2e-matrix (starknet)
- GitHub Check: e2e-matrix (cosmwasm)
- GitHub Check: e2e-matrix (evm)
- GitHub Check: e2e-matrix (sealevel)
- GitHub Check: lander-coverage
- GitHub Check: lint-rs
- GitHub Check: test-rs
🔇 Additional comments (5)
rust/main/agents/validator/src/validator.rs (4)
24-27: Import looks good.
The addition of `RPC_RETRY_SLEEP_DURATION` is necessary for the retry sleep logic. Clean.
269-272: Better error handling here.
Logging the error and continuing is more resilient than failing fast. This handles transient RPC issues gracefully while waiting for the first message.
283-297: Main loop retry logic looks solid.
The flow makes sense - retry multiple times, bail out if all attempts fail, otherwise proceed with checkpoint submitters. Proper separation of concerns here.
37-38: Consistency verified—all good.
The constant value of 10 for `CURSOR_INSTANTIATION_ATTEMPTS` is already consistent across all three agents: validator, relayer, and scraper. They're all singing the same tune, so the retry behavior will work the same way everywhere.

rust/main/agents/scraper/src/agent.rs (1)
6-10: Imports look good
Brings in the right sleep and retry duration for the job. Clean and tidy, like a well‑kept bog.
As per coding guidelines.
- also try to run merkle tree hook sync multiple times for validator
- if there are failures, set critical error
- if success, clear critical error
Force-pushed from fc0dec6 to 106a3d1
Actionable comments posted: 1
♻️ Duplicate comments (1)
rust/main/agents/validator/src/validator.rs (1)
308-337: The error message has been fixed, mate.
The past review correctly pointed out that the error message was wrong ("Failed to scrape chain"), and you've updated it to "Failed to initialize merkle tree hook sync after {} attempts", which properly describes what's happening in the validator.
The retry logic itself looks good - sets the critical error flag when things go sideways, clears it when they don't, and logs each attempt with proper context.
🧹 Nitpick comments (3)
rust/main/agents/validator/src/validator.rs (1)
269-272: Consider using RPC_RETRY_SLEEP_DURATION for consistency.
When the count retrieval fails, you're sleeping with `self.interval`, but the retry helper on line 325 uses `RPC_RETRY_SLEEP_DURATION`. If the count failure is RPC-related, using the same retry duration would keep things consistent and might get you back in action faster.

Apply this diff if you want to align the retry behavior:

```diff
-            sleep(self.interval).await;
+            sleep(RPC_RETRY_SLEEP_DURATION).await;
```

rust/main/agents/scraper/src/agent.rs (2)
176-176: Consider enriching the error message with context.
The final error message could include the domain name and number of attempts to make logs easier to follow when things go sideways.
Apply this diff:
```diff
-        Err(eyre::eyre!("Failed to scrape chain"))
+        Err(eyre::eyre!(
+            "Failed to scrape chain {} after {} attempts",
+            scraper.domain.name(),
+            attempts
+        ))
```
164-164: Attempt count could start at 1 for clarity.
Starting at 0 is fine for the loop counter, but displaying `attempt_count=0` in logs might confuse folks who expect counting to start at 1. Showing `attempt_count=i+1` or something like `attempt={}/{}` with `i+1, attempts` would be clearer for anyone reading the logs - they'd see "attempt 1 of 10" rather than "attempt 0".
Apply this diff:
```diff
-                    tracing::error!(?err, ?scraper.domain, attempt_count=i, "Failed to scrape chain");
+                    tracing::error!(?err, ?scraper.domain, attempt_count=i+1, "Failed to scrape chain");
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between fc0dec67695adc6e2693b2f91cd5da96311b9583 and 106a3d1.
📒 Files selected for processing (2)
- rust/main/agents/scraper/src/agent.rs (4 hunks)
- rust/main/agents/validator/src/validator.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
rust/main/agents/validator/**/src/**/*.rs
📄 CodeRabbit inference engine (CLAUDE.md)
Maintain validator agent Rust sources under rust/main/agents/validator
Files:
rust/main/agents/validator/src/validator.rs
rust/main/**/src/**/*.rs
📄 CodeRabbit inference engine (CLAUDE.md)
Run cargo clippy for Rust code linting
Files:
rust/main/agents/validator/src/validator.rs
rust/main/agents/scraper/src/agent.rs
rust/main/agents/scraper/**/src/**/*.rs
📄 CodeRabbit inference engine (CLAUDE.md)
Maintain scraper agent Rust sources under rust/main/agents/scraper
Files:
rust/main/agents/scraper/src/agent.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (54)
- GitHub Check: infra-test
- GitHub Check: env-test-matrix (mainnet3, inevm, igp)
- GitHub Check: env-test-matrix (mainnet3, optimism, core)
- GitHub Check: cli-evm-e2e-matrix (warp-read)
- GitHub Check: env-test-matrix (mainnet3, optimism, igp)
- GitHub Check: env-test-matrix (mainnet3, ethereum, igp)
- GitHub Check: cli-evm-e2e-matrix (warp-check-1)
- GitHub Check: env-test-matrix (testnet4, sepolia, core)
- GitHub Check: env-test-matrix (mainnet3, inevm, core)
- GitHub Check: cli-evm-e2e-matrix (warp-init)
- GitHub Check: env-test-matrix (mainnet3, arbitrum, igp)
- GitHub Check: cli-evm-e2e-matrix (warp-rebalancer)
- GitHub Check: cli-evm-e2e-matrix (warp-check-3)
- GitHub Check: cli-evm-e2e-matrix (warp-bridge-2)
- GitHub Check: cli-evm-e2e-matrix (warp-send)
- GitHub Check: cli-evm-e2e-matrix (warp-extend-recovery)
- GitHub Check: env-test-matrix (mainnet3, ethereum, core)
- GitHub Check: cli-evm-e2e-matrix (warp-apply-submitters)
- GitHub Check: env-test-matrix (mainnet3, arbitrum, core)
- GitHub Check: cli-evm-e2e-matrix (warp-extend-basic)
- GitHub Check: cli-evm-e2e-matrix (warp-check-2)
- GitHub Check: cli-evm-e2e-matrix (warp-extend-config)
- GitHub Check: cli-evm-e2e-matrix (warp-deploy)
- GitHub Check: cli-evm-e2e-matrix (core-deploy)
- GitHub Check: cli-evm-e2e-matrix (warp-apply-2)
- GitHub Check: cli-evm-e2e-matrix (relay)
- GitHub Check: cli-evm-e2e-matrix (core-apply)
- GitHub Check: cli-evm-e2e-matrix (core-read)
- GitHub Check: cli-evm-e2e-matrix (warp-apply-1)
- GitHub Check: cli-evm-e2e-matrix (warp-bridge-1)
- GitHub Check: cli-evm-e2e-matrix (core-init)
- GitHub Check: cli-evm-e2e-matrix (core-check)
- GitHub Check: coverage-run
- GitHub Check: cli-cosmos-e2e-matrix (core-read)
- GitHub Check: cli-cosmos-e2e-matrix (core-deploy)
- GitHub Check: cli-cosmos-e2e-matrix (warp-deploy)
- GitHub Check: cli-cross-chain-e2e-matrix (warp-deploy)
- GitHub Check: cli-cosmos-e2e-matrix (warp-read)
- GitHub Check: cli-cosmos-e2e-matrix (core-apply)
- GitHub Check: cosmos-sdk-e2e-run
- GitHub Check: cli-cross-chain-e2e-matrix (warp-apply)
- GitHub Check: cli-cosmos-e2e-matrix (core-check)
- GitHub Check: cli-install-test-run
- GitHub Check: yarn-test-run
- GitHub Check: build-and-push-to-gcr
- GitHub Check: e2e-matrix (starknet)
- GitHub Check: e2e-matrix (cosmosnative)
- GitHub Check: e2e-matrix (sealevel)
- GitHub Check: e2e-matrix (cosmwasm)
- GitHub Check: e2e-matrix (evm)
- GitHub Check: e2e-matrix (radix)
- GitHub Check: lint-rs
- GitHub Check: test-rs
- GitHub Check: lander-coverage
🔇 Additional comments (8)
rust/main/agents/validator/src/validator.rs (4)
24-27: LGTM!
The import of `RPC_RETRY_SLEEP_DURATION` is appropriate for the retry logic you've added.
37-38: Good choice making this a constant.
Setting it to 10 attempts gives plenty of chances before giving up. Nice and configurable if it needs tweaking later.
283-293: LGTM!
The retry invocation looks solid. Failing early when the sync task can't be created is the right call, and the error logging will make debugging easier.
295-297: LGTM!
Starting the checkpoint submitters after successfully getting the sync task is the right sequence. Everything's lined up properly.

rust/main/agents/scraper/src/agent.rs (4)
rust/main/agents/scraper/src/agent.rs (4)
6-10: Imports look solid.
The addition of `RPC_RETRY_SLEEP_DURATION` and `time::sleep` supports the retry mechanism properly. Reusing the sleep duration from core keeps things consistent across components.
21-21: Constant is well-defined.
Setting this to 10 attempts aligns with the retry pattern described in the PR.
133-144: Retry integration looks good.
The call site properly handles success and failure cases. When all retries are exhausted, the chain is skipped and remains flagged with a critical error.
154-177: Look, the sandbox here's got its hands tied when it comes to running cargo—can't get to `/proc/self/exe`, which means we're stuck like an ogre in a swamp without a bridge, yeah?
I could take a stab at reading the code meself, and far as I can tell from looking at it, the function structure seems sound enough—the `sleep()` is properly awaited, the error handling makes sense. But here's the thing: I can't actually run clippy to verify it passes all the checks like the guidelines say. That's the real kicker.

You'll need to run the clippy check manually on your end:

```shell
cd rust/main/agents/scraper && cargo clippy --all-targets --all-features -- -D warnings
```

Make sure it comes back clean, and we're golden.
I wonder if it is possible to generalise this function so that we can use it with more types of Results and re-use it in Scraper and Validator?
I tried doing that, but it was quite difficult for some functionality |
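One possible shape, sketched synchronously to keep it self-contained. `try_n_times` and its signature are hypothetical, not code from this PR; the agents would need an async variant with the sleep between attempts.

```rust
/// Hypothetical generic retry helper: runs `op` up to `attempts` times and
/// returns the first success, or the last error once attempts are exhausted.
fn try_n_times<T, E, F>(attempts: usize, mut op: F) -> Result<T, E>
where
    F: FnMut(usize) -> Result<T, E>,
{
    assert!(attempts > 0, "attempts must be > 0");
    let mut last_err = None;
    for attempt in 1..=attempts {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) => {
                // The real agents would log here, toggle critical_error,
                // and await RPC_RETRY_SLEEP_DURATION between attempts.
                last_err = Some(err);
            }
        }
    }
    Err(last_err.expect("loop ran at least once"))
}
```

The async version is trickier because the operation must produce a fresh future on every attempt (roughly `F: FnMut(usize) -> Fut, Fut: Future<Output = Result<T, E>>`), which may be the difficulty the thread mentions.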
Actionable comments posted: 2
♻️ Duplicate comments (1)
rust/main/agents/validator/src/validator.rs (1)
310-321: Good switch to the retry helper; message fixed.
Nice swap to `try_n_times_to_run_merkle_tree_hook_sync`, and the final error now names the merkle tree hook sync, not "scrape chain". Clean as a whistle.
🧹 Nitpick comments (1)
rust/main/agents/validator/src/validator.rs (1)
40-40: Make attempts configurable and de‑duplicate across agents.
Hard‑coding 10 is a wee bit rigid. Consider wiring this from settings/env and/or centralizing alongside `RPC_RETRY_SLEEP_DURATION` so Scraper and Validator don't drift over time.
Want me to propose a small config plumb (defaulting to 10) used by both agents?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
rust/main/agents/validator/src/validator.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
rust/main/agents/validator/**/src/**/*.rs
📄 CodeRabbit inference engine (CLAUDE.md)
Maintain validator agent Rust sources under rust/main/agents/validator
Files:
rust/main/agents/validator/src/validator.rs
rust/main/**/src/**/*.rs
📄 CodeRabbit inference engine (CLAUDE.md)
Run cargo clippy for Rust code linting
Files:
rust/main/agents/validator/src/validator.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: infra-test
- GitHub Check: build-and-push-to-gcr
- GitHub Check: lint-rs
- GitHub Check: lander-coverage
- GitHub Check: test-rs
- GitHub Check: e2e-matrix (cosmwasm)
- GitHub Check: e2e-matrix (sealevel)
- GitHub Check: e2e-matrix (starknet)
- GitHub Check: e2e-matrix (radix)
- GitHub Check: e2e-matrix (evm)
- GitHub Check: e2e-matrix (cosmosnative)
🔇 Additional comments (3)
rust/main/agents/validator/src/validator.rs (3)
25-27: Imports look tidy; sleep constant fits the use.
Pulling in `RPC_RETRY_SLEEP_DURATION` here makes sense for consistency with other agents. No fuss from me.
322-324: Order of startup makes sense.
Gate checkpoint submitters on successful merkle-tree sync init. Sensible sequencing.
310-318: Validation successful—no scraper agent remnants detected in validator code.
The ripgrep search came up clean for the validator agent. All the matches landed in the scraper agent where they belong—no messy copy/paste bits cluttering up your validator sources. The error messaging at those lines uses the proper "merkle tree hook sync" terminology, not any scraper-specific phrasing. Your ogre... er, your code's layers are properly separated.
Actionable comments posted: 0
♻️ Duplicate comments (1)
rust/main/agents/scraper/src/agent.rs (1)
160-167: Skip the sleep after the final attempt.
Right now we're waiting through the sleep duration even when we've run out of attempts. That's like waiting at the door after you've already been told the swamp's closed.
Apply this diff to only sleep when there's another retry coming:
```diff
         for i in 0..attempts {
             let scraper_task = match self.scrape(scraper).await {
                 Ok(s) => s,
                 Err(err) => {
                     tracing::error!(?err, ?scraper.domain, attempt_count=i, "Failed to scrape chain");
-                    sleep(RPC_RETRY_SLEEP_DURATION).await;
+                    if i + 1 < attempts {
+                        sleep(RPC_RETRY_SLEEP_DURATION).await;
+                    }
                     continue;
                 }
             };
```
🧹 Nitpick comments (4)
rust/main/agents/scraper/src/agent.rs (4)
21-21: Consider documenting why 10 attempts.
The constant value works fine, but a comment explaining the reasoning (e.g., expected network conditions, typical recovery time) would help future maintainers understand if this needs tuning.
164-164: Consider 1-indexed attempt count in logs.
The log shows `attempt_count=i` where `i` is 0-indexed. Showing `attempt_count=i+1` or something like `attempt={}/{}` with `i+1, attempts` would be clearer for anyone reading the logs - they'd see "attempt 1 of 10" rather than "attempt 0".

```diff
-                    tracing::error!(?err, ?scraper.domain, attempt_count=i, "Failed to scrape chain");
+                    tracing::error!(?err, ?scraper.domain, attempt_count=i+1, total_attempts=attempts, "Failed to scrape chain");
```
174-176: Preserve the last error for better debugging.
The generic error message loses the underlying cause. Consider storing the last error from the loop and including it in the final result, or at least adding the domain name to the error message.
```diff
+        let mut last_err = None;
         for i in 0..attempts {
             let scraper_task = match self.scrape(scraper).await {
                 Ok(s) => s,
                 Err(err) => {
                     tracing::error!(?err, ?scraper.domain, attempt_count=i, "Failed to scrape chain");
+                    last_err = Some(err);
                     if i + 1 < attempts {
                         sleep(RPC_RETRY_SLEEP_DURATION).await;
                     }
                     continue;
                 }
             };
             self.chain_metrics
                 .set_critical_error(scraper.domain.name(), false);
             return Ok(scraper_task);
         }
         self.chain_metrics
             .set_critical_error(scraper.domain.name(), true);
-        Err(eyre::eyre!("Failed to scrape chain"))
+        Err(last_err.unwrap_or_else(|| eyre::eyre!("Failed to scrape chain after {} attempts", attempts)))
```
154-159: Documentation could mention error flag behavior.
The doc comment is clear about the retry logic, but it might help to note that this also manages the critical_error metric (sets on total failure, clears on success).
```diff
-    /// Try to scrape attempts times before giving up.
+    /// Try to scrape attempts times before giving up.
+    /// Sets the chain's critical_error metric on total failure, clears it on success.
     async fn try_n_times_to_scrape(
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
rust/main/agents/scraper/src/agent.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
rust/main/agents/scraper/**/src/**/*.rs
📄 CodeRabbit inference engine (CLAUDE.md)
Maintain scraper agent Rust sources under rust/main/agents/scraper
Files:
rust/main/agents/scraper/src/agent.rs
rust/main/**/src/**/*.rs
📄 CodeRabbit inference engine (CLAUDE.md)
Run cargo clippy for Rust code linting
Files:
rust/main/agents/scraper/src/agent.rs
🧠 Learnings (1)
📚 Learning: 2025-09-02T18:44:06.598Z
Learnt from: CR
Repo: hyperlane-xyz/hyperlane-monorepo PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-02T18:44:06.598Z
Learning: Applies to rust/main/agents/scraper/**/src/**/*.rs : Maintain scraper agent Rust sources under rust/main/agents/scraper
Applied to files:
rust/main/agents/scraper/src/agent.rs
🔇 Additional comments (3)
rust/main/agents/scraper/src/agent.rs (3)
133-142: Good refactoring of retry logic.
Moving the retry mechanism into a dedicated method makes the code cleaner and the run loop easier to follow.
170-172: Proper error recovery on success.
Clearing the critical error flag when scraping succeeds ensures the system recovers from transient failures, which is exactly what we want.
154-177: Verify the code passes clippy linting.
Make sure to run `cargo clippy` on this code to catch any lints or warnings. As per coding guidelines.