Conversation

> **Note — Reviews paused.** This branch is under active development, so CodeRabbit has automatically paused this review to avoid overwhelming you with comments from an influx of new commits. This behavior is configurable in the settings.
📝 **Walkthrough**

Adds a Relay API (Axum handlers, extractor, metrics) with rate-limiting and tx deduplication, integrates it into the relayer/server wiring, extends the `Indexer` trait with `parse_tx_hash` plus chain-specific parsers and tests, updates cargo/workspace deps, and adds a relay benchmark script and local config.
**Sequence Diagram**

```mermaid
sequenceDiagram
    participant Client as Client
    participant Handler as RelayHandler
    participant RL as RateLimiter
    participant Cache as TxHashCache
    participant Extractor as Extractor
    participant Indexer as ChainIndexer
    participant Classifier as ISMClassifier
    participant DB as OriginDB
    participant Queue as SendQueue
    Client->>Handler: POST /relay (origin_chain, tx_hash)
    Handler->>RL: check()
    RL-->>Handler: allowed/denied
    Handler->>Cache: check_and_insert(chain, tx_hash)
    Cache-->>Handler: ok/duplicate
    Handler->>Extractor: extract_messages(indexers, chain, tx_hash)
    Extractor->>Indexer: parse_tx_hash(tx_hash)
    Indexer-->>Extractor: H512
    Extractor->>Indexer: fetch_logs(parsed_hash)
    Indexer-->>Extractor: [HyperlaneMessage]
    Extractor-->>Handler: ExtractedMessage[]
    Handler->>Classifier: classify(recipient)
    Classifier-->>Handler: MessageContext
    Handler->>DB: persist message + mappings
    DB-->>Handler: ok
    Handler->>Queue: enqueue PendingMessage (dest_domain)
    Queue-->>Handler: queued
    Handler-->>Client: 200 OK (messages info)
```
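The dedup step in the diagram (`Cache: check_and_insert`) can be sketched as a TTL map. This is an illustrative stdlib-only sketch, not the PR's actual `TxHashCache` — the names, key shape, and timestamp injection are assumptions:

```rust
use std::collections::HashMap;

/// Illustrative dedup cache: keys are (chain, tx_hash) pairs, values are
/// insert timestamps in seconds. `now` is injected for testability; the
/// real implementation presumably uses a monotonic clock.
struct TxHashCache {
    entries: HashMap<(String, String), u64>,
    ttl_secs: u64,
}

impl TxHashCache {
    fn new(ttl_secs: u64) -> Self {
        Self { entries: HashMap::new(), ttl_secs }
    }

    /// Returns true if the tx was newly inserted, false if it is a
    /// duplicate still inside the TTL window.
    fn check_and_insert(&mut self, chain: &str, tx_hash: &str, now: u64) -> bool {
        let ttl = self.ttl_secs;
        // Drop expired entries first so a re-relay after TTL is allowed.
        self.entries.retain(|_, &mut t| now.saturating_sub(t) < ttl);
        let key = (chain.to_string(), tx_hash.to_string());
        if self.entries.contains_key(&key) {
            return false; // duplicate within TTL
        }
        self.entries.insert(key, now);
        true
    }
}
```

A duplicate submission inside the TTL is rejected; the same (chain, tx_hash) pair becomes acceptable again once its entry expires.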
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
**Codecov Report** — ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main    #8364   +/-   ##
=======================================
  Coverage   76.49%   76.49%
=======================================
  Files         128      128
  Lines        3416     3416
  Branches      290      290
=======================================
  Hits         2613     2613
  Misses        786      786
  Partials       17       17
```
🧹 Nitpick comments (1)

rust/main/agents/relayer/src/relay_api/handlers.rs (1)

**208-240: Rate limiter works, but there's room in me swamp for a wee improvement.**

The sliding-window approach is functionally correct, but `Vec` with `retain()` is O(n) on each check. For higher-throughput scenarios, `VecDeque` with front-popping would be more efficient, though this is likely fine for the expected API load.

♻️ Optional: Use `VecDeque` for O(1) cleanup
```diff
+use std::collections::VecDeque;
+
 pub struct RateLimiter {
-    requests: Vec<u64>,
+    requests: VecDeque<u64>,
     max_requests: usize,
     window_secs: u64,
 }

 impl RateLimiter {
     pub fn new(max_requests: usize, window_secs: u64) -> Self {
         Self {
-            requests: Vec::new(),
+            requests: VecDeque::new(),
             max_requests,
             window_secs,
         }
     }

     pub fn check(&mut self) -> bool {
         let now = SystemTime::now()
             .duration_since(UNIX_EPOCH)
             .unwrap()
             .as_secs();

-        // Remove requests outside the window
-        self.requests.retain(|&t| now - t < self.window_secs);
+        // Remove requests outside the window (from front)
+        while let Some(&oldest) = self.requests.front() {
+            if now - oldest >= self.window_secs {
+                self.requests.pop_front();
+            } else {
+                break;
+            }
+        }

         if self.requests.len() >= self.max_requests {
             return false;
         }

-        self.requests.push(now);
+        self.requests.push_back(now);
         true
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/main/agents/relayer/src/relay_api/handlers.rs` around lines 208 - 240, Replace the Vec-based timestamp storage in struct RateLimiter with a VecDeque<u64> to avoid O(n) retain() costs; update RateLimiter::new to initialize requests as a VecDeque, and change RateLimiter::check to remove expired timestamps by popping from the front in a loop (or while-let) until the front timestamp is within window_secs, then check length against max_requests and push_back(now) when accepting a request; ensure you update imports (use std::collections::VecDeque) and adjust references to requests.push and requests.retain to use push_back and front/pop_front semantics.
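For reference, the `VecDeque` variant suggested above can be exercised standalone. This sketch injects the clock as a parameter for testability (the real code reads `SystemTime`), so the method name `check_at` is an assumption — the shape of the eviction loop is the point:

```rust
use std::collections::VecDeque;

/// Sliding-window limiter: allow at most `max_requests` in any
/// `window_secs`-second window. Timestamps are injected for testability.
pub struct RateLimiter {
    requests: VecDeque<u64>,
    max_requests: usize,
    window_secs: u64,
}

impl RateLimiter {
    pub fn new(max_requests: usize, window_secs: u64) -> Self {
        Self { requests: VecDeque::new(), max_requests, window_secs }
    }

    pub fn check_at(&mut self, now: u64) -> bool {
        // Timestamps are pushed in order, so expired entries are all at the
        // front: popping from the front is O(1) per evicted entry.
        while let Some(&oldest) = self.requests.front() {
            if now.saturating_sub(oldest) >= self.window_secs {
                self.requests.pop_front();
            } else {
                break;
            }
        }
        if self.requests.len() >= self.max_requests {
            return false;
        }
        self.requests.push_back(now);
        true
    }
}
```

`saturating_sub` also sidesteps the `now − t` underflow on clock skew mentioned later in this thread.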
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c3087b0f-fc3b-4ecf-b066-2258e88baaea
📒 Files selected for processing (1)
rust/main/agents/relayer/src/relay_api/handlers.rs
♻️ Duplicate comments (1)
rust/main/agents/relayer/src/relay_api/handlers.rs (1)
**329-334: ⚠️ Potential issue | 🟠 Major — All extraction failures become 400 Bad Request; needs differentiation.**

Look, not every failed extraction is the client's fault. Right now, if the indexer RPC is down or there's a transient network blip, this returns a 400 like the client did something wrong. That's not fair to them, is it?

Infrastructure failures should return 5xx (e.g., `ServiceUnavailable`), invalid formats should be 400, and "tx not found" could arguably be 404. Collapsing everything into one error type muddles retry semantics for callers.

🐛 Suggested approach: distinguish error types
```diff
 .map_err(|e| {
     if let Some(ref metrics) = state.metrics {
         metrics.inc_failure("extraction_failed");
     }
-    ServerError::InvalidRequest(format!("Failed to extract messages: {e}"))
+    // Distinguish between client errors and server errors
+    let err_str = e.to_string();
+    if err_str.contains("not found") || err_str.contains("no messages") {
+        ServerError::NotFound
+    } else if err_str.contains("invalid") || err_str.contains("parse") {
+        ServerError::InvalidRequest(format!("Invalid request: {e}"))
+    } else {
+        ServerError::ServiceUnavailable(format!("Extraction service error: {e}"))
+    }
 })?;
```

Alternatively, have `extract_messages` return a typed error enum that maps cleanly to HTTP status codes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/main/agents/relayer/src/relay_api/handlers.rs` around lines 329 - 334, The current map_err in the call that uses extract_messages collapses every failure into ServerError::InvalidRequest (400); change this to distinguish error kinds by updating extract_messages (or the call site) to return/propagate a typed error enum (e.g., ExtractError { InvalidInput, NotFound, RemoteUnavailable, Transient }) or match on existing error variants from the indexer client; then map InvalidInput -> ServerError::InvalidRequest, NotFound -> ServerError::NotFound (404), and RemoteUnavailable/Transient -> ServerError::ServiceUnavailable (5xx) while still incrementing metrics via state.metrics.inc_failure with appropriate labels (e.g., "extraction_invalid", "extraction_not_found", "extraction_unavailable") so callers see correct HTTP semantics and retry signals.
🧹 Nitpick comments (2)
rust/main/agents/relayer/src/relay_api/handlers.rs (2)
**449-457: Error message could be clearer for unsupported destinations.**

The error "No send channel for destination domain" sounds like an internal plumbing problem rather than "this relayer doesn't support that route." A user-friendly message would help callers understand what went wrong without them having to dig through swamp logs.

✨ Suggested clearer error message

```diff
 let send_channel = send_channels
     .get(&extracted.destination_domain)
     .ok_or_else(|| {
         ServerError::InvalidRequest(format!(
-            "No send channel for destination domain {}",
+            "Destination domain {} is not supported by this relayer",
             extracted.destination_domain
         ))
     })?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/main/agents/relayer/src/relay_api/handlers.rs` around lines 449 - 457, The current ServerError produced when send_channels.get(&extracted.destination_domain) is missing uses a message that sounds like an internal plumbing error; change the message in the ServerError::InvalidRequest to a user-friendly one indicating the destination is unsupported (e.g., "Unsupported destination domain" or "Relayer does not support destination domain") and include the extracted.destination_domain value so callers clearly know the route is not supported; update the error construction in the lookup around send_channels, preserving ServerError::InvalidRequest.
**281-289: Consider validating `tx_hash` for empty string too.**

You're checking `origin_chain` isn't empty, but an empty `tx_hash` would slip through and fail later in extraction with a less helpful error message. Might as well catch it early.

✨ Suggested validation addition

```diff
 // Validate request
 if req.origin_chain.is_empty() {
     if let Some(ref metrics) = state.metrics {
         metrics.inc_failure("invalid_request");
     }
     return Err(ServerError::InvalidRequest(
         "origin_chain cannot be empty".to_string(),
     ));
 }
+
+if req.tx_hash.is_empty() {
+    if let Some(ref metrics) = state.metrics {
+        metrics.inc_failure("invalid_request");
+    }
+    return Err(ServerError::InvalidRequest(
+        "tx_hash cannot be empty".to_string(),
+    ));
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/main/agents/relayer/src/relay_api/handlers.rs` around lines 281 - 289, Add the same early validation for req.tx_hash as exists for req.origin_chain: check if req.tx_hash.is_empty(), increment state.metrics.inc_failure("invalid_request") when metrics is Some, and return Err(ServerError::InvalidRequest("tx_hash cannot be empty".to_string())); place this alongside the existing origin_chain check in the handler so empty tx_hash is rejected with a clear error before extraction.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6a035b9f-cbc1-475b-82be-fe4e0ea514c5
📒 Files selected for processing (2)
rust/main/agents/relayer/src/relay_api/handlers.rs
rust/main/agents/relayer/src/relayer/tests.rs
85f0458 to 1a94428 — Compare
paulbalaji left a comment
Re-review on d68b2b6
Prior Devin/CodeRabbit threads are all resolved — nice progress. Three findings remain before approval (one blocker), plus two broader notes.
Good catches to credit:
- Two-pass validate-then-commit refactor in the handler
- Explicit EVM-only scope narrowing with the clear comment at `rust/main/hyperlane-core/src/traits/indexer.rs:78-89`
- HTTP 404 transient detection replacing the earlier substring matching in `ccip_read`
- `fail_fast` mode opt-in, so normal relayer behavior is unchanged
Non-inline notes
Medium — POST /relay has no authentication, and the rate limiter is a single process-global bucket (100/60s). The CCTP-V2 gate narrows the blast radius but each accepted request still drives RPC work on the destination mailbox and Circle's attestation service. Consider an API-key header, per-IP limits, or binding to a private interface in production. Worth documenting the intended deployment model.
Medium — the relay_api module has no unit tests. 655 LoC of handler logic: validate-then-commit, TxHashCache behavior under full/duplicate/ok, rate-limit window semantics, CCTP-V2 gate rejection, fail-fast retry budget, partial-send handling, and the CORS parser. All untested. Given the security-facing nature of this endpoint, coverage would be worth adding before the first deploy.
Net: not blocking on style. Finding #1 (partial-commit + dedup lockout) is the one to resolve before this lands.
paulbalaji left a comment
Consolidated re-review (commit dddf9aea07)
Status: requesting changes. Two high-severity items + three mediums below. This is a follow-up re-review that accounts for prior devin/coderabbit/author exchanges — items already resolved (Box::leak, RwLock poisoning, DB-before-send, ISM-outside-timeout, rate-limit zero validation, empty tx hash, dedup-before-extraction, now−t clock-skew underflow, partial-send duplication, multi-message extraction) are not re-raised, and the devin parse_tx_hash H512 padding concern is independently verified to be a false positive (hyperlane-core's impl_fixed_hash_conversions!(H512, H256) in hyperlane-core/src/types/primitive_types.rs reads bottom-aligned bytes [32..64], which matches the left-pad; the conversion round-trips correctly for EVM hashes). Kudos to the author for catching that.
High severity — blockers
- `fetch_raw_logs_and_meta` signature flip (`Err` → `Ok(None)` on missing receipt) regressed retry semantics for all EVM tx-id indexing, not just `/relay`. See inline on `contracts/utils.rs`.
- `TxHashCache` is a read-then-write TOCTOU: concurrent identical requests all pass the early `contains()` read-lock, amplify phase-1 RPCs, and enqueue duplicate `PendingMessage`s before the write-lock commit. See inline on `relay_api/handlers.rs:610`.
Medium
- CCTP V2 gate is per-tx, not per-message — a tx mixing `DepositForBurn` (V2) with an unrelated `Dispatch` fast-paths the unrelated one. Inline on `relay_api/extractor.rs`.
- TS SDK parity gap: three new `relayApi*` settings keys are absent from `typescript/sdk/src/metadata/agentConfig.ts` and `typescript/infra/src/config/agent/relayer.ts`, despite the parity contract at the top of `settings/mod.rs`. Inline on `settings/mod.rs`.
- No handler-level tests for `/relay`. The coverage in `relayer/tests.rs` only exercises config plumbing — dedupe race, rate limit, CCTP gating, cache-full-after-send, concurrent same-tx, and partial multi-destination send are uncovered. Inline on `relay_api/handlers.rs:327`.
PR body is stale
The description still advertises Solana/Radix/Aleo push-relay support, but the diff ships no chain-specific parse_*_tx_hash functions and /relay is gated to EVM CCTP V2 via is_cctp_v2 in extractor.rs. Please update the description so reviewers and deployers aren't misled.
Observations outside this PR
- `rust/main/hyperlane-base/src/contract_sync/mod.rs:216-220` is the caller that silently drops tx_ids on the new `Ok(None)` path above — this is the blast radius beyond `/relay` for High #1.
- Rate limiter is a single global `RwLock<RateLimiter>` with no per-IP / per-key dimension; combined with the TOCTOU above, an attacker can amplify per-request RPC cost ~22× (2 + 10× `recipient_ism` + 10× `get_app_context`) within the global quota. Not a blocker on its own; consider tightening defaults and parallelizing the per-message RPCs with `join_all`.
paulbalaji left a comment
Delta re-review — HEAD 7e9d7dad88
Follow-up to #8364 (review). I re-read every threaded author reply since that review; nothing currently ends on an unaddressed author response except the two Mediums carried forward below. Non-blocking comment review — a few items worth a second look before merge.
Resolved since the last pass ✅
- High 1 (`fetch_raw_logs_and_meta` retry regression): fixed. `contracts/utils.rs` still returns `Ok(None)`, but `mailbox.rs`, `interchain_gas.rs`, and `merkle_tree_hook.rs` now convert `None` → `Err` inside `call_and_retry_indefinitely`, restoring pre-PR retry semantics for the shared `contract_sync` path.
- High 2 (`TxHashCache` TOCTOU): fixed. `handlers.rs:395-412` reserves the dedup slot under a single write lock before any RPC / send / DB work, with `remove()` rollback on the error path. `test_concurrent_same_tx_only_one_succeeds` exercises the race.
- Medium 5 (no `/relay` tests): largely addressed — 11 tests in `relay_api/tests.rs` cover happy path, dedup within/after TTL, concurrency, rate limit, timeout, CCTP gate (happy), whitelist/blacklist, cache-full-before-send, partial-send release.
- Acknowledged author replies: the H512 padding dispute was a false positive, `max_retries = 3` is documented, and the "does not apply here" reply on `utils.rs:30` is now accurate because the retry wrapper is in the callers. `7e9d7dad88` cleanly reuses the already-built message indexer instead of standing up a second one (`relayer/origin.rs`, `relayer.rs:644-648`).
Still open — please address
- Test vs. code disagree on extraction-timeout status — the new test expects 408, the current code returns 400. See inline on `extractor.rs:45` and `tests.rs:357`.
- CCTP V2 gate is per-tx, not per-message — acknowledged the author's Iris-attestation reasoning, but it addresses spoofed burns, not the mixed-message case. See inline on `extractor.rs:96`.
- TS SDK parity regressed — four Rust-side keys now lack a TS mirror despite the parity contract. See inline on `settings/mod.rs:77`.
- Reservation leak on handler cancellation — minor / defense-in-depth: rollback only runs if `relay_work` returns. See inline on `handlers.rs:415`.
PR description is still stale
The summary still claims push-relay support for Solana, Radix, and Aleo, but the current diff ships no chain-specific parse_*_tx_hash and /relay is CCTP-V2-gated to EVM via extract_messages. Please update so downstream consumers aren't misled.
Observations outside this PR (not blocking)
- `hyperlane-base/src/server/base_server.rs:27-49` uses plain `axum::serve` with no `tower::timeout` / `GracefulShutdown` layer today, which is partly why the cancellation leak above is not an immediate exploit. If that wrapper is ever added, the reservation leak becomes real.
- `rate_limiter.check()` at `handlers.rs` consumes a token before the dedup reservation, so duplicate requests within TTL still burn rate-limit quota. Minor; not a regression.
**CI failure — same issue flagged inline in my last review**

The failing job on run 24719806500 hits the exact mismatch flagged inline: the test expects 408 while the extraction path returns 400.

**Minimal fix**

Surface the inner-timeout case distinctly so the handler can map it to a 408:

1. Give `extract_messages` a typed error:

```rust
use thiserror::Error;

#[derive(Debug, Error)]
pub enum ExtractError {
    #[error("timed out fetching transaction receipt")]
    Timeout,
    #[error(transparent)]
    Other(#[from] eyre::Error),
}

pub async fn extract_messages(
    indexers: &HashMap<String, Arc<dyn Indexer<HyperlaneMessage>>>,
    chain_name: &str,
    tx_hash: &str,
) -> Result<Vec<ExtractedMessage>, ExtractError> {
    // ...
    let messages_with_meta = tokio::time::timeout(
        Duration::from_secs(5),
        indexer.fetch_logs_by_tx_hash(tx_hash_512),
    )
    .await
    .map_err(|_| {
        error!(chain = %chain_name, tx_hash = %tx_hash, "Timed out waiting for transaction receipt");
        ExtractError::Timeout
    })?
    .map_err(|e| ExtractError::Other(eyre!("Failed to fetch transaction logs: {}", e)))?;
    // ...
}
```

Also re-export the new error from the module so the handler can match on it.

2. Map it in the handler:

```rust
.map_err(|e| match e {
    ExtractError::Timeout => {
        state.record_failure("timeout");
        ServerError::RequestTimeout
    }
    ExtractError::Other(err) => {
        state.record_failure("extraction_failed");
        ServerError::InvalidRequest(format!("Failed to extract messages: {err}"))
    }
})?;
```

**Alternative (one-line fix)**

If you'd rather not add an error type, drop the 5 s inner timeout entirely and rely on the 10 s outer timeout; the tradeoff is exactly what the inline comment on that timeout describes.

Not re-posting the four items from the consolidated review above — they still stand as-is after this fix.
```rust
if status == reqwest::StatusCode::NOT_FOUND {
    return Err(MetadataBuildError::AttestationPending);
}
```
🚩 HTTP 404 universally treated as AttestationPending for all CCIP read URLs
At rust/main/agents/relayer/src/msg/metadata/ccip_read/mod.rs:369-371, any HTTP 404 response from any CCIP read URL triggers AttestationPending, causing up to 30 retries at 1-second intervals. The CCIP URLs come from on-chain ISM contracts and could point to any server. While the comment explains this is designed for Circle's attestation service, a misconfigured or non-Circle URL that legitimately returns 404 (e.g., wrong path) will waste ~30 seconds per URL before moving to the next. This is bounded and non-critical, but could be improved by making the 404→pending mapping configurable or checking a response header/body marker specific to Circle's API.
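One way to scope the 404→pending mapping, as the comment suggests, is to require a response marker before treating the 404 as retryable. The marker string and response shape below are hypothetical — Circle's actual API response format would need to be confirmed before relying on this:

```rust
#[derive(Debug, PartialEq)]
enum CcipOutcome {
    /// Retry later — the attestation is not ready yet.
    AttestationPending,
    /// Treat as a hard failure for this URL and move on.
    Fatal(u16),
}

/// Hypothetical classifier: only a 404 whose body carries a pending-style
/// marker is treated as "attestation not ready"; any other 404 (wrong path,
/// non-Circle server) fails fast instead of burning ~30 s of retries.
fn classify_ccip_response(status: u16, body: &str) -> CcipOutcome {
    if status == 404 && body.contains("pending") {
        CcipOutcome::AttestationPending
    } else {
        CcipOutcome::Fatal(status)
    }
}
```

The point of the sketch is the shape of the decision, not the marker itself: bound the retry budget to responses that positively identify themselves as "attestation in progress".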
```rust
let is_cctp_v2 = indexer.is_cctp_v2(tx_hash_512).await.unwrap_or_else(|e| {
    warn!(
        chain = %chain_name,
        tx_hash = %tx_hash,
        error = ?e,
        "Failed to check for CCTP V2 burn event, treating as non-CCTP"
    );
    false
});
```
🔴 Redundant receipt RPC call in is_cctp_v2 silently rejects valid CCTP V2 messages on transient failures
The extract_messages function in extractor.rs:47-68 first calls indexer.fetch_logs_by_tx_hash(tx_hash_512) which internally calls get_transaction_receipt (via call_and_retry_indefinitely). After successfully extracting messages, it then calls indexer.is_cctp_v2(tx_hash_512) at line 96, which makes a second, independent get_transaction_receipt RPC call (rust/main/chains/hyperlane-ethereum/src/contracts/mailbox.rs:213-217). If this second call fails due to a transient RPC error, rate-limiting, or load-balancer routing to a different node, the unwrap_or_else on line 96 silently converts the error to false. The relay API then rejects the valid CCTP V2 message with a misleading "Only EVM CCTP V2 messages are supported" error. This is both a correctness issue (valid messages rejected) and a performance issue (double RPC call for the same receipt).
Prompt for agents
The extract_messages function in extractor.rs makes two separate get_transaction_receipt RPC calls for the same tx hash: once inside fetch_logs_by_tx_hash (line 49) and again inside is_cctp_v2 (line 96). When the second call fails transiently, the unwrap_or_else converts the error to false, causing valid CCTP V2 messages to be rejected with a misleading error.
The fix should combine the receipt fetch into a single RPC call. Options:
1. Add a new method to the Indexer trait (e.g., fetch_logs_and_check_cctp_v2) that returns both the logs and the CCTP V2 status from a single receipt.
2. Have fetch_logs_by_tx_hash return the raw receipt alongside the parsed logs, so is_cctp_v2 can inspect the already-fetched receipt.
3. Add a variant of is_cctp_v2 that accepts a receipt directly instead of a tx hash.
Alternatively, if keeping two calls, the is_cctp_v2 error should be propagated as ExtractError::Failed rather than silently treated as false. A transient RPC failure should not be treated as a definitive non-CCTP-V2 determination.
Relevant files: rust/main/agents/relayer/src/relay_api/extractor.rs (lines 47-104), rust/main/chains/hyperlane-ethereum/src/contracts/mailbox.rs (is_cctp_v2 method at line 202), rust/main/hyperlane-core/src/traits/indexer.rs (is_cctp_v2 trait default at line 88).
```rust
    .route("/relay", post(create_relay))
    .layer(cors)
    .with_state(self)
}
```
🚩 No authentication on relay API endpoint — mitigated by rate limiter, dedup cache, and CCTP V2 gate
The /relay endpoint at handlers.rs:277 has no authentication mechanism. Any client that can reach the server can submit relay requests. The risk is mitigated by: (1) the rate limiter (default 100 req/60s), (2) the dedup TxHashCache (prevents replay within TTL), (3) the CCTP V2 check (only genuine Circle DepositForBurn transactions are accepted), and (4) whitelist/blacklist filtering. Since Circle's attestation service is the ultimate gate (without a genuine attestation, the message can't be delivered), an attacker can cause wasted RPC calls but not actual fund theft. Still, the lack of authentication should be documented and potentially addressed if the relay API is exposed to the public internet.
paulbalaji left a comment
Delta re-review — HEAD 5386d585df
Follow-up to #8364 (review). Re-read all 32 troykessler threaded replies (32 total in-tree). Non-blocking comment review, but two items still need attention before merge — plus a CI fix.
Resolved since last pass ✅
- `test_extraction_timeout` 408 vs 400: fixed via typed `ExtractError::{Timeout, Failed}` enum in `extractor.rs`; the handler in `handlers.rs:492-501` maps `Timeout → ServerError::RequestTimeout` (408). Author confirmed "pass now" at 2026-04-21T11:56Z.
- Reservation leak on handler cancellation: fixed via a `TxHashReservation` RAII guard at `handlers.rs:112-155`, with a switch from `tokio::sync::RwLock<TxHashCache>` → `parking_lot::Mutex<TxHashCache>`. The `Drop` path releases on cancel; `commit()` consumes `self` on success. Clean.
- TS SDK parity: author replied 2026-04-21T11:57Z on `settings/mod.rs:77` with "we dont plan on implementing the relayer api in the ts relayer". Accepted — downgrading to nit below (just update the stale parity-contract comment).
- Prior Highs (`fetch_raw_logs_and_meta` retry regression, `TxHashCache` TOCTOU): still resolved on this head.
Still open — please address
- 🔴 CI lint — 3 clippy errors in `extractor.rs` block `test-rs-run`. Inline below; trivial fix.
- 🔴 Double-RPC in `is_cctp_v2` silently rejects valid CCTP V2 messages on a transient RPC blip (flagged by @devin-ai-integration at 2026-04-22T10:01Z, independently confirmed). The author's older "we dont care" reply at 2026-04-20T14:11Z on `extractor.rs:96` predates this double-RPC finding and the current misleading-400 path. Inline below.
- 🟠 The second eviction pass added in `5386d585df` is a no-op. Inline below.
- Nit: `settings/mod.rs:5` still says "ANY CHANGES HERE NEED TO BE REFLECTED IN THE TYPESCRIPT SDK" — given the author's "not implementing in TS relayer" stance, the comment is now stale. Inline below.
CI (lint failure)
Run 24771827307 fails with 3 clippy uninlined_format_args errors, all in rust/main/agents/relayer/src/relay_api/extractor.rs. Fix recipe in the inline. No other clippy/rustfmt regressions observed.
Overall
Not blocking structurally — most work is done well — but the double-RPC silent-false and the CI lint failure should close before merge. Not re-raising the withdrawn per-tx CCTP gate thread; author has explicitly skipped it.
```rust
) -> Result<Vec<ExtractedMessage>, ExtractError> {
    // Get indexer for chain
    let indexer = indexers.get(chain_name).ok_or_else(|| {
        ExtractError::Failed(format!("Chain not found in registry: {}", chain_name))
```
[CI blocker] Three clippy `uninlined_format_args` errors in this file are blocking `test-rs-run` on CI run 24771827307:

```
error: variables can be used directly in the `format!` string
  --> agents/relayer/src/relay_api/extractor.rs:28:30
  --> agents/relayer/src/relay_api/extractor.rs:40:43
  --> agents/relayer/src/relay_api/extractor.rs:67:30
```
Fix (three 1-line edits in this file):

```diff
-ExtractError::Failed(format!("Chain not found in registry: {}", chain_name))
+ExtractError::Failed(format!("Chain not found in registry: {chain_name}"))

-.map_err(|e| ExtractError::Failed(format!("Invalid tx hash format: {}", e)))?;
+.map_err(|e| ExtractError::Failed(format!("Invalid tx hash format: {e}")))?;

-ExtractError::Failed(format!("Failed to fetch transaction logs: {}", e))
+ExtractError::Failed(format!("Failed to fetch transaction logs: {e}"))
```

`cargo clippy -p relayer --all-targets --features aleo,integration_test -- -D warnings` should then pass locally. (`cargo fix --lib -p relayer --tests` applies all three automatically.)
```rust
// Check once per tx whether this is a CCTP fast transfer.
// Errors are treated as false — the relay API will reject the request below.
let is_cctp_v2 = indexer.is_cctp_v2(tx_hash_512).await.unwrap_or_else(|e| {
```
[High] The `is_cctp_v2` call here performs a second, unretried `get_transaction_receipt` RPC against the origin chain. Trace:

- L49 `indexer.fetch_logs_by_tx_hash(tx_hash_512)` → `mailbox.rs:176-189` → `call_and_retry_indefinitely` around `fetch_raw_logs_and_meta` → one `get_transaction_receipt`. Retried on failure.
- L96 `indexer.is_cctp_v2(tx_hash_512)` → `mailbox.rs:213-217` → direct `self.provider.get_transaction_receipt(...)`. No retry wrapper, no timeout.
- On transient failure, `unwrap_or_else(|_| false)` here (L96-104) silently converts the error to `false` → `handlers.rs:555` returns `ServerError::InvalidRequest("Only EVM CCTP V2 messages are supported via the relay API")` → HTTP 400.
The 400 is actively misleading: it tells a client the tx is ineligible when it actually is CCTP V2 — the second receipt fetch just flaked (common with per-node LB cache inconsistency on busy chains). Clients see "not CCTP V2" and don't retry.
Your older 2026-04-20 14:11Z reply on this thread (we dont care since this will be picked up a few seconds later by the relayer contract indexer) predates this double-RPC finding — the indexer recovery path doesn't fix the misleading response body or the dedup-cache reservation that the client doesn't know to retry past.
Fix options, best to minimum:

- Best: extend the indexer trait so `fetch_logs_by_tx_hash` returns logs + CCTP flag from a single receipt fetch. Saves the second RPC on every request, not just error paths.
- Minimum: propagate the `is_cctp_v2` error as `ExtractError::Failed` (or a new `ExtractError::Transient` → 503) instead of `.unwrap_or_else(|_| false)`. Eliminates the misleading 400.
- Wrap the inner `get_transaction_receipt` in `mailbox.rs::is_cctp_v2` with `call_and_retry_indefinitely`, matching the other call site.
```rust
    }
}

// Enforce max size — run eviction again if still at capacity
```
[Low] The second eviction pass added in `5386d585df` is a no-op.

- First pass (L86-99) runs when `self.cache.len() > self.max_entries * 3 / 4`
- Second pass (L100-105) runs when `self.cache.len() >= self.max_entries`

For any `max_entries >= 1`, whenever the second condition is true the first already ran, and both use the same `now`, same `ttl`, same `retain` predicate — `retain` is idempotent, so the second pass evicts exactly nothing the first didn't.
If the intent was to catch entries that expired between the two checks, `now` must be refreshed before the second pass:

```diff
 // Enforce max size — run eviction again if still at capacity
 if self.cache.len() >= self.max_entries {
+    let now = Instant::now();
     let ttl = self.ttl;
     self.cache
         .retain(|_, &mut timestamp| now.duration_since(timestamp) < ttl);
 }
```

In practice the time delta between the two passes is microseconds, so even with the refresh it evicts ~nothing. Cleanest is to delete the added block; the existing 75%-threshold pass already handles it.
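The idempotence argument is easy to check in isolation: two `retain` passes with the same predicate and the same `now` cannot evict more than one. A stdlib sketch, modeling timestamps as plain seconds (an assumption — the real code uses `Instant`):

```rust
use std::collections::HashMap;

/// Evict entries older than `ttl` relative to `now`, returning how many
/// were removed. Running this twice with the same `now` removes nothing the
/// second time: `retain` already dropped every entry matching the predicate
/// on the first pass.
fn evict(cache: &mut HashMap<String, u64>, now: u64, ttl: u64) -> usize {
    let before = cache.len();
    cache.retain(|_, &mut t| now.saturating_sub(t) < ttl);
    before - cache.len()
}
```

This is exactly why the second pass only does work if `now` is refreshed between the calls — and even then only for entries that expired in the intervening microseconds.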
Summary
Added protocol-specific transaction hash parsing for the relayer API, enabling push-based message relaying for faster cross-chain message delivery.
Motivation & Benefits
Faster Message Delivery (Push vs Poll)
Before (Poll-based):

After (Push-based with this API): users POST to `/relay` with their tx hash.

**Multi-Protocol Support Problem**

The relayer API accepts tx hashes as strings but previously used generic hex decoding. This failed for non-hex formats:

- Solana signatures (base58, e.g. `5Hm...3Bn`)
- Radix transaction IDs (`txid_rdx1...`)
- Aleo transaction IDs (`at1...`)

Without protocol-specific parsing, the push-based API was effectively broken for these chains, forcing users to wait for the slow polling mechanism.
Impact
With this change:
Changes
**Core Trait Extension**

- Extends the `Indexer` trait with a `parse_tx_hash(&str) -> ChainResult<H512>` method
- Default implementation strips the `0x` prefix, decodes hex, and pads to 64 bytes

**Protocol-Specific Implementations**
**Solana** (`mailbox_indexer.rs`):
- Adds a `parse_sealevel_tx_hash()` utility function
- Decodes base58 signatures via `solana_sdk::bs58`

**Radix** (`utils.rs`):
- Adds a `parse_radix_tx_hash()` utility function
- Parses `txid_rdx1...`-format transaction IDs

**Aleo** (`utils.rs`):
- Adds a `parse_aleo_tx_hash()` utility function
- Parses `at1...`-format transaction IDs
- Adds the `bech32` dependency to `Cargo.toml`

**Relayer API Integration**
- Updates `extract_message()` to use `indexer.parse_tx_hash()` instead of generic hex decoding
✅ All packages compile successfully with clippy
✅ Verified implementations across:
hyperlane-corerelayerhyperlane-sealevelhyperlane-radixhyperlane-aleoProtocol Support Matrix
| Format | Example hashes | Parser |
| --- | --- | --- |
| Hex (`0x...`) | `0x123...abc`, `0x456...def`, `0x789...ghi`, `0xabc...123` | trait default (hex decode) |
| Base58 (Solana) | `5Hm...3Bn` | `parse_sealevel_tx_hash()` |
| `txid_rdx1...` (Radix) | `txid_rdx1...` | `parse_radix_tx_hash()` |
| `at1...` (Aleo) | `at1...` | `parse_aleo_tx_hash()` |

**Example Usage**
Design Decisions
- **Trait default method**: provides hex decoding for 90% of chains without explicit overrides
- **Utility functions**: avoid duplication between dispatch/delivery indexers
- **Error handling**: clear error messages for invalid formats (e.g., "Invalid base58 tx hash")
- **Backward compatible**: existing hex-based chains continue working without changes
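The default-method behavior described above (strip `0x`, decode hex, bottom-align into a 64-byte H512) can be sketched without the hyperlane types. The function name, error strings, and `[u8; 64]` stand-in for `H512` are illustrative, not the crate's actual signature:

```rust
/// Illustrative default parser: accepts an optionally 0x-prefixed hex
/// string and returns a 64-byte value with the hash bottom-aligned — a
/// 32-byte EVM hash lands in bytes [32..64], matching the bottom-aligned
/// H256 -> H512 conversion discussed in the review thread.
fn parse_hex_tx_hash(s: &str) -> Result<[u8; 64], String> {
    let hex = s.strip_prefix("0x").unwrap_or(s);
    if !hex.is_ascii() || hex.is_empty() || hex.len() % 2 != 0 || hex.len() > 128 {
        return Err(format!("invalid tx hash length: {}", hex.len()));
    }
    let mut bytes = Vec::with_capacity(hex.len() / 2);
    for i in (0..hex.len()).step_by(2) {
        let byte = u8::from_str_radix(&hex[i..i + 2], 16)
            .map_err(|_| format!("invalid hex in tx hash: {s}"))?;
        bytes.push(byte);
    }
    // Left-pad: place the decoded bytes in the low end of the 64-byte value.
    let mut out = [0u8; 64];
    out[64 - bytes.len()..].copy_from_slice(&bytes);
    Ok(out)
}
```

Chain-specific overrides (`parse_sealevel_tx_hash()` and friends) replace only the decoding step; the bottom-aligned padding convention is what keeps the H256↔H512 round-trip correct for EVM hashes.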