
feat: relayer api #8364

Open
troykessler wants to merge 16 commits into main from feat/relayer-api

Contributor

@troykessler troykessler commented Mar 18, 2026

Summary

Added protocol-specific transaction hash parsing for the relayer API, enabling push-based message relaying for faster cross-chain message delivery.

Motivation & Benefits

Faster Message Delivery (Push vs Poll)

Before (Poll-based):

  • Users dispatch a message on the origin chain
  • Wait for relayer to poll and index the block containing their transaction
  • Indexing delay depends on block finality + polling interval (~30s to several minutes)

After (Push-based with this API):

  • Users dispatch a message and immediately call /relay with their tx hash
  • Relayer fetches and processes the message on-demand
  • Near-instant delivery - no waiting for polling cycle

Multi-Protocol Support Problem

The relayer API accepts tx hashes as strings but previously used generic hex decoding. This failed for non-hex formats:

  • Solana: Uses base58 encoding (e.g., 5Hm...3Bn)
  • Radix: Uses bech32m encoding (e.g., txid_rdx1...)
  • Aleo: Uses bech32m encoding (e.g., at1...)

Without protocol-specific parsing, the push-based API was effectively broken for these chains, forcing users to wait for the slow polling mechanism.

Impact

With this change:

  • Push-based relaying now works for all chains (EVM, Cosmos, Solana, Radix, Aleo)
  • Significantly faster message delivery when users call the API
  • Improved UX - users can trigger relaying immediately after dispatching

Changes

Core Trait Extension

  • Extended Indexer trait with parse_tx_hash(&str) -> ChainResult<H512> method
  • Default implementation handles hex-encoded hashes (EVM, Cosmos, Tron, Starknet)
  • Strips 0x prefix, decodes hex, pads to 64 bytes
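
The default hex path described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the PR's code: `parse_hex_tx_hash` is a hypothetical name, a plain `[u8; 64]` stands in for `H512`, and the exact set of rejected inputs (empty, odd length, over 64 bytes) is an assumption.

```rust
/// Hypothetical stand-in for the default hex parser. Returns a 64-byte array
/// instead of the real `H512` type so the sketch needs no dependencies.
fn parse_hex_tx_hash(input: &str) -> Result<[u8; 64], String> {
    // Strip an optional 0x / 0X prefix.
    let hex = input
        .strip_prefix("0x")
        .or_else(|| input.strip_prefix("0X"))
        .unwrap_or(input);

    // Assumed validations: non-empty, whole bytes, at most 64 bytes.
    if hex.is_empty() {
        return Err("tx hash is empty".to_string());
    }
    if hex.len() % 2 != 0 {
        return Err("hex tx hash has an odd number of digits".to_string());
    }
    if hex.len() > 128 {
        return Err("hex tx hash is longer than 64 bytes".to_string());
    }

    // Decode two hex digits at a time.
    let digits = hex.as_bytes();
    let mut decoded = Vec::with_capacity(digits.len() / 2);
    for pair in digits.chunks(2) {
        let hi = (pair[0] as char).to_digit(16).ok_or("invalid hex digit")?;
        let lo = (pair[1] as char).to_digit(16).ok_or("invalid hex digit")?;
        decoded.push((hi * 16 + lo) as u8);
    }

    // Left-pad with zeros so the hash occupies the trailing bytes.
    let mut out = [0u8; 64];
    out[64 - decoded.len()..].copy_from_slice(&decoded);
    Ok(out)
}
```

With this sketch a 32-byte EVM hash ends up in bytes 32..64 of the result, and the same input with or without the 0x prefix parses to the same value.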

Protocol-Specific Implementations

Solana (mailbox_indexer.rs):

  • Added parse_sealevel_tx_hash() utility function
  • Decodes base58 signatures using solana_sdk::bs58
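
For illustration only, base58 decoding of a 64-byte signature might look like the sketch below. The PR uses solana_sdk::bs58. The hand-written decoder here replaces it only so the example has no dependencies. `decode_base58` and `parse_base58_signature` are hypothetical names, and requiring exactly 64 decoded bytes is an assumption about the intended validation.

```rust
const BASE58_ALPHABET: &[u8; 58] =
    b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Decode a base58 string (Bitcoin alphabet) into bytes.
fn decode_base58(input: &str) -> Result<Vec<u8>, String> {
    // Accumulate the number in little-endian bytes: acc = acc * 58 + digit.
    let mut bytes: Vec<u8> = Vec::new();
    for c in input.bytes() {
        let digit = BASE58_ALPHABET
            .iter()
            .position(|&a| a == c)
            .ok_or_else(|| "Invalid base58 tx hash".to_string())?;
        let mut carry = digit as u32;
        for b in bytes.iter_mut() {
            carry += (*b as u32) * 58;
            *b = (carry & 0xff) as u8;
            carry >>= 8;
        }
        while carry > 0 {
            bytes.push((carry & 0xff) as u8);
            carry >>= 8;
        }
    }
    // Each leading '1' encodes one leading zero byte.
    let zeros = input.bytes().take_while(|&c| c == b'1').count();
    bytes.extend(std::iter::repeat(0u8).take(zeros));
    bytes.reverse();
    Ok(bytes)
}

/// Assumption: a Solana signature must decode to exactly 64 bytes.
fn parse_base58_signature(input: &str) -> Result<[u8; 64], String> {
    let bytes = decode_base58(input)?;
    if bytes.len() != 64 {
        return Err(format!("expected 64 bytes, got {}", bytes.len()));
    }
    let mut out = [0u8; 64];
    out.copy_from_slice(&bytes);
    Ok(out)
}
```

Because a 64-byte signature already fills the full width of `H512`, no padding step appears in this sketch.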

Radix (utils.rs):

  • Added parse_radix_tx_hash() utility function
  • Decodes bech32m tx hashes (txid_rdx1...)

Aleo (utils.rs):

  • Added parse_aleo_tx_hash() utility function
  • Decodes bech32m tx hashes (at1...)
  • Added bech32 dependency to Cargo.toml

Relayer API Integration

  • Updated extract_message() to use indexer.parse_tx_hash() instead of generic hex decoding
  • Protocol-specific parsing now happens automatically based on the chain

Testing

✅ All packages compile successfully with clippy
✅ Verified implementations across:

  • hyperlane-core
  • relayer
  • hyperlane-sealevel
  • hyperlane-radix
  • hyperlane-aleo

Protocol Support Matrix

| Protocol | TX Hash Format | Example | Implementation |
| --- | --- | --- | --- |
| EVM chains | Hex (0x...) | 0x123...abc | Default trait impl |
| Cosmos | Hex | 0x456...def | Default trait impl |
| Tron | Hex | 0x789...ghi | Default trait impl |
| Starknet | Hex (felt) | 0xabc...123 | Default trait impl |
| Solana | Base58 | 5Hm...3Bn | parse_sealevel_tx_hash() |
| Radix | Bech32m | txid_rdx1... | parse_radix_tx_hash() |
| Aleo | Bech32m | at1... | parse_aleo_tx_hash() |

Example Usage

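# RELAYER_URL is a placeholder for the relayer's HTTP address, which this PR does not specify.
# The Content-Type header assumes the endpoint expects a JSON body.

# EVM - hex with 0x prefix (push-based relay)
curl -X POST "$RELAYER_URL/relay" \
  -H 'Content-Type: application/json' \
  -d '{"origin_chain": "ethereum", "tx_hash": "0x1234..."}'

# Solana - base58 (now works with push-based relay!)
curl -X POST "$RELAYER_URL/relay" \
  -H 'Content-Type: application/json' \
  -d '{"origin_chain": "sealevel", "tx_hash": "5Hm...3Bn"}'

# Radix - bech32m (now works with push-based relay!)
curl -X POST "$RELAYER_URL/relay" \
  -H 'Content-Type: application/json' \
  -d '{"origin_chain": "radix", "tx_hash": "txid_rdx1..."}'

# Aleo - bech32m (now works with push-based relay!)
curl -X POST "$RELAYER_URL/relay" \
  -H 'Content-Type: application/json' \
  -d '{"origin_chain": "aleo", "tx_hash": "at1..."}'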

Design Decisions

Trait default method: Provides hex decoding for 90% of chains without explicit overrides
Utility functions: Avoids duplication between dispatch/delivery indexers
Error handling: Clear error messages for invalid formats (e.g., "Invalid base58 tx hash")
Backward compatible: Existing hex-based chains continue working without changes


Summary by CodeRabbit

  • New Features

    • Relay API for extracting/submitting messages with rate limiting, replay protection, metrics and observable responses (enabled by default; can be disabled)
    • Broader chain support and improved transaction-hash parsing for Aleo, Radix, Sealevel and others
    • Relay benchmarking script to measure end-to-end relay times
  • Tests

    • Unit tests for new hash parsing utilities
  • Configuration

    • New relay API rate-limit settings and local config samples

@troykessler troykessler self-assigned this Mar 18, 2026
@github-project-automation github-project-automation Bot moved this to In Review in Hyperlane Tasks Mar 18, 2026
@troykessler troykessler marked this pull request as ready for review March 18, 2026 10:45
Contributor

coderabbitai Bot commented Mar 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a Relay API (Axum handlers, extractor, metrics) with rate‑limiting and tx deduplication, integrates it into the relayer/server wiring, extends Indexer trait with parse_tx_hash plus chain-specific parsers/tests, updates cargo/workspace deps, adds relay benchmark script and local config.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Relay API Core: rust/main/agents/relayer/src/relay_api/mod.rs, rust/main/agents/relayer/src/relay_api/extractor.rs, rust/main/agents/relayer/src/relay_api/handlers.rs, rust/main/agents/relayer/src/relay_api/metrics.rs | New relay_api module: extractor exposes ExtractedMessage and extract_messages; Axum POST /relay handler with RateLimiter and TxHashCache, extraction, ISM classification, enqueueing, DB persistence, HTTP error mapping; Prometheus metrics via RelayApiMetrics. |
| Server & Relayer Integration: rust/main/agents/relayer/src/lib.rs, rust/main/agents/relayer/src/server/mod.rs, rust/main/agents/relayer/src/relayer.rs | Expose relay_api; Server/Relayer gain builder fields/methods for relay send channels, indexers, tx hash cache, metrics, rate limiter; wiring into router; env toggles to disable indexing or relay API; new Relayer settings fields for rate-limit config. |
| Indexer Trait & Default Parse: rust/main/hyperlane-core/src/traits/indexer.rs | Adds default Indexer::parse_tx_hash (hex parsing, validations, left-pad to H512) with unit tests covering hex/no-0x, short/long, invalid, empty inputs. |
| Chain-specific parsers & tests: rust/main/chains/hyperlane-aleo/..., rust/main/chains/hyperlane-radix/..., rust/main/chains/hyperlane-sealevel/src/mailbox_indexer.rs | Add chain-specific parse_tx_hash implementations and helpers: Aleo (bech32 → H512), Radix (bech32m → H512), Sealevel (base58 → H512); validations, padding semantics, and unit tests. |
| Cargo / Workspace updates: rust/main/agents/relayer/Cargo.toml, rust/main/chains/hyperlane-aleo/Cargo.toml | Add dependencies: tower-http (cors); new chain crates as path deps (hyperlane-aleo optional, hyperlane-cosmos/fuel/radix/starknet/tron); enable hex.workspace and bech32.workspace. |
| DB & Core tweaks: rust/main/hyperlane-base/src/db/rocks/hyperlane_db.rs, rust/main/hyperlane-core/src/chain.rs | On duplicate-message early-return, RocksDB now updates max-seen message nonce before returning; change discriminant of KnownHyperlaneDomain::Test4 (31337 → 9913374). |
| Config, scripts & tooling: rust/main/config/local_config.json, scripts/measure-relay-time.sh | Add local_config.json with two local Ethereum chains; new measure-relay-time.sh to benchmark relay times and optionally call the Relay API, collecting stats. |
| Misc & Tests: rust/main/chains/hyperlane-aleo/src/utils/tests.rs, rust/main/chains/hyperlane-radix/src/utils.rs, .spellcheck/typos.toml, rust/main/agents/relayer/src/relayer/tests.rs | Add parser unit tests for Aleo/Radix/Sealevel; update spellcheck excludes; tests updated to include new RelayerSettings rate-limit fields. |

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Handler as RelayHandler
    participant RL as RateLimiter
    participant Cache as TxHashCache
    participant Extractor as Extractor
    participant Indexer as ChainIndexer
    participant Classifier as ISMClassifier
    participant DB as OriginDB
    participant Queue as SendQueue

    Client->>Handler: POST /relay (origin_chain, tx_hash)
    Handler->>RL: check()
    RL-->>Handler: allowed/denied
    Handler->>Cache: check_and_insert(chain, tx_hash)
    Cache-->>Handler: ok/duplicate
    Handler->>Extractor: extract_messages(indexers, chain, tx_hash)
    Extractor->>Indexer: parse_tx_hash(tx_hash)
    Indexer-->>Extractor: H512
    Extractor->>Indexer: fetch_logs(parsed_hash)
    Indexer-->>Extractor: [HyperlaneMessage]
    Extractor-->>Handler: ExtractedMessage[]
    Handler->>Classifier: classify(recipient)
    Classifier-->>Handler: MessageContext
    Handler->>DB: persist message + mappings
    DB-->>Handler: ok
    Handler->>Queue: enqueue PendingMessage (dest_domain)
    Queue-->>Handler: queued
    Handler-->>Client: 200 OK (messages info)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

A wee API creeps out from marsh and mire,
Parsin' bech32, base58, and hex like fire.
It checks the cache, hums metrics soft and slow,
Pushes messages onward where the queues go.
Calm, steady, useful — go on, give it a go.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 59.74% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: relayer api' accurately summarizes the main addition - a new relayer API with protocol-specific transaction hash parsing for push-based message relaying. |
| Description check | ✅ Passed | The description comprehensively covers all required sections: clear summary, motivation, implementation details, testing verification, and design decisions. It exceeds template requirements with protocol matrix and usage examples. |


devin-ai-integration[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.


codecov Bot commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.49%. Comparing base (bc401f7) to head (a048ffb).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8364   +/-   ##
=======================================
  Coverage   76.49%   76.49%           
=======================================
  Files         128      128           
  Lines        3416     3416           
  Branches      290      290           
=======================================
  Hits         2613     2613           
  Misses        786      786           
  Partials       17       17           
| Components | Coverage Δ |
| --- | --- |
| core | 87.80% <ø> (ø) |
| hooks | 74.55% <ø> (ø) |
| isms | 81.46% <ø> (ø) |
| token | 87.33% <ø> (ø) |
| middlewares | 85.47% <ø> (ø) |

coderabbitai[bot]

This comment was marked as resolved.

@hyperlane-xyz hyperlane-xyz deleted a comment from claude Bot Mar 18, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
rust/main/agents/relayer/src/relay_api/handlers.rs (1)

208-240: Rate limiter works, but there's room in me swamp for a wee improvement.

The sliding window approach is functionally correct. Using Vec with retain() is O(n) on each check. For higher throughput scenarios, VecDeque with front-popping would be more efficient, though this is likely fine for expected API load.

♻️ Optional: Use VecDeque for O(1) cleanup
+use std::collections::VecDeque;
+
 pub struct RateLimiter {
-    requests: Vec<u64>,
+    requests: VecDeque<u64>,
     max_requests: usize,
     window_secs: u64,
 }

 impl RateLimiter {
     pub fn new(max_requests: usize, window_secs: u64) -> Self {
         Self {
-            requests: Vec::new(),
+            requests: VecDeque::new(),
             max_requests,
             window_secs,
         }
     }

     pub fn check(&mut self) -> bool {
         let now = SystemTime::now()
             .duration_since(UNIX_EPOCH)
             .unwrap()
             .as_secs();

-        // Remove requests outside the window
-        self.requests.retain(|&t| now - t < self.window_secs);
+        // Remove requests outside the window (from front)
+        while let Some(&oldest) = self.requests.front() {
+            if now - oldest >= self.window_secs {
+                self.requests.pop_front();
+            } else {
+                break;
+            }
+        }

         if self.requests.len() >= self.max_requests {
             return false;
         }

-        self.requests.push(now);
+        self.requests.push_back(now);
         true
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/main/agents/relayer/src/relay_api/handlers.rs` around lines 208 - 240,
Replace the Vec-based timestamp storage in struct RateLimiter with a
VecDeque<u64> to avoid O(n) retain() costs; update RateLimiter::new to
initialize requests as a VecDeque, and change RateLimiter::check to remove
expired timestamps by popping from the front in a loop (or while-let) until the
front timestamp is within window_secs, then check length against max_requests
and push_back(now) when accepting a request; ensure you update imports (use
std::collections::VecDeque) and adjust references to requests.push and
requests.retain to use push_back and front/pop_front semantics.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c3087b0f-fc3b-4ecf-b066-2258e88baaea

📥 Commits

Reviewing files that changed from the base of the PR and between d839c54 and a048ffb.

📒 Files selected for processing (1)
  • rust/main/agents/relayer/src/relay_api/handlers.rs

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

Contributor

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
rust/main/agents/relayer/src/relay_api/handlers.rs (1)

329-334: ⚠️ Potential issue | 🟠 Major

All extraction failures become 400 Bad Request - needs differentiation.

Look, not every failed extraction is the client's fault. Right now, if the indexer RPC is down or there's a transient network blip, this returns a 400 like the client did something wrong. That's not fair to them, is it?

Infrastructure failures should return 5xx (e.g., ServiceUnavailable), invalid formats should be 400, and "tx not found" could arguably be 404. Collapsing everything into one error type muddles retry semantics for callers.

🐛 Suggested approach: distinguish error types
     .map_err(|e| {
         if let Some(ref metrics) = state.metrics {
             metrics.inc_failure("extraction_failed");
         }
-        ServerError::InvalidRequest(format!("Failed to extract messages: {e}"))
+        // Distinguish between client errors and server errors
+        let err_str = e.to_string();
+        if err_str.contains("not found") || err_str.contains("no messages") {
+            ServerError::NotFound
+        } else if err_str.contains("invalid") || err_str.contains("parse") {
+            ServerError::InvalidRequest(format!("Invalid request: {e}"))
+        } else {
+            ServerError::ServiceUnavailable(format!("Extraction service error: {e}"))
+        }
     })?;

Alternatively, have extract_messages return a typed error enum that maps cleanly to HTTP status codes.
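
A typed error along those lines could look roughly like this. It is a sketch only: the variant names and metric labels follow the suggestion in this review comment, the numeric status codes stand in for whatever `ServerError` actually maps to, and none of it is taken from the PR's code.

```rust
/// Hypothetical typed extraction error, one variant per HTTP outcome.
#[derive(Debug, PartialEq)]
enum ExtractError {
    /// Malformed tx hash or unknown chain: the caller should fix the request.
    InvalidInput(String),
    /// The transaction, or a message inside it, was not found.
    NotFound,
    /// RPC or network failure: the caller may retry later.
    RemoteUnavailable(String),
}

impl ExtractError {
    /// HTTP status the handler would return for this error.
    fn status_code(&self) -> u16 {
        match self {
            ExtractError::InvalidInput(_) => 400,
            ExtractError::NotFound => 404,
            ExtractError::RemoteUnavailable(_) => 503,
        }
    }

    /// Label passed to the failure counter.
    fn metric_label(&self) -> &'static str {
        match self {
            ExtractError::InvalidInput(_) => "extraction_invalid",
            ExtractError::NotFound => "extraction_not_found",
            ExtractError::RemoteUnavailable(_) => "extraction_unavailable",
        }
    }
}
```

Matching on variants instead of on substrings of the error text keeps the mapping stable if an upstream error message is reworded.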

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/main/agents/relayer/src/relay_api/handlers.rs` around lines 329 - 334,
The current map_err in the call that uses extract_messages collapses every
failure into ServerError::InvalidRequest (400); change this to distinguish error
kinds by updating extract_messages (or the call site) to return/propagate a
typed error enum (e.g., ExtractError { InvalidInput, NotFound,
RemoteUnavailable, Transient }) or match on existing error variants from the
indexer client; then map InvalidInput -> ServerError::InvalidRequest, NotFound
-> ServerError::NotFound (404), and RemoteUnavailable/Transient ->
ServerError::ServiceUnavailable (5xx) while still incrementing metrics via
state.metrics.inc_failure with appropriate labels (e.g., "extraction_invalid",
"extraction_not_found", "extraction_unavailable") so callers see correct HTTP
semantics and retry signals.
🧹 Nitpick comments (2)
rust/main/agents/relayer/src/relay_api/handlers.rs (2)

449-457: Error message could be clearer for unsupported destinations.

The error "No send channel for destination domain" sounds like an internal plumbing problem rather than "this relayer doesn't support that route." A user-friendly message would help callers understand what went wrong without them having to dig through swamp logs.

✨ Suggested clearer error message
         let send_channel = send_channels
             .get(&extracted.destination_domain)
             .ok_or_else(|| {
                 ServerError::InvalidRequest(format!(
-                    "No send channel for destination domain {}",
+                    "Destination domain {} is not supported by this relayer",
                     extracted.destination_domain
                 ))
             })?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/main/agents/relayer/src/relay_api/handlers.rs` around lines 449 - 457,
The current ServerError produced when
send_channels.get(&extracted.destination_domain) is missing uses a message that
sounds like an internal plumbing error; change the message in the
ServerError::InvalidRequest to a user-friendly one indicating the destination is
unsupported (e.g., "Unsupported destination domain" or "Relayer does not support
destination domain") and include the extracted.destination_domain value so
callers clearly know the route is not supported; update the error construction
in the lookup around send_channels, preserving ServerError::InvalidRequest.

281-289: Consider validating tx_hash for empty string too.

You're checking origin_chain isn't empty, but an empty tx_hash would slip through and fail later in extraction with a less helpful error message. Might as well catch it early.

✨ Suggested validation addition
     // Validate request
     if req.origin_chain.is_empty() {
         if let Some(ref metrics) = state.metrics {
             metrics.inc_failure("invalid_request");
         }
         return Err(ServerError::InvalidRequest(
             "origin_chain cannot be empty".to_string(),
         ));
     }
+
+    if req.tx_hash.is_empty() {
+        if let Some(ref metrics) = state.metrics {
+            metrics.inc_failure("invalid_request");
+        }
+        return Err(ServerError::InvalidRequest(
+            "tx_hash cannot be empty".to_string(),
+        ));
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/main/agents/relayer/src/relay_api/handlers.rs` around lines 281 - 289,
Add the same early validation for req.tx_hash as exists for req.origin_chain:
check if req.tx_hash.is_empty(), increment
state.metrics.inc_failure("invalid_request") when metrics is Some, and return
Err(ServerError::InvalidRequest("tx_hash cannot be empty".to_string())); place
this alongside the existing origin_chain check in the handler so empty tx_hash
is rejected with a clear error before extraction.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6a035b9f-cbc1-475b-82be-fe4e0ea514c5

📥 Commits

Reviewing files that changed from the base of the PR and between bfb78a8 and 5535f77.

📒 Files selected for processing (2)
  • rust/main/agents/relayer/src/relay_api/handlers.rs
  • rust/main/agents/relayer/src/relayer/tests.rs

devin-ai-integration[bot]

This comment was marked as resolved.

Collaborator

@paulbalaji paulbalaji left a comment

Re-review on d68b2b6

Prior Devin/CodeRabbit threads are all resolved — nice progress. Three findings remain before approval (one blocker), plus two broader notes.

Good catches to credit:

  • Two-pass validate-then-commit refactor in the handler
  • Explicit EVM-only scope narrowing with the clear comment at rust/main/hyperlane-core/src/traits/indexer.rs:78-89
  • HTTP 404 transient detection replacing the earlier substring matching in ccip_read
  • fail_fast mode opt-in, so normal relayer behavior is unchanged

Non-inline notes

Medium — POST /relay has no authentication, and the rate limiter is a single process-global bucket (100/60s). The CCTP-V2 gate narrows the blast radius but each accepted request still drives RPC work on the destination mailbox and Circle's attestation service. Consider an API-key header, per-IP limits, or binding to a private interface in production. Worth documenting the intended deployment model.
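
To make the per-client idea concrete, here is a minimal sketch of a limiter keyed by caller. Everything in it is hypothetical: `KeyedRateLimiter` is not part of the PR, the key is assumed to be an API key or client IP rendered as a string, and the current time is passed in by the caller so the behaviour is easy to test.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical per-client limiter: one sliding window per key instead of a
/// single process-global bucket.
pub struct KeyedRateLimiter {
    windows: HashMap<String, VecDeque<u64>>,
    max_requests: usize,
    window_secs: u64,
}

impl KeyedRateLimiter {
    pub fn new(max_requests: usize, window_secs: u64) -> Self {
        KeyedRateLimiter {
            windows: HashMap::new(),
            max_requests,
            window_secs,
        }
    }

    /// Returns true if `key` may make a request at time `now` (in seconds).
    pub fn check(&mut self, key: &str, now: u64) -> bool {
        let window = self
            .windows
            .entry(key.to_string())
            .or_insert_with(VecDeque::new);
        // Drop timestamps that fell out of the window. saturating_sub avoids
        // underflow if the clock steps backwards.
        while let Some(&oldest) = window.front() {
            if now.saturating_sub(oldest) >= self.window_secs {
                window.pop_front();
            } else {
                break;
            }
        }
        if window.len() >= self.max_requests {
            return false;
        }
        window.push_back(now);
        true
    }
}
```

One caller exhausting its window does not affect another caller. The map in this sketch grows with the number of distinct keys, so a real deployment would also need eviction of idle keys.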

Medium — the relay_api module has no unit tests. 655 LoC of handler logic: validate-then-commit, TxHashCache behavior under full/duplicate/ok, rate-limit window semantics, CCTP-V2 gate rejection, fail-fast retry budget, partial-send handling, and the CORS parser. All untested. Given the security-facing nature of this endpoint, coverage would be worth adding before the first deploy.


Net: not blocking on style. Finding #1 (partial-commit + dedup lockout) is the one to resolve before this lands.

Comment thread rust/main/agents/relayer/src/relay_api/handlers.rs Outdated
Comment thread rust/main/agents/relayer/src/relay_api/handlers.rs
Comment thread rust/main/agents/relayer/src/msg/pending_message.rs Outdated
devin-ai-integration[bot]

This comment was marked as resolved.

Collaborator

@paulbalaji paulbalaji left a comment

Consolidated re-review (commit dddf9aea07)

Status: requesting changes. Two high-severity items + three mediums below. This is a follow-up re-review that accounts for prior devin/coderabbit/author exchanges — items already resolved (Box::leak, RwLock poisoning, DB-before-send, ISM-outside-timeout, rate-limit zero validation, empty tx hash, dedup-before-extraction, now−t clock-skew underflow, partial-send duplication, multi-message extraction) are not re-raised, and the devin parse_tx_hash H512 padding concern is independently verified to be a false positive (hyperlane-core's impl_fixed_hash_conversions!(H512, H256) in hyperlane-core/src/types/primitive_types.rs reads bottom-aligned bytes [32..64], which matches the left-pad; the conversion round-trips correctly for EVM hashes). Kudos to the author for catching that.
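
The padding argument can be checked with plain arrays. The sketch below assumes only what is stated above: the 32-byte hash is left-padded into 64 bytes and read back from bytes [32..64]. `widen` and `narrow` are hypothetical helpers, not the hyperlane-core conversions.

```rust
/// Left-pad a 32-byte hash into a 64-byte container (bottom-aligned).
fn widen(hash: [u8; 32]) -> [u8; 64] {
    let mut out = [0u8; 64];
    out[32..].copy_from_slice(&hash);
    out
}

/// Read the 32-byte hash back from the bottom half, bytes [32..64].
fn narrow(padded: [u8; 64]) -> [u8; 32] {
    let mut out = [0u8; 32];
    out.copy_from_slice(&padded[32..]);
    out
}
```

Because both directions agree on the bottom-aligned layout, `narrow(widen(h))` returns `h` for any 32-byte input.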

High severity — blockers

  1. fetch_raw_logs_and_meta signature flip (Err → Ok(None) on missing receipt) regressed retry semantics for all EVM tx-id indexing, not just /relay. See inline on contracts/utils.rs.
  2. TxHashCache is a read-then-write TOCTOU: concurrent identical requests all pass the early contains() read-lock, amplify phase-1 RPCs, and enqueue duplicate PendingMessages before the write-lock commit. See inline on relay_api/handlers.rs:610.
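
The usual way to close a check-then-insert race like the one in High #2 is to make the check and the insert a single operation under one write lock. The sketch below is an assumption-laden illustration, not the PR's `TxHashCache`: the type is a simplified stand-in with no TTL or capacity limit, and `try_reserve` and `release` are hypothetical method names.

```rust
use std::collections::HashSet;
use std::sync::RwLock;

/// Simplified stand-in for a dedup cache, without TTL or capacity handling.
pub struct TxHashCache {
    seen: RwLock<HashSet<String>>,
}

impl TxHashCache {
    pub fn new() -> Self {
        TxHashCache {
            seen: RwLock::new(HashSet::new()),
        }
    }

    /// Check and insert under one write lock. Returns true only for the first
    /// caller, so concurrent identical requests cannot all pass.
    pub fn try_reserve(&self, key: &str) -> bool {
        self.seen.write().unwrap().insert(key.to_string())
    }

    /// Roll back a reservation when the later RPC, send, or DB work fails.
    pub fn release(&self, key: &str) {
        self.seen.write().unwrap().remove(key);
    }
}
```

`HashSet::insert` returns false when the key is already present, so there is no window between a read-locked `contains()` and a later write in which a second request could slip through.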

Medium

  1. CCTP V2 gate is per-tx, not per-message — a tx mixing DepositForBurn(V2) with an unrelated Dispatch fast-paths the unrelated one. Inline on relay_api/extractor.rs.
  2. TS SDK parity gap: three new relayApi* settings keys are absent from typescript/sdk/src/metadata/agentConfig.ts and typescript/infra/src/config/agent/relayer.ts, despite the parity contract at the top of settings/mod.rs. Inline on settings/mod.rs.
  3. No handler-level tests for /relay. The coverage in relayer/tests.rs only exercises config plumbing — dedupe race, rate limit, CCTP gating, cache-full-after-send, concurrent same-tx, and partial multi-destination send are uncovered. Inline on relay_api/handlers.rs:327.
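
The difference between a per-tx and a per-message gate (Medium #1) can be shown with a toy model. All names below are hypothetical stand-ins. In particular, the `is_cctp_v2` flag assumes the extractor can tell which individual message is tied to the CCTP V2 burn, which may not match how the real extractor.rs links the two.

```rust
/// Hypothetical, simplified view of a message extracted from a transaction.
#[derive(Debug, Clone, PartialEq)]
struct ExtractedMessage {
    id: u32,
    /// Assumption: true only for the message tied to a CCTP V2 burn.
    is_cctp_v2: bool,
}

/// Per-tx gate: one CCTP V2 message admits every message in the tx.
fn gate_per_tx(messages: &[ExtractedMessage]) -> Vec<ExtractedMessage> {
    if messages.iter().any(|m| m.is_cctp_v2) {
        messages.to_vec()
    } else {
        Vec::new()
    }
}

/// Per-message gate: only the CCTP V2 messages themselves are admitted.
fn gate_per_message(messages: &[ExtractedMessage]) -> Vec<ExtractedMessage> {
    messages.iter().filter(|m| m.is_cctp_v2).cloned().collect()
}
```

For a transaction containing one CCTP V2 message and one unrelated dispatch, `gate_per_tx` admits both while `gate_per_message` admits only the first, which is the mixed-message case raised in this finding.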

PR body is stale

The description still advertises Solana/Radix/Aleo push-relay support, but the diff ships no chain-specific parse_*_tx_hash functions and /relay is gated to EVM CCTP V2 via is_cctp_v2 in extractor.rs. Please update the description so reviewers and deployers aren't misled.

Observations outside this PR

  • rust/main/hyperlane-base/src/contract_sync/mod.rs:216-220 is the caller that silently drops tx_ids on the new Ok(None) path above — this is the blast radius beyond /relay for High #1.
  • Rate limiter is a single global RwLock<RateLimiter> with no per-IP / per-key dimension; combined with the TOCTOU above, an attacker can amplify per-request RPC cost ~22× (2 + 10×recipient_ism + 10×get_app_context) within the global quota. Not a blocker on its own; consider tightening defaults and parallelizing the per-message RPCs with join_all.

Comment thread rust/main/chains/hyperlane-ethereum/src/contracts/utils.rs
Comment thread rust/main/agents/relayer/src/relay_api/handlers.rs Outdated
Comment thread rust/main/agents/relayer/src/relay_api/extractor.rs
Comment thread rust/main/agents/relayer/src/settings/mod.rs
Comment thread rust/main/agents/relayer/src/relay_api/handlers.rs
devin-ai-integration[bot]

This comment was marked as resolved.

Collaborator

@paulbalaji paulbalaji left a comment

Delta re-review — HEAD 7e9d7dad88

Follow-up to #8364 (review). I re-read every threaded author reply since that review; nothing currently ends on an unaddressed author response except the two Mediums carried forward below. Non-blocking comment review — a few items worth a second look before merge.

Resolved since the last pass ✅

  • High 1 (fetch_raw_logs_and_meta retry regression): fixed. contracts/utils.rs still returns Ok(None), but mailbox.rs, interchain_gas.rs, and merkle_tree_hook.rs now convert None → Err inside call_and_retry_indefinitely, restoring pre-PR retry semantics for the shared contract_sync path.
  • High 2 (TxHashCache TOCTOU): fixed. handlers.rs:395-412 reserves the dedup slot under a single write lock before any RPC / send / DB work, with remove() rollback on the error path. test_concurrent_same_tx_only_one_succeeds exercises the race.
  • Medium 5 (no /relay tests): largely addressed — 11 tests in relay_api/tests.rs cover happy path, dedup within/after TTL, concurrency, rate limit, timeout, CCTP gate (happy), whitelist/blacklist, cache-full-before-send, partial-send release.
  • Acknowledged author replies: the H512 padding dispute was a false positive, max_retries = 3 is documented, the "does not apply here" reply on utils.rs:30 is now accurate because the retry wrapper is in the callers.
  • 7e9d7dad88 cleanly reuses the already-built message indexer instead of standing up a second one (relayer/origin.rs, relayer.rs:644-648).

Still open — please address

  1. Test vs. code disagree on extraction-timeout status — new test expects 408, current code returns 400. See inline on extractor.rs:45 and tests.rs:357.
  2. CCTP V2 gate is per-tx, not per-message — acknowledged the author's Iris-attestation reasoning, but it addresses spoofed burns, not the mixed-message case. See inline on extractor.rs:96.
  3. TS SDK parity regressed — four Rust-side keys now lack a TS mirror despite the parity contract. See inline on settings/mod.rs:77.
  4. Reservation leak on handler cancellation — minor / defense-in-depth: rollback only runs if relay_work returns. See inline on handlers.rs:415.

PR description is still stale

The summary still claims push-relay support for Solana, Radix, and Aleo, but the current diff ships no chain-specific parse_*_tx_hash and /relay is CCTP-V2-gated to EVM via extract_messages. Please update so downstream consumers aren't misled.

Observations outside this PR (not blocking)

  • hyperlane-base/src/server/base_server.rs:27-49 uses plain axum::serve with no tower::timeout / GracefulShutdown layer today, which is partly why the cancellation leak above is not an immediate exploit. If that wrapper is ever added, the reservation-leak becomes real.
  • rate_limiter.check() at handlers.rs consumes a token before the dedup reservation, so duplicate requests within TTL still burn rate-limit quota. Minor; not a regression.

Comment thread rust/main/agents/relayer/src/relay_api/extractor.rs Outdated
Comment thread rust/main/agents/relayer/src/relay_api/extractor.rs
Comment thread rust/main/agents/relayer/src/settings/mod.rs
Comment thread rust/main/agents/relayer/src/relay_api/handlers.rs
@paulbalaji
Collaborator

CI failure — same issue flagged inline in my last review

The failing job on run 24719806500 is:

thread 'relay_api::tests::test_extraction_timeout' panicked at agents/relayer/src/relay_api/tests.rs:357:5:
assertion `left == right` failed
  left: 400
 right: 408

This is the exact mismatch flagged inline on extractor.rs:45 in review #4147343345: the 5 s inner timeout is wrapped in eyre!(...) at extractor.rs:33-48, and the handler in handlers.rs:439-443 maps any eyre::Error from extract_messages to ServerError::InvalidRequest → HTTP 400. The test expects StatusCode::REQUEST_TIMEOUT (408), which is the correct semantics.

Minimal fix

Surface the inner-timeout case distinctly so the handler can map it to RequestTimeout. Smallest diff:

1. rust/main/agents/relayer/src/relay_api/extractor.rs — introduce a typed error and return it on the inner timeout:

use thiserror::Error;

#[derive(Debug, Error)]
pub enum ExtractError {
    #[error("timed out fetching transaction receipt")]
    Timeout,
    #[error(transparent)]
    Other(#[from] eyre::Error),
}

pub async fn extract_messages(
    indexers: &HashMap<String, Arc<dyn Indexer<HyperlaneMessage>>>,
    chain_name: &str,
    tx_hash: &str,
) -> Result<Vec<ExtractedMessage>, ExtractError> {
    // ...
    let messages_with_meta = tokio::time::timeout(
        Duration::from_secs(5),
        indexer.fetch_logs_by_tx_hash(tx_hash_512),
    )
    .await
    .map_err(|_| {
        error!(chain = %chain_name, tx_hash = %tx_hash, "Timed out waiting for transaction receipt");
        ExtractError::Timeout
    })?
    .map_err(|e| ExtractError::Other(eyre!("Failed to fetch transaction logs: {}", e)))?;
    // ...
}

Also re-export the new error from relay_api/mod.rs.

2. rust/main/agents/relayer/src/relay_api/handlers.rs:436-443 — branch on the error variant:

.map_err(|e| match e {
    ExtractError::Timeout => {
        state.record_failure("timeout");
        ServerError::RequestTimeout
    }
    ExtractError::Other(err) => {
        state.record_failure("extraction_failed");
        ServerError::InvalidRequest(format!("Failed to extract messages: {err}"))
    }
})?;

The outer tokio::time::timeout(Duration::from_secs(10), ...) at handlers.rs:431-435 already maps to ServerError::RequestTimeout, so both timeout paths now produce the same 408, and only genuine input/RPC errors remain as 400.

Alternative (one-line fix)

If you'd rather not add an error type, drop the 5 s inner timeout entirely and rely on the 10 s outer timeout. The tradeoff is exactly what the comment at extractor.rs:28-32 already mentions — an invalid or not-yet-confirmed tx hash will burn the full 10 s budget before returning. The typed-error approach preserves the fast-fail and also unblocks CI.

Not re-posting the four items from the consolidated review above — they still stand as-is after this fix.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 4 new potential issues.

Comment thread rust/main/agents/relayer/src/relay_api/handlers.rs
Comment on lines +369 to +371
if status == reqwest::StatusCode::NOT_FOUND {
return Err(MetadataBuildError::AttestationPending);
}
Contributor

🚩 HTTP 404 universally treated as AttestationPending for all CCIP read URLs

At rust/main/agents/relayer/src/msg/metadata/ccip_read/mod.rs:369-371, any HTTP 404 response from any CCIP read URL triggers AttestationPending, causing up to 30 retries at 1-second intervals. The CCIP URLs come from on-chain ISM contracts and could point to any server. While the comment explains this is designed for Circle's attestation service, a misconfigured or non-Circle URL that legitimately returns 404 (e.g., wrong path) will waste ~30 seconds per URL before moving to the next. This is bounded and non-critical, but could be improved by making the 404→pending mapping configurable or checking a response header/body marker specific to Circle's API.
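
One way the configurable 404→pending mapping suggested above might look, as a minimal sketch only. `FetchOutcome`, `UrlPolicy`, and `treat_404_as_pending` are hypothetical names that do not exist in the PR; the real code returns `MetadataBuildError::AttestationPending` directly.

```rust
/// Hypothetical outcome of one CCIP-read HTTP attempt (not a type from the PR).
#[derive(Debug, PartialEq)]
pub enum FetchOutcome {
    Ready,
    AttestationPending,
    Failed(u16),
}

/// Hypothetical per-URL policy. The PR has no such setting; it is assumed
/// here to show how the 404 handling could be scoped to attestation services.
pub struct UrlPolicy {
    pub treat_404_as_pending: bool,
}

/// Classify an HTTP status code under the given policy.
pub fn classify(status: u16, policy: &UrlPolicy) -> FetchOutcome {
    match status {
        200..=299 => FetchOutcome::Ready,
        // Only retry on 404 when the URL is known to signal "not ready yet" this way.
        404 if policy.treat_404_as_pending => FetchOutcome::AttestationPending,
        other => FetchOutcome::Failed(other),
    }
}
```

With the flag off, a 404 from a misconfigured URL fails immediately instead of consuming the retry budget.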

Comment thread rust/main/chains/hyperlane-ethereum/src/contracts/mailbox.rs
Comment thread rust/main/agents/relayer/src/relay_api/handlers.rs
@hyper-gonk
Contributor

hyper-gonk Bot commented Apr 22, 2026

🦀 Rust Agent Docker Image Built Successfully

Service Tag
agent 5386d58-20260422-095016
Full image paths
ghcr.io/hyperlane-xyz/hyperlane-agent:5386d58-20260422-095016

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 new potential issues.

Comment on lines +96 to +104
let is_cctp_v2 = indexer.is_cctp_v2(tx_hash_512).await.unwrap_or_else(|e| {
warn!(
chain = %chain_name,
tx_hash = %tx_hash,
error = ?e,
"Failed to check for CCTP V2 burn event, treating as non-CCTP"
);
false
});
Contributor

🔴 Redundant receipt RPC call in is_cctp_v2 silently rejects valid CCTP V2 messages on transient failures

The extract_messages function in extractor.rs:47-68 first calls indexer.fetch_logs_by_tx_hash(tx_hash_512) which internally calls get_transaction_receipt (via call_and_retry_indefinitely). After successfully extracting messages, it then calls indexer.is_cctp_v2(tx_hash_512) at line 96, which makes a second, independent get_transaction_receipt RPC call (rust/main/chains/hyperlane-ethereum/src/contracts/mailbox.rs:213-217). If this second call fails due to a transient RPC error, rate-limiting, or load-balancer routing to a different node, the unwrap_or_else on line 96 silently converts the error to false. The relay API then rejects the valid CCTP V2 message with a misleading "Only EVM CCTP V2 messages are supported" error. This is both a correctness issue (valid messages rejected) and a performance issue (double RPC call for the same receipt).

Prompt for agents
The extract_messages function in extractor.rs makes two separate get_transaction_receipt RPC calls for the same tx hash: once inside fetch_logs_by_tx_hash (line 49) and again inside is_cctp_v2 (line 96). When the second call fails transiently, the unwrap_or_else converts the error to false, causing valid CCTP V2 messages to be rejected with a misleading error.

The fix should combine the receipt fetch into a single RPC call. Options:
1. Add a new method to the Indexer trait (e.g., fetch_logs_and_check_cctp_v2) that returns both the logs and the CCTP V2 status from a single receipt.
2. Have fetch_logs_by_tx_hash return the raw receipt alongside the parsed logs, so is_cctp_v2 can inspect the already-fetched receipt.
3. Add a variant of is_cctp_v2 that accepts a receipt directly instead of a tx hash.

Alternatively, if keeping two calls, the is_cctp_v2 error should be propagated as ExtractError::Failed rather than silently treated as false. A transient RPC failure should not be treated as a definitive non-CCTP-V2 determination.

Relevant files: rust/main/agents/relayer/src/relay_api/extractor.rs (lines 47-104), rust/main/chains/hyperlane-ethereum/src/contracts/mailbox.rs (is_cctp_v2 method at line 202), rust/main/hyperlane-core/src/traits/indexer.rs (is_cctp_v2 trait default at line 88).
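
A rough sketch of the single-receipt idea (options 2 and 3 above): derive both the dispatched-message count and the CCTP V2 flag from one already-fetched receipt. `Receipt`, `ReceiptSummary`, and `summarize` are hypothetical, and logs are reduced to placeholder event-name strings; the real indexer works on typed logs and topic hashes.

```rust
/// Hypothetical, heavily simplified receipt: each log is reduced to an
/// event name. This is an illustration, not the PR's receipt type.
pub struct Receipt {
    pub log_events: Vec<String>,
}

/// Everything the relay API needs from one receipt fetch.
#[derive(Debug, PartialEq)]
pub struct ReceiptSummary {
    pub dispatch_count: usize,
    pub is_cctp_v2: bool,
}

/// Inspect a receipt once and return both facts, so no second RPC is needed.
/// "Dispatch" and "DepositForBurn" are placeholder event names.
pub fn summarize(receipt: &Receipt) -> ReceiptSummary {
    let dispatch_count = receipt
        .log_events
        .iter()
        .filter(|e| e.as_str() == "Dispatch")
        .count();
    let is_cctp_v2 = receipt
        .log_events
        .iter()
        .any(|e| e.as_str() == "DepositForBurn");
    ReceiptSummary { dispatch_count, is_cctp_v2 }
}
```

Because both values come from the same receipt, a transient failure can no longer make them disagree.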

Comment on lines +277 to +280
.route("/relay", post(create_relay))
.layer(cors)
.with_state(self)
}
Contributor

🚩 No authentication on relay API endpoint — mitigated by rate limiter, dedup cache, and CCTP V2 gate

The /relay endpoint at handlers.rs:277 has no authentication mechanism. Any client that can reach the server can submit relay requests. The risk is mitigated by: (1) the rate limiter (default 100 req/60s), (2) the dedup TxHashCache (prevents replay within TTL), (3) the CCTP V2 check (only genuine Circle DepositForBurn transactions are accepted), and (4) whitelist/blacklist filtering. Since Circle's attestation service is the ultimate gate (without a genuine attestation, the message can't be delivered), an attacker can cause wasted RPC calls but not actual fund theft. Still, the lack of authentication should be documented and potentially addressed if the relay API is exposed to the public internet.

Collaborator

@paulbalaji paulbalaji left a comment

Delta re-review — HEAD 5386d585df

Follow-up to #8364 (review). Re-read all 32 of troykessler's threaded replies in-tree. Non-blocking comment review, but two items still need attention before merge, plus a CI fix.

Resolved since last pass ✅

  • test_extraction_timeout 408 vs 400: fixed via typed ExtractError::{Timeout,Failed} enum in extractor.rs; handler in handlers.rs:492-501 maps Timeout → ServerError::RequestTimeout (408). Author confirmed pass now at 2026-04-21T11:56Z.
  • Reservation leak on handler cancellation: fixed via TxHashReservation RAII guard at handlers.rs:112-155, with switch from tokio::sync::RwLock<TxHashCache> → parking_lot::Mutex<TxHashCache>. Drop path releases on cancel; commit() consumes self on success. Clean.
  • TS SDK parity: author replied 2026-04-21T11:57Z on settings/mod.rs:77 with "we dont plan on implementing the relayer api in the ts relayer". Accepted; downgrading to a nit below (just update the stale parity-contract comment).
  • Prior Highs (fetch_raw_logs_and_meta retry regression, TxHashCache TOCTOU): still resolved on this head.
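
For readers following along, the RAII reservation pattern noted above can be sketched roughly as below. This is not the PR's `TxHashReservation`: `DedupCache` and `Reservation` are hypothetical stand-ins, and `std::sync::Mutex` is used in place of `parking_lot::Mutex` to keep the sketch dependency-free.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the dedup cache: just a set of reserved hashes.
#[derive(Default)]
pub struct DedupCache {
    reserved: HashSet<String>,
}

/// Hypothetical RAII guard: the slot stays reserved only if `commit` is called.
pub struct Reservation {
    cache: Arc<Mutex<DedupCache>>,
    tx_hash: String,
    committed: bool,
}

impl Reservation {
    /// Check-and-insert under a single lock acquisition, so two concurrent
    /// requests for the same hash cannot both succeed.
    pub fn try_reserve(cache: &Arc<Mutex<DedupCache>>, tx_hash: &str) -> Option<Self> {
        let mut guard = cache.lock().unwrap();
        if !guard.reserved.insert(tx_hash.to_owned()) {
            return None; // already reserved by another request
        }
        Some(Self {
            cache: Arc::clone(cache),
            tx_hash: tx_hash.to_owned(),
            committed: false,
        })
    }

    /// Consume the guard and keep the entry in the cache.
    pub fn commit(mut self) {
        self.committed = true;
    }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        // Runs on early return, error, or when the handler future is dropped.
        if !self.committed {
            if let Ok(mut guard) = self.cache.lock() {
                guard.reserved.remove(&self.tx_hash);
            }
        }
    }
}
```

The rollback lives in `Drop`, so it runs even if the handler future is cancelled before the relay work returns.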

Still open — please address

  1. 🔴 CI lint — 3 clippy errors in extractor.rs block test-rs-run. Inline below; trivial fix.
  2. 🔴 Double-RPC in is_cctp_v2 silently rejects valid CCTP V2 messages on a transient RPC blip (flagged by @devin-ai-integration at 2026-04-22T10:01Z, independently confirmed). Author's older "we dont care" reply at 2026-04-20T14:11Z on extractor.rs:96 predates this double-RPC finding and the current misleading-400 path. Inline below.
  3. 🟠 Second eviction pass added in 5386d585df is a no-op. Inline below.
  4. Nit: settings/mod.rs:5 still says "ANY CHANGES HERE NEED TO BE REFLECTED IN THE TYPESCRIPT SDK" — given the author's "not implementing in TS relayer" stance, the comment is now stale. Inline below.

CI (lint failure)

Run 24771827307 fails with 3 clippy uninlined_format_args errors, all in rust/main/agents/relayer/src/relay_api/extractor.rs. Fix recipe in the inline. No other clippy/rustfmt regressions observed.

Overall

Not blocking structurally — most work is done well — but the double-RPC silent-false and the CI lint failure should close before merge. Not re-raising the withdrawn per-tx CCTP gate thread; author has explicitly skipped it.

) -> Result<Vec<ExtractedMessage>, ExtractError> {
// Get indexer for chain
let indexer = indexers.get(chain_name).ok_or_else(|| {
ExtractError::Failed(format!("Chain not found in registry: {}", chain_name))
Collaborator

[CI blocker] Three clippy uninlined_format_args errors in this file are blocking test-rs-run on CI run 24771827307:

error: variables can be used directly in the `format!` string
   --> agents/relayer/src/relay_api/extractor.rs:28:30
   --> agents/relayer/src/relay_api/extractor.rs:40:43
   --> agents/relayer/src/relay_api/extractor.rs:67:30

Fix (three 1-line edits in this file):

-ExtractError::Failed(format!("Chain not found in registry: {}", chain_name))
+ExtractError::Failed(format!("Chain not found in registry: {chain_name}"))

-.map_err(|e| ExtractError::Failed(format!("Invalid tx hash format: {}", e)))?;
+.map_err(|e| ExtractError::Failed(format!("Invalid tx hash format: {e}")))?;

-ExtractError::Failed(format!("Failed to fetch transaction logs: {}", e))
+ExtractError::Failed(format!("Failed to fetch transaction logs: {e}"))

cargo clippy -p relayer --all-targets --features aleo,integration_test -- -D warnings should then pass locally. (cargo fix --lib -p relayer --tests applies all three automatically.)


// Check once per tx whether this is a CCTP fast transfer.
// Errors are treated as false — the relay API will reject the request below.
let is_cctp_v2 = indexer.is_cctp_v2(tx_hash_512).await.unwrap_or_else(|e| {
Collaborator

[High] The is_cctp_v2 call here performs a second, unretried get_transaction_receipt RPC against origin chain. Trace:

  • L49 indexer.fetch_logs_by_tx_hash(tx_hash_512) → mailbox.rs:176-189 → call_and_retry_indefinitely around fetch_raw_logs_and_meta → one get_transaction_receipt. Retried on failure.
  • L96 indexer.is_cctp_v2(tx_hash_512) → mailbox.rs:213-217 → direct self.provider.get_transaction_receipt(...). No retry wrapper, no timeout.
  • On transient failure, unwrap_or_else(|_| false) here (L96-104) silently converts the error to false → handlers.rs:555 returns ServerError::InvalidRequest("Only EVM CCTP V2 messages are supported via the relay API") → HTTP 400.

The 400 is actively misleading: it tells a client the tx is ineligible when it actually is CCTP V2 — the second receipt fetch just flaked (common with per-node LB cache inconsistency on busy chains). Clients see "not CCTP V2" and don't retry.

Your older 2026-04-20 14:11Z reply on this thread (we dont care since this will be picked up a few seconds later by the relayer contract indexer) predates this double-RPC finding — the indexer recovery path doesn't fix the misleading response body or the dedup-cache reservation that the client doesn't know to retry past.

Fix options, best to minimum:

  1. Best: extend the indexer trait so fetch_logs_by_tx_hash returns logs + CCTP flag from a single receipt fetch. Saves the second RPC on every request, not just error paths.
  2. Minimum: propagate is_cctp_v2 error as ExtractError::Failed (or a new ExtractError::Transient → 503) instead of .unwrap_or_else(|_| false). Eliminates the misleading 400.
  3. Wrap the inner get_transaction_receipt in mailbox.rs::is_cctp_v2 with call_and_retry_indefinitely, matching the other call site.
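
To illustrate option 2, here is a minimal sketch of propagating the check's error instead of folding it into `false`. The `Transient` variant and the 503 mapping are the suggestion from this comment, not existing code; `gate_cctp_v2` and `status_code` are hypothetical helpers, and the RPC result is passed in as a plain `Result<bool, String>` so the sketch stays synchronous.

```rust
/// Mirrors the PR's ExtractError::{Timeout, Failed}; `Transient` is the
/// hypothetical new variant proposed in option 2.
#[derive(Debug, PartialEq)]
pub enum ExtractError {
    Timeout,
    Failed(String),
    Transient(String),
}

/// Hypothetical status mapping for the handler side.
pub fn status_code(err: &ExtractError) -> u16 {
    match err {
        ExtractError::Timeout => 408,
        ExtractError::Failed(_) => 400,
        ExtractError::Transient(_) => 503,
    }
}

/// Takes the outcome of the CCTP V2 check rather than performing the RPC.
/// An RPC error is reported as transient, never as "not CCTP V2".
pub fn gate_cctp_v2(check: Result<bool, String>) -> Result<(), ExtractError> {
    match check {
        Ok(true) => Ok(()),
        Ok(false) => Err(ExtractError::Failed(
            "Only EVM CCTP V2 messages are supported via the relay API".to_string(),
        )),
        Err(e) => Err(ExtractError::Transient(format!("CCTP V2 check failed: {e}"))),
    }
}
```

A client that receives 503 knows to retry, whereas the current 400 tells it the transaction is ineligible.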

}
}

// Enforce max size — run eviction again if still at capacity
Collaborator

[Low] The second eviction pass added in 5386d585df is a no-op.

  • First pass (L86-99) runs when self.cache.len() > self.max_entries * 3 / 4
  • Second pass (L100-105) runs when self.cache.len() >= self.max_entries

For any max_entries >= 1, whenever the second condition is true the first already ran, and both use the same now, same ttl, same retain predicate. retain is idempotent for a fixed now and ttl, so the second pass evicts exactly nothing the first didn't.

If the intent was to catch entries that expired between the two checks, now must be refreshed before the second pass:

 // Enforce max size — run eviction again if still at capacity
 if self.cache.len() >= self.max_entries {
+    let now = Instant::now();
     let ttl = self.ttl;
     self.cache
         .retain(|_, &mut timestamp| now.duration_since(timestamp) < ttl);
 }

In practice the time delta between the two passes is microseconds, so even with the refresh it evicts ~nothing. Cleanest is to delete the added block; the existing 75%-threshold pass already handles it.
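
The idempotence argument can be checked in isolation with a small sketch. `evict_expired` is a hypothetical helper that applies the same retain predicate to a plain `HashMap<String, Instant>`; the real TxHashCache fields may differ.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical helper mirroring the retain predicate quoted above.
/// Returns how many entries were evicted by this pass.
pub fn evict_expired(
    cache: &mut HashMap<String, Instant>,
    now: Instant,
    ttl: Duration,
) -> usize {
    let before = cache.len();
    // Keep only entries younger than the TTL relative to the supplied `now`.
    cache.retain(|_, timestamp| now.duration_since(*timestamp) < ttl);
    before - cache.len()
}

/// Run two passes with the same `now` and `ttl`, as the added block does,
/// and report how many entries each pass removed.
pub fn two_passes(cache: &mut HashMap<String, Instant>, now: Instant, ttl: Duration) -> (usize, usize) {
    let first = evict_expired(cache, now, ttl);
    let second = evict_expired(cache, now, ttl);
    (first, second)
}
```

With an unchanged `now`, every entry that survives the first pass satisfies the predicate again, so the second count is always zero.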

Comment thread rust/main/agents/relayer/src/settings/mod.rs