Add real-time RPC failure reporting for explorer auto-deactivation #438

antoinedc · 2026-01-09T19:27:11Z

User description

Summary

- Add a POST /api/explorers/:slug/syncFailure endpoint, protected by secretMiddleware, so the PM2 server can report RPC failures
- Modify the blockSync and receiptSync jobs to call explorer.incrementSyncFailures() directly when RPC errors occur
- After 3 consecutive failures, explorer sync is automatically disabled with shouldSync=false and syncDisabledReason='rpc_error'
- Rate-limited requests are excluded from failure counting (they are expected behavior, not actual failures)
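
The counting rule in the summary can be sketched with an in-memory stand-in for the explorer model. This is only an illustration under stated assumptions: the real model is database-backed and its implementation is not shown in this PR page. The names `shouldSync`, `syncDisabledReason`, `syncFailedAttempts`, `incrementSyncFailures` and `SYNC_FAILURE_THRESHOLD` come from the discussion; the class `InMemoryExplorer` is hypothetical.

```javascript
// Threshold named after the SYNC_FAILURE_THRESHOLD constant mentioned in this PR.
const SYNC_FAILURE_THRESHOLD = 3;

// Hypothetical in-memory stand-in for the explorer model (assumption:
// the real model persists these fields in the database).
class InMemoryExplorer {
    constructor(slug) {
        this.slug = slug;
        this.shouldSync = true;
        this.syncDisabledReason = null;
        this.syncFailedAttempts = 0;
    }

    // Records one failure and disables sync once the threshold is reached.
    // Returns the { attempts, disabled } shape shown in the sequence diagram.
    async incrementSyncFailures(reason = 'rpc_error') {
        this.syncFailedAttempts += 1;
        const disabled = this.syncFailedAttempts >= SYNC_FAILURE_THRESHOLD;
        if (disabled) {
            this.shouldSync = false;
            this.syncDisabledReason = reason;
        }
        return { attempts: this.syncFailedAttempts, disabled };
    }
}
```

With this stand-in, the first two calls return `disabled: false` and the third flips `shouldSync` to `false`.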

Test plan

- Unit tests added for new endpoint (6 test cases)
- All 132 explorer tests pass
- Manual testing with docker environment:
  - Endpoint returns 401 for missing/invalid secret
  - Endpoint returns 404 for non-existent explorer
  - Failure counter increments correctly (1/3, 2/3, 3/3)
  - Auto-disables after 3 failures

🤖 Generated with Claude Code

CodeAnt-AI Description

Add real-time RPC failure reporting and automatic explorer auto-disable

What Changed

- New protected endpoint POST /api/explorers/:slug/syncFailure to report RPC sync failures and return whether the explorer was auto-disabled and the current failure count
- blockSync and receiptSync now report non-rate-limited RPC errors to the explorer failure counter; if the counter reaches the disable threshold the sync is stopped immediately
- Endpoint defaults reason to "rpc_error" when missing and logs attempts; unit tests added to cover auth, missing explorer, counting, default reason, and auto-disable behavior
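
The endpoint behavior described above (401 on a bad secret, 404 on an unknown explorer, a default reason of "rpc_error", and a response carrying the failure count and disabled flag) could look roughly like the following framework-free sketch. It is not the code in run/api/explorers.js: `createSyncFailureHandler`, `findExplorerBySlug`, `expectedSecret`, `req.secret` and the error bodies are all hypothetical stand-ins, and the real route relies on secretMiddleware for authentication.

```javascript
// Hypothetical handler factory. `findExplorerBySlug` stands in for the real
// lookup, and the `req.secret` check stands in for secretMiddleware.
function createSyncFailureHandler({ findExplorerBySlug, expectedSecret }) {
    return async function handleSyncFailure(req) {
        // Missing or invalid secret: reject before touching the model.
        if (!req.secret || req.secret !== expectedSecret)
            return { status: 401, body: { error: 'Unauthorized' } };

        const explorer = await findExplorerBySlug(req.params.slug);
        if (!explorer)
            return { status: 404, body: { error: 'Explorer not found' } };

        // The reason defaults to "rpc_error" when the caller omits it.
        const reason = (req.body && req.body.reason) || 'rpc_error';
        const result = await explorer.incrementSyncFailures(reason);

        return {
            status: 200,
            body: { attempts: result.attempts, disabled: result.disabled }
        };
    };
}
```

Returning a plain `{ status, body }` object keeps the sketch testable without an HTTP server; a real route would write the same values to the response object.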

Impact

✅ Automatic explorer sync disable after repeated RPC failures
✅ Faster detection of RPC outages for affected explorers
✅ Clearer failure reporting for sync processes


When blockSync/receiptSync jobs encounter RPC errors, they now immediately report failures to the explorer model. After 3 consecutive failures, the explorer's sync is automatically disabled.

Changes:
- Add POST /api/explorers/:slug/syncFailure endpoint for PM2 server
- Modify blockSync job to report RPC failures directly via model
- Modify receiptSync job to report RPC failures directly via model
- Add tests for new endpoint (6 test cases)

Rate-limited requests are excluded from failure counting since they are expected behavior, not actual RPC failures.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

codeant-ai · 2026-01-09T19:27:15Z

CodeAnt AI is reviewing your PR.


codeant-ai · 2026-01-09T19:28:17Z

Sequence Diagram

Shows how RPC failures are reported either by the PM2 API endpoint or directly from sync jobs, how the explorer failure counter is incremented, and how explorers are auto-disabled after repeated failures (3 attempts).

sequenceDiagram
    participant PM2
    participant API
    participant SyncJob
    participant ExplorerModel

    PM2->>API: POST /api/explorers/:slug/syncFailure (secret)
    API->>ExplorerModel: incrementSyncFailures(reason)
    ExplorerModel-->>API: { attempts, disabled }
    API-->>PM2: 200 OK (attempts, disabled, message)

    SyncJob->>ExplorerModel: incrementSyncFailures('rpc_error') on RPC error
    ExplorerModel-->>SyncJob: { attempts, disabled }
    alt disabled == true
        SyncJob-->>SyncJob: stop sync (shouldSync=false / return)
    end

Generated by CodeAnt AI

codeant-ai · 2026-01-09T19:29:37Z

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

- Unvalidated inputs: `reason` and `source` are taken directly from `req.body` and forwarded to model/logs. Arbitrary strings may be stored/logged. Validate/sanitize these inputs and restrict `reason` to an allow-list of known failure types to avoid unexpected values or injection via logs/DB.
- Blocking DB call / potential hang: The job `await`s `workspace.explorer.incrementSyncFailures(...)` directly. If the DB is slow or the call hangs, the whole job will be delayed or blocked. Consider bounding this call with a timeout or doing it in a non-blocking/fallback manner while still allowing immediate detection of auto-disable.
- Potential noisy increments / race amplification: Multiple concurrent failing jobs could rapidly increment the explorer's `syncFailedAttempts`. Although `increment` is atomic, bursts of jobs for the same failure window may cause faster-than-intended auto-disable. Consider adding debounce/coalescing or rate-limiting of increments per workspace to avoid spurious auto-disable during transient outages.
- Fragile error checks: The new logic checks error messages with exact strings ('Rate limited', 'Timed out after'). Error messages can vary across provider libraries or be wrapped; this check is brittle and may cause real RPC failures to be skipped or rate-limited events to be counted incorrectly. Consider normalizing error classification (status codes, error types, or a helper) before branching.
- String-based error checks: The code detects "Rate limited" and timeout conditions by comparing `error.message` strings. Relying on exact message text is fragile (different providers/clients may vary wording). Prefer checking an error code, a typed error, or normalizing the error to a helper function to avoid misclassification and incorrect failure counting.
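
One way to act on the error-check nitpicks is a single classification helper that looks at structured fields first and falls back to message text. The sketch below is an assumption-laden illustration, not the project's code: `classifyRpcError` and `shouldCountAsFailure` are hypothetical names, the `status` field and the `'RATE_LIMITED'` and `'TIMEOUT'` codes are assumed conventions of the RPC client, and only `ETIMEDOUT` is a standard Node.js error code. It treats both rate-limited and timed-out requests as non-countable, which matches the follow-up commit in this PR.

```javascript
// Hypothetical classifier: prefers structured fields, falls back to the
// message strings quoted in the review ('Rate limited', 'Timed out after').
function classifyRpcError(error) {
    const message = String((error && error.message) || '');
    const code = error && error.code;
    const status = error && error.status;

    // Assumed conventions: HTTP 429 or a 'RATE_LIMITED' code from the client.
    if (status === 429 || code === 'RATE_LIMITED' || /rate limited/i.test(message))
        return 'rate_limited';

    // ETIMEDOUT is a standard Node.js code; 'TIMEOUT' is an assumed client code.
    if (code === 'ETIMEDOUT' || code === 'TIMEOUT' || /timed out after/i.test(message))
        return 'timeout';

    return 'rpc_error';
}

// Only genuine RPC errors should move the explorer's failure counter.
function shouldCountAsFailure(error) {
    return classifyRpcError(error) === 'rpc_error';
}
```

Centralizing the decision means blockSync and receiptSync would branch on a category rather than on raw message text, so a wording change in a provider library affects one function only.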

codeant-ai · 2026-01-09T19:30:14Z

PR Code Suggestions ✨

Category: Logic error
Suggestion: The sync-failure response message hardcodes the maximum attempts value instead of reflecting the actual threshold
Score: 10

The response message for non-disabled failures hardcodes the maximum failure threshold as "3", while the actual threshold is controlled by a constant in the model; if that threshold is changed (or is not 3), the message becomes misleading, so the message should not embed a fixed "3" value.

run/api/explorers.js [514-520]

      message: result.disabled
          ? `Sync auto-disabled after ${result.attempts} failures`
    -     : `Failure recorded (attempt ${result.attempts}/3)`
    +     : `Failure recorded (attempt ${result.attempts})`

Suggestion importance[1-10]: 10

Why it matters? 🤔: The current message embeds a hardcoded "/3" which can become inaccurate if the model's failure threshold changes. That can mislead operators and callers. Removing the fixed "3" makes the response accurate regardless of the configured threshold. This is not a breaking change and fixes a real correctness/clarity problem in API output.
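
An alternative to dropping the denominator is to build the message from the threshold constant, so the "n/3" progress hint stays accurate if the threshold changes. The sketch below assumes a `SYNC_FAILURE_THRESHOLD` constant is available to the API layer (it is defined locally here for illustration); `buildSyncFailureMessage` is a hypothetical helper name.

```javascript
// Assumption: in the real code this constant would be shared with the model
// rather than redefined next to the message builder.
const SYNC_FAILURE_THRESHOLD = 3;

// Hypothetical helper that derives the response message from the threshold.
function buildSyncFailureMessage(result, threshold = SYNC_FAILURE_THRESHOLD) {
    return result.disabled
        ? `Sync auto-disabled after ${result.attempts} failures`
        : `Failure recorded (attempt ${result.attempts}/${threshold})`;
}
```

For example, `buildSyncFailureMessage({ attempts: 2, disabled: false })` yields `Failure recorded (attempt 2/3)`, and passing a different threshold changes only the denominator.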

codeant-ai · 2026-01-09T19:30:20Z

CodeAnt AI finished reviewing your PR.

- Extract duplicate failure reporting logic to run/lib/syncHelpers.js
- Exclude timeout errors from failure counting (only actual RPC failures)
- Use SYNC_FAILURE_THRESHOLD constant instead of hardcoded value
- Add comprehensive unit tests for syncHelpers (7 test cases)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
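
A shared helper of the kind this commit describes might look like the sketch below. The contents of run/lib/syncHelpers.js are not shown on this page, so everything here is an assumption: `isCountableRpcError` and `reportRpcFailure` are hypothetical names, and the prefix checks simply reuse the 'Rate limited' and 'Timed out after' strings quoted in the review.

```javascript
// Hypothetical check: rate-limited and timed-out requests are not counted.
function isCountableRpcError(error) {
    const message = String((error && error.message) || '');
    return !message.startsWith('Rate limited') && !message.startsWith('Timed out after');
}

// Hypothetical shared helper for blockSync and receiptSync. Returns whether
// the failure was counted and whether the explorer is now disabled.
async function reportRpcFailure(explorer, error) {
    if (!explorer || !isCountableRpcError(error))
        return { counted: false, disabled: false };

    try {
        const { attempts, disabled } = await explorer.incrementSyncFailures('rpc_error');
        return { counted: true, attempts, disabled };
    } catch (reportingError) {
        // A failure while reporting should not mask the original RPC error.
        return { counted: false, disabled: false };
    }
}
```

A job could then call `reportRpcFailure(workspace.explorer, error)` in its catch block and return early when `disabled` is true, instead of duplicating the logic in each job.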

codeant-ai bot added the size:L label (This PR changes 100-499 lines, ignoring generated files) Jan 9, 2026

antoinedc merged commit 5effb3f into develop Jan 9, 2026
2 checks passed

antoinedc deleted the feature/realtime-rpc-failure-reporting branch January 9, 2026 19:34
