@antoinedc antoinedc commented Jan 9, 2026

User description

Summary

  • Add POST /api/explorers/:slug/syncFailure endpoint protected by secretMiddleware for PM2 server to report RPC failures
  • Modify blockSync and receiptSync jobs to directly call explorer.incrementSyncFailures() when RPC errors occur
  • After 3 consecutive failures, explorer sync is automatically disabled with shouldSync=false and syncDisabledReason='rpc_error'
  • Rate-limited requests are excluded from failure counting (expected behavior, not actual failures)
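The counting rule in the summary can be sketched as follows. This is a minimal in-memory illustration, not the project's model code: the class `ExplorerSketch` is hypothetical, the real `incrementSyncFailures()` is database-backed and awaited, and the threshold of 3 is taken from the description above.

```javascript
// Hypothetical in-memory stand-in for the explorer model (assumption for illustration).
const SYNC_FAILURE_THRESHOLD = 3; // from "after 3 consecutive failures" in the summary

class ExplorerSketch {
  constructor(slug) {
    this.slug = slug;
    this.shouldSync = true;
    this.syncFailedAttempts = 0;
    this.syncDisabledReason = null;
  }

  // Count one failure and disable sync once the threshold is reached.
  incrementSyncFailures(reason = 'rpc_error') {
    this.syncFailedAttempts += 1;
    const disabled = this.syncFailedAttempts >= SYNC_FAILURE_THRESHOLD;
    if (disabled) {
      this.shouldSync = false;
      this.syncDisabledReason = reason;
    }
    return { attempts: this.syncFailedAttempts, disabled };
  }
}

// Three failures in a row: the third call reports disabled = true.
const explorer = new ExplorerSketch('demo');
const results = [1, 2, 3].map(() => explorer.incrementSyncFailures());
console.log(results);
```

The return shape `{ attempts, disabled }` mirrors what the sequence diagram in this thread shows the model returning.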

Test plan

  • Unit tests added for new endpoint (6 test cases)
  • All 132 explorer tests pass
  • Manual testing with docker environment:
    • Endpoint returns 401 for missing/invalid secret
    • Endpoint returns 404 for non-existent explorer
    • Failure counter increments correctly (1/3, 2/3, 3/3)
    • Auto-disables after 3 failures

🤖 Generated with Claude Code


CodeAnt-AI Description

Add real-time RPC failure reporting and automatic explorer sync disabling

What Changed

  • New protected endpoint POST /api/explorers/:slug/syncFailure to report RPC sync failures and return whether the explorer was auto-disabled and the current failure count
  • blockSync and receiptSync now report non-rate-limited RPC errors to the explorer failure counter; if the counter reaches the disable threshold the sync is stopped immediately
  • Endpoint defaults reason to "rpc_error" when missing and logs attempts; unit tests added to cover auth, missing explorer, counting, default reason, and auto-disable behavior
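The endpoint behaviour described above (401 on a bad secret, 404 on an unknown explorer, a default reason of "rpc_error") might look roughly like this. It is a framework-free sketch under stated assumptions: `handleSyncFailure`, the `deps` object, and the stub explorer are all hypothetical, and the real handler lives in an Express route behind secretMiddleware.

```javascript
// Hypothetical, framework-free sketch of the endpoint logic (names are illustrative).
function handleSyncFailure({ secret, slug, body }, { expectedSecret, explorers }) {
  // Stand-in for secretMiddleware: reject missing or invalid secrets.
  if (!secret || secret !== expectedSecret) {
    return { status: 401, body: { error: 'Unauthorized' } };
  }
  const explorer = explorers.get(slug);
  if (!explorer) {
    return { status: 404, body: { error: 'Explorer not found' } };
  }
  // Default the reason when the caller does not send one.
  const reason = (body && body.reason) || 'rpc_error';
  const result = explorer.incrementSyncFailures(reason);
  return { status: 200, body: { attempts: result.attempts, disabled: result.disabled } };
}

// Stub explorer used only to exercise the sketch.
const stubExplorer = {
  attempts: 0,
  lastReason: null,
  incrementSyncFailures(reason) {
    this.attempts += 1;
    this.lastReason = reason;
    return { attempts: this.attempts, disabled: this.attempts >= 3 };
  },
};
const deps = { expectedSecret: 's3cret', explorers: new Map([['demo', stubExplorer]]) };

console.log(handleSyncFailure({ secret: 'wrong', slug: 'demo', body: {} }, deps).status);
console.log(handleSyncFailure({ secret: 's3cret', slug: 'missing', body: {} }, deps).status);
console.log(handleSyncFailure({ secret: 's3cret', slug: 'demo', body: {} }, deps));
```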

Impact

✅ Automatic explorer sync disable after repeated RPC failures
✅ Faster detection of RPC outages for affected explorers
✅ Clearer failure reporting for sync processes


When blockSync/receiptSync jobs encounter RPC errors, they now immediately
report failures to the explorer model. After 3 consecutive failures, the
explorer's sync is automatically disabled.

Changes:
- Add POST /api/explorers/:slug/syncFailure endpoint for PM2 server
- Modify blockSync job to report RPC failures directly via model
- Modify receiptSync job to report RPC failures directly via model
- Add tests for new endpoint (6 test cases)

Rate-limited requests are excluded from failure counting since they are
expected behavior, not actual RPC failures.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

codeant-ai bot commented Jan 9, 2026

CodeAnt AI is reviewing your PR.



codeant-ai bot added the size:L label (this PR changes 100-499 lines, ignoring generated files) on Jan 9, 2026

codeant-ai bot commented Jan 9, 2026

Sequence Diagram

Shows how RPC failures are reported either by the PM2 API endpoint or directly from sync jobs, how the explorer failure counter is incremented, and how explorers are auto-disabled after repeated failures (3 attempts).

sequenceDiagram
    participant PM2
    participant API
    participant SyncJob
    participant ExplorerModel

    PM2->>API: POST /api/explorers/:slug/syncFailure (secret)
    API->>ExplorerModel: incrementSyncFailures(reason)
    ExplorerModel-->>API: { attempts, disabled }
    API-->>PM2: 200 OK (attempts, disabled, message)

    SyncJob->>ExplorerModel: incrementSyncFailures('rpc_error') on RPC error
    ExplorerModel-->>SyncJob: { attempts, disabled }
    alt disabled == true
        SyncJob-->>SyncJob: stop sync (shouldSync=false / return)
    end

Generated by CodeAnt AI


codeant-ai bot commented Jan 9, 2026

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Unvalidated inputs
    reason and source are taken directly from req.body and forwarded to model/logs. Arbitrary strings may be stored/logged. Validate/sanitize these inputs and restrict reason to an allow-list of known failure types to avoid unexpected values or injection via logs/DB.

  • Blocking DB call / potential hang
    The job awaits workspace.explorer.incrementSyncFailures(...) directly. If the DB is slow or the call hangs, the whole job will be delayed or blocked. Consider bounding this call with a timeout or doing it in a non-blocking/fallback manner while still allowing immediate detection of auto-disable.

  • Potential noisy increments / race amplification
    Multiple concurrent failing jobs could rapidly increment the explorer's syncFailedAttempts. Although increment is atomic, bursts of jobs for the same failure window may cause faster-than-intended auto-disable. Consider adding debounce/coalescing or rate-limiting of increments per workspace to avoid spurious auto-disable during transient outages.

  • Fragile error checks
    The new logic checks error messages with exact strings ('Rate limited', 'Timed out after'). Error messages can vary across provider libraries or be wrapped; this check is brittle and may cause real RPC failures to be skipped or rate-limited events to be counted incorrectly. Consider normalizing error classification (status codes, error types, or a helper) before branching.

  • String-based error checks
    The code detects "Rate limited" and timeout conditions by comparing error.message strings. Relying on exact message text is fragile (different providers/clients may vary wording). Prefer checking an error code, a typed error, or normalizing the error to a helper function to avoid misclassification and incorrect failure counting.
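Two of the review points above, restricting `reason` to an allow-list and classifying errors in one place rather than comparing message strings inline, could be sketched like this. Everything here is an assumption for illustration: the helper names are hypothetical, only "rpc_error" is a reason named in this PR, and the `status` and `code` fields depend on the RPC client actually in use.

```javascript
// Hypothetical allow-list; 'rpc_error' is the only reason named in this PR.
const ALLOWED_REASONS = new Set(['rpc_error']);

// Fall back to the default reason for anything not on the allow-list.
// (Rejecting with a 400 would be an equally reasonable choice.)
function normalizeReason(input) {
  return typeof input === 'string' && ALLOWED_REASONS.has(input) ? input : 'rpc_error';
}

// Classify an error as 'rate_limited', 'timeout', or 'rpc_failure'.
// The `status` and `code` checks are assumptions; providers differ.
function classifyRpcError(error) {
  const message = String((error && error.message) || '');
  if ((error && error.status === 429) || /rate limit/i.test(message)) {
    return 'rate_limited';
  }
  if ((error && error.code === 'ETIMEDOUT') || /timed out/i.test(message)) {
    return 'timeout';
  }
  return 'rpc_failure';
}

console.log(classifyRpcError(new Error('Rate limited')));
console.log(classifyRpcError(new Error('Timed out after 5000ms')));
console.log(classifyRpcError(new Error('connection refused')));
console.log(normalizeReason('something unexpected'));
```

Keeping the classification in a single helper means a change in provider wording only needs to be handled in one place.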


codeant-ai bot commented Jan 9, 2026

PR Code Suggestions ✨

Category: Logic error
Suggestion: The sync-failure response message hardcodes the maximum attempts value instead of reflecting the actual threshold

The response message for non-disabled failures hardcodes the maximum failure
threshold as "3", while the actual threshold is controlled by a constant in the
model; if that threshold is changed (or is not 3), the message becomes misleading,
so the message should not embed a fixed "3" value.

run/api/explorers.js [514-520]

 message: result.disabled
                 ? `Sync auto-disabled after ${result.attempts} failures`
-                : `Failure recorded (attempt ${result.attempts}/3)`
+                : `Failure recorded (attempt ${result.attempts})`
 
Suggestion importance[1-10]: 10

Why it matters? 🤔: The current message embeds a hardcoded "/3" which can become inaccurate if the model's failure threshold changes. That can mislead operators and callers. Removing the fixed "3" makes the response accurate regardless of the configured threshold. This is not a breaking change and fixes a real correctness/clarity problem in API output.
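An alternative to dropping the denominator is to interpolate the threshold from a shared constant, so the message stays informative and cannot drift. The sketch below assumes a constant named `SYNC_FAILURE_THRESHOLD` (the name used in the follow-up commit in this thread); the function `buildSyncFailureMessage` is hypothetical.

```javascript
// Assumed to come from the model in the real code; defined here to keep the sketch self-contained.
const SYNC_FAILURE_THRESHOLD = 3;

// Build the response message from the result and the configured threshold.
function buildSyncFailureMessage(result, threshold = SYNC_FAILURE_THRESHOLD) {
  return result.disabled
    ? `Sync auto-disabled after ${result.attempts} failures`
    : `Failure recorded (attempt ${result.attempts}/${threshold})`;
}

console.log(buildSyncFailureMessage({ attempts: 2, disabled: false }));
console.log(buildSyncFailureMessage({ attempts: 3, disabled: true }));
```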

Score: 10


codeant-ai bot commented Jan 9, 2026

CodeAnt AI finished reviewing your PR.

- Extract duplicate failure reporting logic to run/lib/syncHelpers.js
- Exclude timeout errors from failure counting (only actual RPC failures)
- Use SYNC_FAILURE_THRESHOLD constant instead of hardcoded value
- Add comprehensive unit tests for syncHelpers (7 test cases)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
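The shared helper described in this commit could look roughly like the sketch below. The actual contents of run/lib/syncHelpers.js are not shown in this thread, so `reportRpcFailure`, `runJob`, and the return shapes are hypothetical; the message prefixes "Rate limited" and "Timed out after" are the ones quoted in the review above.

```javascript
// Hypothetical helper; the real run/lib/syncHelpers.js is not shown in this thread.
async function reportRpcFailure(explorer, error) {
  const message = String((error && error.message) || '');
  // Rate limits and timeouts are not counted as RPC failures.
  if (message.startsWith('Rate limited') || message.startsWith('Timed out after')) {
    return { counted: false, disabled: false };
  }
  const { attempts, disabled } = await explorer.incrementSyncFailures('rpc_error');
  return { counted: true, attempts, disabled };
}

// Example job wrapper: stop syncing as soon as the explorer is auto-disabled.
async function runJob(explorer, fetchBlock) {
  try {
    await fetchBlock();
    return 'synced';
  } catch (error) {
    const outcome = await reportRpcFailure(explorer, error);
    return outcome.disabled ? 'sync disabled' : 'will retry';
  }
}
```

Both blockSync and receiptSync could call the same helper, which is the deduplication this commit describes.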
@antoinedc antoinedc merged commit 5effb3f into develop Jan 9, 2026
2 checks passed
@antoinedc antoinedc deleted the feature/realtime-rpc-failure-reporting branch January 9, 2026 19:34