fix(server): Block bot/crawler requests to prevent OOM crashes #1403
History
0ef02a9 refactor(website): Use isbot package on client side for consistency
4ba5f1c fix(server): Address PR review feedback
a1de721 fix(server): Address PR review feedback
d74986a fix(server): Add block count to bot guard throttled logs
0e85849 fix(server): Remove isbot from root deps and drop server tests
0f30226 fix(server): Block bot/crawler requests to prevent OOM crashes
Important: Review skipped

Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.

Configuration used: Path: .coderabbit.yaml; Review profile: CHILL; Plan: Pro
📝 Walkthrough

The changes implement bot detection and blocking functionality across client and server. A new …
Sequence Diagram

sequenceDiagram
participant Client as Client Request
participant Middleware as botGuardMiddleware
participant IsBot as isbot Library
participant Logger as Request Logger
participant API as API Handler
Client->>Middleware: HTTP Request (with User-Agent)
Middleware->>Middleware: Extract User-Agent via getClientInfo()
Middleware->>IsBot: isbot(userAgent)
alt Bot Detected
IsBot-->>Middleware: true
Middleware->>Logger: logWarning (throttled, 60s interval)
Middleware-->>Client: 403 JSON Error Response
else Legitimate Request
IsBot-->>Middleware: false
Middleware->>API: next() → Continue to API Handler
API-->>Client: 200 Response
end
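The flow in the diagram can be sketched as a framework-agnostic factory. This is a minimal TypeScript sketch under stated assumptions, not the PR's code: `GuardRequest`, `GuardResponse`, `createBotGuard`, and the `onBlocked` callback are hypothetical names, the bot detector is injected (the PR uses `isbot`), and the 403 body `{ error: 'Forbidden' }` is a placeholder for whatever JSON the real handler returns.

```typescript
// Hypothetical minimal request/response shapes; the real server uses its own
// framework types and a getClientInfo() helper to read the User-Agent.
interface GuardRequest {
  path: string;
  headers: Record<string, string | undefined>;
  requestId?: string;
}

interface GuardResponse {
  status: number;
  body: unknown;
}

type Next = () => GuardResponse;
type BotDetector = (userAgent: string) => boolean;

// Factory in the spirit of botGuardMiddleware: the detector is injected and
// onBlocked receives each blocked request (e.g. for throttled logging).
function createBotGuard(
  detectBot: BotDetector,
  onBlocked: (userAgent: string, requestId: string) => void,
) {
  return function botGuardHandler(req: GuardRequest, next: Next): GuardResponse {
    // Only API routes are guarded; crawlers can still fetch the pages themselves.
    if (!req.path.startsWith('/api/')) {
      return next();
    }
    const userAgent = req.headers['user-agent'] ?? '';
    if (detectBot(userAgent)) {
      // Fall back to 'unknown' when no request ID was assigned.
      onBlocked(userAgent, req.requestId ?? 'unknown');
      // Reject before any handler work (such as a git clone) starts.
      return { status: 403, body: { error: 'Forbidden' } };
    }
    return next();
  };
}
```

With the real package installed, the detector could be supplied as `createBotGuard(isbot, onBlocked)` after `import { isbot } from 'isbot'`, since `isbot` takes a User-Agent string and returns a boolean.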
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main    #1403   +/-   ##
=======================================
  Coverage   87.40%   87.40%
=======================================
  Files         116      116
  Lines        4392     4392
  Branches     1018     1018
=======================================
  Hits         3839     3839
  Misses        553      553
Deploying repomix with …

Latest commit: a82baa9
Status: ✅ Deploy successful!
Preview URL: https://a03ecdf0.repomix.pages.dev
Branch Preview URL: https://fix-block-bot-pack-requests.repomix.pages.dev
Thanks for the review! Actionable items:
Suggestions:
a1de721 to 4ba5f1c
Code Review (Update 2)

Previous feedback addressed in latest commit (4ba5f1c):

New findings on latest code

1. Consider narrowing scope from … Currently … Not blocking: the current scope is a reasonable conservative default.
2. Node.js single thread: no real race condition on counters. Several bot reviewers flagged the shared …
3. …
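Point 2 can be illustrated directly. A minimal sketch, assuming the throttle counters are updated synchronously inside the handler (increment and compare with no `await` in between); all names here are hypothetical. For contrast, the second function shows the only way a lost update can occur on a single thread: an `await` between the read and the write.

```typescript
// Shared counters, as a throttle inside a middleware factory would hold them.
let safeCount = 0;
let unsafeCount = 0;

// Read-modify-write in one synchronous step: no other callback can run in between.
function recordBlockedSync(): void {
  safeCount += 1;
}

// The same update split across an await: other callers run between read and write.
async function recordBlockedAcrossAwait(): Promise<void> {
  const current = unsafeCount;
  await Promise.resolve(); // yields to the microtask queue
  unsafeCount = current + 1;
}

async function simulateRequest(delayMs: number): Promise<void> {
  // Awaiting I/O (simulated by a timer) lets other requests interleave here...
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  // ...but the synchronous counter update itself is never split.
  recordBlockedSync();
}

async function runDemo(n: number): Promise<{ safe: number; unsafe: number }> {
  // n concurrent "requests" with staggered delays, each updating synchronously.
  await Promise.all(Array.from({ length: n }, (_, i) => simulateRequest(i % 5)));
  // n concurrent updates that all read the counter before any of them writes.
  await Promise.all(Array.from({ length: n }, () => recordBlockedAcrossAwait()));
  return { safe: safeCount, unsafe: unsafeCount };
}
```

Running `runDemo(1000)` ends with `safe` at 1000 and `unsafe` at 1: the synchronous counter is exact, while the await-split version loses every update but one.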
Applebot and other JS-capable crawlers were visiting permalink URLs (repomix.com/?repo=xxx), executing the frontend JS, which auto-triggers POST /api/pack on mount. This caused massive parallel git clone operations that exceeded the 1024 MiB memory limit on Cloud Run, resulting in OOM crash loops.

- Add server-side botGuardMiddleware using the `isbot` package to reject bot requests to /api/* with 403 before they consume resources
- Add frontend bot detection to skip auto-pack execution in onMounted when the user agent is a known crawler
- Place bot guard before rate limiter to avoid counting bot requests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
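The ordering point in the last bullet can be shown with a toy middleware chain. This is an illustrative sketch only: `compose`, `createRateLimiter`, and the stand-in detector are hypothetical, and the limiter is a single global counter rather than a real per-client, windowed limiter. Placing the guard first means bot requests return 403 without ever touching the limiter's count.

```typescript
// Each middleware receives the User-Agent and a next() continuation and
// returns an HTTP status. Deliberately tiny; not a real framework API.
type Middleware = (userAgent: string, next: () => number) => number;

function compose(middlewares: Middleware[], handler: () => number): (ua: string) => number {
  const run = (index: number, ua: string): number =>
    index === middlewares.length
      ? handler()
      : middlewares[index](ua, () => run(index + 1, ua));
  return (ua) => run(0, ua);
}

// Toy global counter standing in for a real rate limiter.
function createRateLimiter(limit: number) {
  let seen = 0;
  const middleware: Middleware = (_ua, next) => {
    seen += 1; // every request that reaches the limiter is counted
    return seen > limit ? 429 : next();
  };
  return { middleware, seen: () => seen };
}

// Stand-in detector; the PR uses the isbot package instead.
const looksLikeBot = (ua: string): boolean => /Applebot|Googlebot/.test(ua);

const botGuard: Middleware = (ua, next) => (looksLikeBot(ua) ? 403 : next());
```

With `compose([botGuard, limiter.middleware], handler)`, ten crawler requests leave the limiter's count at zero. Swapping the order to `[limiter.middleware, botGuard]` lets crawler traffic use up the limit, so a later human request can receive 429.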
isbot is only needed in website/server, not in the root package. Remove test files since website has no test infrastructure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Include the number of blocked requests in the log message so operators can gauge bot traffic volume without log flooding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
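One way to get a block count into throttled logs is to count every block but emit a line at most once per interval, reporting how many requests were blocked since the last line. A hypothetical sketch: the 60-second interval matches the sequence diagram, while `createThrottledBlockLogger`, the message format, and the injectable clock (added so the logic can be tested) are assumptions.

```typescript
// Hypothetical throttled logger: emits at most one line per interval and
// reports how many requests were blocked since the last emitted line.
function createThrottledBlockLogger(
  intervalMs: number,
  log: (message: string) => void,
  now: () => number = Date.now,
) {
  // State lives inside the factory, so each guard instance has its own.
  let lastLogAt = Number.NEGATIVE_INFINITY; // guarantees the first block is logged
  let blockedSinceLastLog = 0;

  return function recordBlocked(userAgent: string): void {
    blockedSinceLastLog += 1;
    const t = now();
    if (t - lastLogAt >= intervalMs) {
      log(`Blocked ${blockedSinceLastLog} bot request(s); latest UA: ${userAgent}`);
      lastLogAt = t;
      blockedSinceLastLog = 0;
    }
  };
}
```

Suppressed blocks are not lost: they are folded into the count of the next emitted line, which is what lets operators gauge volume without one log line per request.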
- Move throttle state inside factory function (gemini)
- Rename inner function to botGuardHandler to avoid shadowing (gemini)
- Add requestId fallback to 'unknown' for undefined case (coderabbit)
- Remove bare 'bot'/'spider'/'crawler' from client regex to prevent false positives on legitimate devices like Cubot phones (devin)
- Update server package-lock.json with isbot dependency (devin)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
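The false-positive issue in the fourth bullet is easy to reproduce: a bare `bot` substring also matches device names such as Cubot. A small illustration; both patterns and both User-Agent strings are assumptions written for this sketch, not the project's former regex.

```typescript
// Broad pattern of the kind the commit removed: bare substrings match
// anywhere, including inside device or brand names.
const broadBotPattern = /bot|spider|crawler/i;

// Narrower illustrative pattern: specific crawler tokens on word boundaries.
const namedCrawlerPattern = /\b(?:googlebot|bingbot|applebot|yandexbot|baiduspider)\b/i;

// Example User-Agent strings assumed for this sketch.
const cubotPhoneUa =
  'Mozilla/5.0 (Linux; Android 11; Cubot Note 20) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/96.0 Mobile Safari/537.36';
const applebotUa =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 ' +
  '(KHTML, like Gecko) Version/13.1.1 Safari/605.1.15 ' +
  '(Applebot/0.1; +http://www.apple.com/go/applebot)';

// Report which pattern flags a given User-Agent as a bot.
function classify(ua: string): { broad: boolean; named: boolean } {
  return { broad: broadBotPattern.test(ua), named: namedCrawlerPattern.test(ua) };
}
```

`classify(cubotPhoneUa)` flags the phone only under the broad pattern, while `classify(applebotUa)` is flagged by both. A hand-maintained allowlist like this one needs constant upkeep, which is part of the case for switching to a maintained list such as `isbot`.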
Replace hand-rolled bot regex with the isbot package (~6.5 KB ESM, zero deps) to match server-side detection. Eliminates divergence between client and server bot detection logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
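On the client, the check reduces to a guard in the mount hook. A minimal sketch, with `shouldAutoPack` a hypothetical helper and the detector injected so the decision logic stays testable outside Vue:

```typescript
type BotDetector = (userAgent: string) => boolean;

// Hypothetical helper for the mount hook: decide whether a permalink visit
// should auto-trigger POST /api/pack.
function shouldAutoPack(search: string, userAgent: string, detectBot: BotDetector): boolean {
  // Permalinks look like repomix.com/?repo=xxx
  const repo = new URLSearchParams(search).get('repo');
  if (!repo) {
    return false; // no repo in the URL, nothing to auto-pack
  }
  // Crawlers that execute JS still render the page but never start a pack.
  return !detectBot(userAgent);
}
```

In `TryIt.vue` this would be called from `onMounted` with `window.location.search`, `navigator.userAgent`, and `isbot` (via `import { isbot } from 'isbot'`); the function that actually submits the pack request is left out here.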
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6102e02 to a82baa9
Code Review: Final Pass

Previous feedback has been addressed across 6 follow-up commits. One new item:

Minor: PR description is now stale. The description says the client "Uses a simple regex pattern (not the full isbot package) to minimize bundle size impact", but commit c82accf switched the client to the full isbot package. Worth a quick edit to the description for accuracy before merging.

Previously raised items: status
Verdict: Ready to merge. The defense-in-depth approach is sound, isbot is the right package choice, middleware ordering is correct, and all actionable feedback has been addressed. No new blocking issues found.

Generated with Claude Code
Applebot (and other JS-capable crawlers) were visiting permalink URLs (`repomix.com/?repo=xxx`), executing the frontend JavaScript, which auto-triggers `POST /api/pack` on mount. This caused massive parallel `git clone` operations that exceeded the 1024 MiB memory limit on Cloud Run, resulting in OOM crash loops.

Changes
Server-side (primary defense):
- Add `botGuardMiddleware` using the `isbot` package (~5M weekly downloads, industry standard) to detect bot User-Agents
- Bot requests to `/api/*` are rejected with 403 before consuming any resources

Frontend (secondary defense):
- Add `isBot()` check in `TryIt.vue`'s `onMounted` to skip auto-pack execution when the user agent is a known crawler
- Uses a simple regex pattern (not the full `isbot` package) to minimize bundle size impact

Checklist
- `npm run test`
- `npm run lint`

🤖 Generated with Claude Code