
fix(server): Block bot/crawler requests to prevent OOM crashes#1403

Merged
yamadashy merged 6 commits into main from fix/block-bot-pack-requests
Apr 5, 2026


@yamadashy

@yamadashy yamadashy commented Apr 5, 2026

Owner

Applebot (and other JS-capable crawlers) were visiting permalink URLs (repomix.com/?repo=xxx), executing the frontend JavaScript which auto-triggers POST /api/pack on mount. This caused massive parallel git clone operations that exceeded the 1024 MiB memory limit on Cloud Run, resulting in OOM crash loops.

Changes

Server-side (primary defense):

  • Add botGuardMiddleware using the isbot package (~5M weekly downloads, industry standard) to detect bot User-Agents
  • Bot requests to /api/* are rejected with 403 before consuming any resources
  • Placed before rate limiter to avoid counting bot requests against user limits
  • Log throttling (60s interval) to prevent log storms from heavy bot traffic
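
The ordering described above (guard first, rate limiter second) can be sketched as follows. This is a minimal, dependency-free illustration and not the code from this PR: `looksLikeBot` is a hypothetical stand-in for the isbot package, and `createApiChain`, `ApiRequest`, and the stubbed rate limiter counter are names invented for the example.

```typescript
// Stand-in for the isbot package so the sketch has no dependencies (hypothetical pattern).
const looksLikeBot = (userAgent: string | undefined): boolean =>
  userAgent !== undefined && /applebot|googlebot|gptbot|claudebot/i.test(userAgent);

interface ApiRequest {
  path: string;
  userAgent?: string;
}

// Models the /api/* chain: bot guard first, then a stubbed rate limiter.
function createApiChain(detect: (userAgent: string | undefined) => boolean) {
  let rateLimiterHits = 0;

  const handle = (req: ApiRequest): number => {
    // Guard runs first: reject bots with 403 before any other work happens.
    if (req.path.startsWith('/api/') && detect(req.userAgent)) {
      return 403;
    }
    // Only requests that pass the guard are counted by the rate limiter.
    rateLimiterHits += 1;
    return 200; // stubbed handler response
  };

  return { handle, rateLimiterHits: () => rateLimiterHits };
}
```

Because the guard returns before the counter is touched, a burst of crawler requests leaves the rate limiter state unchanged for real users.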

Frontend (secondary defense):

  • Add lightweight isBot() check in TryIt.vue's onMounted to skip auto-pack execution when the user agent is a known crawler
  • Uses a simple regex pattern (not the full isbot package) to minimize bundle size impact
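
A rough sketch of the frontend check described above, assuming the auto-pack decision depends only on the `repo` query value and the user agent. The regex, `isBot`, and `shouldAutoPack` here are illustrative guesses, not the contents of TryIt.vue or botDetect.ts.

```typescript
// Hypothetical crawler pattern; the real list in the PR may differ.
const knownCrawler = /applebot|googlebot|bingbot|gptbot|claudebot/i;

function isBot(userAgent: string | undefined): boolean {
  // No user agent (e.g. a non-browser environment) is treated as "not a bot".
  return userAgent !== undefined && knownCrawler.test(userAgent);
}

// Decides whether the page should auto-trigger POST /api/pack on mount.
function shouldAutoPack(repoParam: string | null, userAgent: string | undefined): boolean {
  if (repoParam === null || repoParam.trim() === '') return false; // nothing to pack
  return !isBot(userAgent);
}
```

In a component, the result would gate the auto-run inside `onMounted`, for example by passing the `repo` query value and `navigator.userAgent`.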

Checklist

  • Run npm run test
  • Run npm run lint

🤖 Generated with Claude Code



@github-actions

github-actions Bot commented Apr 5, 2026

Contributor

⚡ Performance Benchmark

Latest commit: a82baa9 fix(website): Update client package-lock.json with isbot dependency
Status: ✅ Benchmark complete!
Ubuntu: 1.58s (±0.03s) → 1.59s (±0.02s) · +0.01s (+0.4%)
macOS: 0.88s (±0.04s) → 0.90s (±0.05s) · +0.02s (+2.0%)
Windows: 1.97s (±0.47s) → 1.98s (±0.52s) · +0.01s (+0.5%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded), interleaved execution
  • Measurement: 20 runs / 30 on macOS (median ± IQR)
History

0ef02a9 refactor(website): Use isbot package on client side for consistency

Ubuntu: 1.53s (±0.03s) → 1.55s (±0.03s) · +0.01s (+0.9%)
macOS: 1.43s (±0.21s) → 1.39s (±0.28s) · -0.05s (-3.2%)
Windows: 2.08s (±0.86s) → 2.43s (±0.62s) · +0.34s (+16.6%)

4ba5f1c fix(server): Address PR review feedback

Ubuntu: 1.54s (±0.02s) → 1.53s (±0.03s) · -0.01s (-0.3%)
macOS: 1.35s (±0.18s) → 1.36s (±0.32s) · +0.01s (+1.0%)
Windows: 1.87s (±0.02s) → 1.86s (±0.03s) · -0.01s (-0.4%)

a1de721 fix(server): Address PR review feedback

Ubuntu: 1.55s (±0.03s) → 1.55s (±0.02s) · +0.00s (+0.1%)
macOS: 1.07s (±0.19s) → 1.06s (±0.10s) · -0.01s (-1.0%)
Windows: 1.87s (±0.02s) → 1.87s (±0.03s) · -0.00s (-0.2%)

d74986a fix(server): Add block count to bot guard throttled logs

Ubuntu: 1.52s (±0.01s) → 1.52s (±0.01s) · -0.00s (-0.1%)
macOS: 0.87s (±0.03s) → 0.87s (±0.04s) · +0.00s (+0.0%)
Windows: 1.89s (±0.04s) → 1.90s (±0.04s) · +0.01s (+0.5%)

0e85849 fix(server): Remove isbot from root deps and drop server tests

Ubuntu: 1.62s (±0.02s) → 1.63s (±0.03s) · +0.01s (+0.5%)
macOS: 1.28s (±0.08s) → 1.26s (±0.11s) · -0.02s (-1.5%)
Windows: 1.88s (±0.06s) → 1.87s (±0.03s) · -0.00s (-0.2%)

0f30226 fix(server): Block bot/crawler requests to prevent OOM crashes

Ubuntu: 1.51s (±0.02s) → 1.52s (±0.03s) · +0.01s (+0.5%)
macOS: 1.20s (±0.21s) → 1.22s (±0.22s) · +0.02s (+1.6%)
Windows: 2.02s (±0.18s) → 2.03s (±0.32s) · +0.01s (+0.6%)

@coderabbitai

coderabbitai Bot commented Apr 5, 2026

Contributor

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2d0ffb6e-2710-42b8-8708-75390cce2628

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

The changes implement bot detection and blocking functionality across client and server. A new isbot library dependency detects automated requests. The server adds middleware to block API requests from known bots with HTTP 403 responses. The client prevents automatic API calls for detected bot user agents. Both include comprehensive tests for the new detection logic.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Dependencies | package.json, website/server/package.json | Added isbot library v5.1.36 to devDependencies and server dependencies. |
| Bot Detection Utilities | website/client/utils/botDetect.ts | Implemented isBot() function that detects common bot and crawler user-agent strings using regex pattern matching; returns false in non-browser environments. |
| Server Bot Guard Middleware | website/server/src/middlewares/botGuard.ts, website/server/src/index.ts | Added botGuardMiddleware to /api/* route chain that blocks requests from detected bots with HTTP 403; includes throttled warning logs to prevent log flooding (60-second intervals). |
| Client Integration | website/client/components/Home/TryIt.vue | Modified auto-execution condition to prevent automatic API calls when isBot() returns true for crawler/bot user agents. |
| Tests | tests/server/middlewares/botGuard.test.ts, tests/server/utils/botDetect.test.ts | Added comprehensive test suites covering bot detection patterns (Applebot, Googlebot, GPTBot, ClaudeBot), middleware blocking behavior, and legitimate user-agent allowlist; tests validate non-API route bypass. |

Sequence Diagram

sequenceDiagram
    participant Client as Client Request
    participant Middleware as botGuardMiddleware
    participant IsBot as isbot Library
    participant Logger as Request Logger
    participant API as API Handler

    Client->>Middleware: HTTP Request (with User-Agent)
    Middleware->>Middleware: Extract User-Agent via getClientInfo()
    Middleware->>IsBot: isbot(userAgent)
    
    alt Bot Detected
        IsBot-->>Middleware: true
        Middleware->>Logger: logWarning (throttled, 60s interval)
        Middleware-->>Client: 403 JSON Error Response
    else Legitimate Request
        IsBot-->>Middleware: false
        Middleware->>API: next() → Continue to API Handler
        API-->>Client: 200 Response
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: adding bot/crawler request blocking to prevent OOM crashes. It is specific, concise, and directly related to the primary objective. |
| Description check | ✅ Passed | The description is comprehensive and well-structured, covering the problem, server-side and frontend solutions, design rationale, and includes the required checklist with both items marked complete. |


@codecov

codecov Bot commented Apr 5, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.40%. Comparing base (a96b212) to head (0ef02a9).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1403   +/-   ##
=======================================
  Coverage   87.40%   87.40%           
=======================================
  Files         116      116           
  Lines        4392     4392           
  Branches     1018     1018           
=======================================
  Hits         3839     3839           
  Misses        553      553           


@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 5, 2026


Deploying repomix with Cloudflare Pages

Latest commit: a82baa9
Status: ✅  Deploy successful!
Preview URL: https://a03ecdf0.repomix.pages.dev
Branch Preview URL: https://fix-block-bot-pack-requests.repomix.pages.dev


gemini-code-assist[bot]

This comment was marked as resolved.

@claude

This comment has been minimized.

coderabbitai[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@yamadashy

Owner Author

Thanks for the review!

Actionable items:

  1. isbot removed from root package.json — Done in 0e85849. It's only in website/server/package.json now.
  2. Test file location — Removed the tests/server/ directory entirely in 0e85849 since the website side has no test infrastructure.
  3. Log throttle with counter — Implemented in d74986a. Now logs "Blocked N bot request(s) since last log" with the count, then resets.
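
For readers following along, the throttle-with-counter idea in item 3 could look roughly like this. It is a sketch under assumptions, not the code from d74986a: `createThrottledBlockLogger`, the injected `log` callback, and the injectable `now` clock are all hypothetical names, and only the log message text is taken from the comment above.

```typescript
// Returns a function to call once per blocked bot request.
function createThrottledBlockLogger(
  log: (message: string) => void,
  intervalMs: number = 60_000,
  now: () => number = Date.now,
): () => void {
  // State lives inside the factory, so each guard instance has its own counters.
  let blockedSinceLastLog = 0;
  let lastLogAt = Number.NEGATIVE_INFINITY; // ensures the first block is logged immediately

  return function recordBlock(): void {
    blockedSinceLastLog += 1;
    const current = now();
    if (current - lastLogAt >= intervalMs) {
      log(`Blocked ${blockedSinceLastLog} bot request(s) since last log`);
      blockedSinceLastLog = 0; // reset after reporting
      lastLogAt = current;
    }
  };
}
```

One trade-off of this shape is that the count is only flushed when another bot request arrives after the interval, so the tail of a burst is reported late rather than on a timer.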

Suggestions:

  • robots.txt: Good point for well-behaved crawlers. However, the immediate issue was Applebot executing JS on repomix.com/?repo=xxx pages (not hitting the API directly), and robots.txt query-string matching is limited. The isbot middleware is the reliable enforcement point. Can consider adding robots.txt separately.
  • Client-side regex vs isbot: Agreed on potential divergence. Keeping the lightweight regex on the client since the server guard is the real enforcement point and bundle size matters.

@claude

This comment has been minimized.

devin-ai-integration[bot]

This comment was marked as resolved.

@yamadashy yamadashy force-pushed the fix/block-bot-pack-requests branch from a1de721 to 4ba5f1c on April 5, 2026 13:34
@claude

This comment has been minimized.

@claude

claude Bot commented Apr 5, 2026

Contributor

Code Review (Update 2)

Previous feedback addressed in latest commit (4ba5f1c):

  • ✅ Inner function renamed to botGuardHandler (no more shadowing)
  • ✅ requestId fallback to 'unknown' added
  • ✅ Bare 'bot'/'spider' removed from client regex (prevents Cubot false positives)
  • ✅ Throttle state already scoped inside factory function

New findings on latest code

1. Consider narrowing scope from /api/* to /api/pack

Currently botGuardMiddleware() is applied to all /api/* routes. The OOM problem is specific to /api/pack (which triggers git clone). If lightweight API endpoints are added later, bot-blocking them would be unnecessarily restrictive. Narrowing to /api/pack would be more precise.

Not blocking — the current scope is a reasonable conservative default.
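
To make the scope difference concrete, here is a small sketch using a simplified route matcher. `matchesRoute` is a hypothetical helper, not any framework's real matcher, and `/api/health` is an invented example of a lightweight endpoint.

```typescript
// Simplified matching: a trailing "/*" means prefix match, anything else is an exact match.
function matchesRoute(pattern: string, path: string): boolean {
  if (pattern.endsWith('/*')) {
    const prefix = pattern.slice(0, -1); // "/api/*" -> "/api/"
    return path.startsWith(prefix);
  }
  return path === pattern;
}

// With the broad scope, every API endpoint is guarded.
const broadGuardsPack = matchesRoute('/api/*', '/api/pack');     // true
const broadGuardsHealth = matchesRoute('/api/*', '/api/health'); // true

// With the narrow scope, only the expensive endpoint is guarded.
const narrowGuardsPack = matchesRoute('/api/pack', '/api/pack');     // true
const narrowGuardsHealth = matchesRoute('/api/pack', '/api/health'); // false
```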

2. Node.js single-thread: no real race condition on counters

Several bot reviewers flagged the shared botBlockCount/lastBotBlockLogAt as a concurrency hazard. This is a false positive — Node.js runs JavaScript on a single thread. The ++ and timestamp check/reset execute atomically within a single tick of the event loop. No synchronization is needed.

3. isbot(undefined) behavior is safe but implicit

clientInfo.userAgent is typed as string | undefined. The isbot library returns false for undefined/null inputs, so missing User-Agent headers are correctly allowed through. This is the right behavior (missing UA ≠ bot), but worth noting it's relying on library behavior rather than explicit handling.
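
If explicit handling were preferred over relying on library behavior, a thin wrapper along these lines would do it. This is only a sketch: `isBotRequest` is a hypothetical name, and `detect` stands in for whatever detection function is used (for example isbot).

```typescript
// Makes the "missing User-Agent is not a bot" rule explicit instead of implicit.
function isBotRequest(
  userAgent: string | undefined,
  detect: (userAgent: string) => boolean,
): boolean {
  if (userAgent === undefined || userAgent === '') {
    return false; // no UA header: allow the request through
  }
  return detect(userAgent);
}
```

The wrapper also narrows the type passed to `detect` to `string`, so the detection function never sees `undefined`.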

Previously noted (still applicable)

  • Middleware ordering: cloudLoggerMiddleware still runs before botGuardMiddleware, so bot requests pay the full logging cost (URL parsing, header reads, memory snapshots, two structured log entries). Under heavy crawl traffic this adds overhead. Moving bot guard earlier or scoping logger to post-guard would help, but this is a performance optimization, not a correctness issue.

  • No test coverage: Structural gap — website has no test infrastructure. The middleware is pure logic and testable without DOM dependencies, so this would be low-cost to add if/when test infra is set up.
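
The middleware ordering point above can be illustrated with a tiny chain runner. Everything here is a stand-in: `runChain`, `Ctx`, `logger`, and `botGuard` are hypothetical and only model the cost difference, with the logger writing one entry on the way in and one on the way out.

```typescript
interface Ctx {
  isBot: boolean;
  status: number;
  logEntries: number;
}

type Middleware = (ctx: Ctx, next: () => void) => void;

// Runs middlewares in order; each decides whether to call next().
function runChain(chain: Middleware[], ctx: Ctx): Ctx {
  const dispatch = (index: number): void => {
    const middleware = chain[index];
    if (middleware === undefined) return; // end of chain
    middleware(ctx, () => dispatch(index + 1));
  };
  dispatch(0);
  return ctx;
}

// Stand-in logger: one entry before the rest of the chain, one after.
const logger: Middleware = (ctx, next) => {
  ctx.logEntries += 1;
  next();
  ctx.logEntries += 1;
};

// Stand-in guard: short-circuits with 403 and never calls next() for bots.
const botGuard: Middleware = (ctx, next) => {
  if (ctx.isBot) {
    ctx.status = 403;
    return;
  }
  next();
};

const newBotCtx = (): Ctx => ({ isBot: true, status: 200, logEntries: 0 });
const loggerFirst = runChain([logger, botGuard], newBotCtx()); // logEntries: 2
const guardFirst = runChain([botGuard, logger], newBotCtx());  // logEntries: 0
```

Both orderings return 403 to the bot; the only difference is whether the logging work is done first.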

Verdict

Approve — The PR correctly solves a real production OOM issue with a well-implemented defense-in-depth approach. The isbot package is the right tool, middleware ordering is correct for the rate limiter, log throttling with block counts provides good operator visibility, and the latest commit addresses all actionable review feedback.

Suggested follow-ups (non-blocking):

  • robots.txt / X-Robots-Tag for permalink pages
  • Narrow bot guard scope to /api/pack if more endpoints are added
  • Website test infrastructure

🤖 Generated with Claude Code

yamadashy and others added 6 commits April 5, 2026 22:44

fix(server): Block bot/crawler requests to prevent OOM crashes

Applebot and other JS-capable crawlers were visiting permalink URLs
(repomix.com/?repo=xxx), executing the frontend JS which auto-triggers
POST /api/pack on mount. This caused massive parallel git clone
operations that exceeded the 1024 MiB memory limit on Cloud Run,
resulting in OOM crash loops.

- Add server-side botGuardMiddleware using `isbot` package to reject
  bot requests to /api/* with 403 before they consume resources
- Add frontend bot detection to skip auto-pack execution in onMounted
  when the user agent is a known crawler
- Place bot guard before rate limiter to avoid counting bot requests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(server): Remove isbot from root deps and drop server tests

isbot is only needed in website/server, not in the root package.
Remove test files since website has no test infrastructure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(server): Add block count to bot guard throttled logs

Include the number of blocked requests in the log message so operators
can gauge bot traffic volume without log flooding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(server): Address PR review feedback

- Move throttle state inside factory function (gemini)
- Rename inner function to botGuardHandler to avoid shadowing (gemini)
- Add requestId fallback to 'unknown' for undefined case (coderabbit)
- Remove bare 'bot'/'spider'/'crawler' from client regex to prevent
  false positives on legitimate devices like Cubot phones (devin)
- Update server package-lock.json with isbot dependency (devin)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor(website): Use isbot package on client side for consistency

Replace hand-rolled bot regex with the isbot package (~6.5 KB ESM,
zero deps) to match server-side detection. Eliminates divergence
between client and server bot detection logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(website): Update client package-lock.json with isbot dependency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yamadashy yamadashy force-pushed the fix/block-bot-pack-requests branch from 6102e02 to a82baa9 on April 5, 2026 13:44
@claude

claude Bot commented Apr 5, 2026

Contributor

Code Review — Final Pass

Previous feedback has been addressed across 6 follow-up commits. One new item:

Minor: PR description is now stale

The description says the client "Uses a simple regex pattern (not the full isbot package) to minimize bundle size impact", but commit c82accf switched the client to the full isbot package. Worth a quick edit to the description for accuracy before merging.

Previously raised items — status

| Item | Status |
|---|---|
| Root package.json cleanup | Addressed |
| Log throttle with block count | Addressed |
| Inner function naming (botGuardHandler) | Addressed |
| Client-side regex to isbot package | Addressed |
| cloudLoggerMiddleware ordering | Non-blocking, acknowledged |
| clientInfo.userAgent undefined | Non-blocking, isbot(undefined) returns false |
| Website test infrastructure | Pre-existing gap, not PR-specific |

Verdict: Ready to merge. The defense-in-depth approach is sound, isbot is the right package choice, middleware ordering is correct, and all actionable feedback has been addressed. No new blocking issues found.

Generated with Claude Code

@yamadashy yamadashy merged commit 4a0eb98 into main Apr 5, 2026
22 checks passed
@yamadashy yamadashy deleted the fix/block-bot-pack-requests branch April 5, 2026 13:52