Open
Mitigate Bot/DDOS on Expensive Endpoints via Human Verification Challenge for Suspicious Visitors #11806
Feature
Labels
Theme: Bots (Issues relating to Bots & data cleanup), Theme: Security
Description
Problem
We are currently experiencing DDoS traffic from bots targeting our expensive endpoints, especially /search?q= (with complex Solr syntax, e.g. language:...) and /subjects/... pages. These attacks cause significant resource drain.
Recent work (@internetarchive/openlibrary/pull/11621) improved bot detection for analytics, but our endpoints remain vulnerable to bot-driven resource consumption.
Proposal
Implement an additional human verification challenge for "suspicious visitors" (not logged in, with generic user-agents that aren't recognized bots) when:
- Performing a GET to /search?q= using specialized Solr syntax (such as q=language:eng), or
- Accessing a /subjects/ page for the first time
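The two trigger conditions above could be detected with something like the following. This is a minimal sketch, not the project's actual logic: the names `FIELD_QUERY_RE`, `uses_solr_field_syntax`, and `is_candidate_request` are hypothetical, and the regex is a rough heuristic that only recognizes the field:value form (real Solr syntax is richer). It also does not track "first time" access; in this proposal that role is played by the vf cookie.

```python
import re
from urllib.parse import parse_qs

# Assumption: a lowercase field name directly followed by a colon and a value
# (e.g. "language:eng") is treated as "specialized Solr syntax".
FIELD_QUERY_RE = re.compile(r"\b[a-z_]+:\S")


def uses_solr_field_syntax(q: str) -> bool:
    """Return True if the query string looks like it contains a field:value clause."""
    return bool(FIELD_QUERY_RE.search(q))


def is_candidate_request(path: str, query_string: str) -> bool:
    """Return True for requests that this sketch considers expensive."""
    if path == "/search":
        # parse_qs URL-decodes values, so "language%3Aeng" becomes "language:eng".
        q = parse_qs(query_string).get("q", [""])[0]
        return uses_solr_field_syntax(q)
    return path.startswith("/subjects/")
```

Plain keyword searches such as q=tom+sawyer are not flagged, so ordinary visitors would never see the challenge for a simple search.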
Flow:
- If the above conditions are met, and a vf=1 cookie is not present:
  - Instead of rendering the resource, show a templates/accounts/challenge.html page (or suitable alternative) with a minimal body and a standard Open Library button: Verify you are human.
  - When clicked, the button hits a new API endpoint (e.g. /account/verify_human), which sets the vf=1 cookie and reloads the page.
- Share challenge logic between Search and Subjects for DRYness (suggest a shared bounce/check function; decorator may be overkill).
- Template must follow i18n and remain minimal.
- JS may be in a script tag within the template or implemented canonically in plugins/openlibrary/js.
- Add a basic statsd metric (e.g. ol.stats.verify_human, following our current statsd patterns) to track challenge flow usage.
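The flow above can be sketched in a framework-agnostic way, using only the standard library. Everything here is an assumption for illustration: `has_verified_cookie`, `bounce_if_unverified`, and `verify_human_cookie_header` are hypothetical names, the `metrics` counter is a stand-in for the real statsd client, and the `.challenged` / `.verified` metric suffixes are invented (the issue only names ol.stats.verify_human).

```python
from collections import Counter
from http.cookies import SimpleCookie

VERIFY_COOKIE = "vf"
metrics = Counter()  # stand-in for a statsd client; not the real integration


def has_verified_cookie(cookie_header: str) -> bool:
    """Return True if the request's Cookie header carries vf=1."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return VERIFY_COOKIE in cookie and cookie[VERIFY_COOKIE].value == "1"


def bounce_if_unverified(conditions_met: bool, cookie_header: str, render_resource):
    """Shared check for Search and Subjects: challenge or render the resource."""
    if conditions_met and not has_verified_cookie(cookie_header):
        metrics["ol.stats.verify_human.challenged"] += 1  # hypothetical metric name
        # A real handler would render templates/accounts/challenge.html here.
        return "challenge"
    return render_resource()


def verify_human_cookie_header() -> str:
    """Build the Set-Cookie value a verify endpoint could send back."""
    metrics["ol.stats.verify_human.verified"] += 1  # hypothetical metric name
    cookie = SimpleCookie()
    cookie[VERIFY_COOKIE] = "1"
    cookie[VERIFY_COOKIE]["path"] = "/"
    return cookie[VERIFY_COOKIE].OutputString()
```

Both the Search and Subjects handlers could call `bounce_if_unverified` with their own render callback, which keeps the challenge logic in one place without needing a decorator.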
Definition
A suspicious_visitor is:
- Not logged in
- Presents a generic or nonspecific user-agent (not a known bot UA); see is_bot in plugins/openlibrary/code.py
Implementation goals
- Simple, secure, easy to test and ship
- DRY: minimal overhead and code duplication
- Backend challenge logic, with suspicious_visitor detection based on upstream nginx JS rules (inferring this logic is out-of-scope)
See also
- PR #11621 which shows how we've previously split out human and bot traffic.
Acceptance Criteria
- Suspicious, unverified visitors hitting /search?q= with specialized Solr syntax (like languages:...) or expensive requests for /subjects/* see a human verification page if vf=1 is not set
- Verified users (via challenge or login/cookie) are not re-prompted
- Metrics for verification flows are recorded via statsd
- Search and Subjects remain DRY.
- Solution tested and ready for production.
For human verification, we possibly don't need to care about the nginx JS code at all. What we really care about is that:
- the endpoint is a candidate (e.g. expensive subject or /search?q= with specialized solr syntax or parameters)
- they are not logged in
- no vf cookie is set
- not a known bot, per is_bot
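Taken together, the four conditions above reduce to a single boolean check. The sketch below is illustrative only: `RequestFacts` and `should_challenge` are hypothetical names, and how each field is populated (for example, by calling the existing is_bot helper) is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class RequestFacts:
    """Hypothetical summary of what the backend knows about a request."""
    is_candidate_endpoint: bool  # expensive subject page or specialized /search?q=
    logged_in: bool
    has_vf_cookie: bool
    is_known_bot: bool  # e.g. the result of the existing is_bot check


def should_challenge(facts: RequestFacts) -> bool:
    """Challenge only when all four conditions from the list hold."""
    return (
        facts.is_candidate_endpoint
        and not facts.logged_in
        and not facts.has_vf_cookie
        and not facts.is_known_bot
    )
```

Keeping the decision as a pure function of plain booleans makes it easy to unit test without a running web server.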
Stakeholders