
Commit b0d3a6b

Merge pull request #2 from artefactual-labs/feat/botd-ingestion
feat: integrate BotD verdict ingestion
2 parents 9b6d2a7 + 6106296 commit b0d3a6b

12 files changed, +2248 -56 lines changed


.goreleaser.yml

Lines changed: 4 additions & 0 deletions

@@ -68,6 +68,10 @@ nfpms:
 - src: web/assets/altcha
 dst: /etc/haproxy/assets/altcha
 type: tree
+# BotD JS asset (versioned) and VERSION file; served via /assets/botd/active/botd.esm.js
+- src: web/assets/botd
+dst: /etc/haproxy/assets/botd
+type: tree
 dependencies:
 - haproxy
 scripts:

Makefile

Lines changed: 11 additions & 2 deletions

@@ -1,6 +1,6 @@
 BIN := bin/cookie-guard-spoa

-.PHONY: all build clean altcha-assets install-altcha-assets altcha-go-bump
+.PHONY: all build clean altcha-assets install-altcha-assets altcha-go-bump botd-assets install-botd-assets

 all: build

@@ -13,8 +13,9 @@ build:
 clean:
 rm -rf bin dist

-# ----- ALTCHA assets management -----
+# ----- ALTCHA / BotD assets management -----
 ALTCHA_VER := $(shell sed -n '1p' web/assets/altcha/VERSION 2>/dev/null || echo unset)
+BOTD_VER := $(shell sed -n '1p' web/assets/botd/VERSION 2>/dev/null || echo unset)

 altcha-assets:
 @[ -x tools/altcha-sync.sh ] || chmod +x tools/altcha-sync.sh || true
@@ -24,6 +25,14 @@ install-altcha-assets:
 install -d -m0755 /etc/haproxy/assets/altcha
 cp -a web/assets/altcha/* /etc/haproxy/assets/altcha/

+botd-assets:
+@[ -x tools/botd-sync.sh ] || chmod +x tools/botd-sync.sh || true
+./tools/botd-sync.sh $(BOTD_VER)
+
+install-botd-assets:
+install -d -m0755 /etc/haproxy/assets/botd
+cp -a web/assets/botd/* /etc/haproxy/assets/botd/
+
 altcha-go-bump:
 @if [ -z "$(VERSION)" ]; then echo "usage: make altcha-go-bump VERSION=vX.Y.Z"; exit 2; fi
 go get github.com/altcha-org/altcha-lib-go@$(VERSION)

README.md

Lines changed: 149 additions & 49 deletions

@@ -2,7 +2,12 @@

 `cookie-guard-spoa` is an HAProxy SPOE (Stream Processing Offload Engine) agent that issues and validates HMAC‑signed cookies.

-It ships with a privacy‑friendly browser challenge powered by ALTCHA and dedicated endpoints built into the agent. HAProxy serves a small HTML page and the agent verifies the puzzle solution, issuing the `hb_v2` cookie on success. Only clients presenting a valid cookie reach your backend.
+It ships with two first-party protections enabled by default:
+
+- **ALTCHA** – a lightweight, open-source puzzle that proves the visitor can execute JavaScript, persist cookies, and solve a human-friendly challenge before the origin ever sees the request.
+- **BotD (FingerprintJS)** – a local copy of the BotD detector that fingerprints the browser for automation traits (headless Chrome, Selenium drivers, emulators) and reports the verdict back to Cookie Guard, allowing HAProxy or downstream SPOEs to block, throttle, or log suspect sessions.
+
+HAProxy serves a small HTML page that embeds both protections. The agent verifies the ALTCHA solution, ingests the BotD verdict, and issues the `hb_v2` cookie on success. Only clients presenting a valid cookie reach your backend unless you explicitly disable the challenge or BotD via CLI flags.

 Learn more about ALTCHA:

@@ -16,14 +21,14 @@ Learn more about ALTCHA:

 ## Overview

-The agent offloads cookie lifecycle management from HAProxy:
+Cookie Guard inserts an inline checkpoint between HAProxy and your origin that:

-1. Generates short-lived, signed cookies derived from the client IP and User-Agent.
-2. Exposes helper endpoints that HAProxy can embed in a challenge page.
-3. Validates cookies on subsequent requests and reports the outcome back to HAProxy via SPOE frames.
-4. Enables HAProxy to allow, rate-limit, or block requests that fail the validation.
+1. **Challenges new sessions** – serves the bundled ALTCHA puzzle and locally hosted BotD detector so only browsers that can execute JavaScript, persist cookies, and pass automation fingerprinting obtain an `hb_v2` cookie.
+2. **Issues and tracks tokens** – mints short-lived, HMAC-signed cookies bound to the client IP and (optionally) User-Agent, then caches recent BotD verdicts for the same tuple.
+3. **Validates on subsequent requests** – verifies hb_v2 on every request via SPOE and reuses the cached BotD verdict so downstream policies can treat “good”, “suspect”, or “bad” sessions differently.
+4. **Feeds HAProxy/SPOE peers** – exposes fresh transaction variables (`cookieguard.valid`, `cookieguard.botd_kind`, `cookieguard.session_hmac`, etc.) that HAProxy, Decision-SPOA, or other agents can use to block, rate-limit, or log.

-This setup filters out most headless bots, generic scanners, or curl-based tooling that cannot execute JavaScript or persist cookies.
+Because the HTML and JavaScript are served from your own HAProxy backend, no third-party calls or trackers are involved. The combination of ALTCHA (prove you are interactive) and BotD (fingerprint automation) removes most headless browsers, cURL scripts, and basic scrapers before they ever see your real site.

 ---

@@ -101,6 +106,7 @@ Additionally, packages include the challenge pages and ALTCHA assets under `/etc

 - `/etc/haproxy/altcha_challenge.html.lf`.
 - ALTCHA JS is installed under `/etc/haproxy/assets/altcha/<version>/altcha.min.js[.lf]` with `/etc/haproxy/assets/altcha/active` symlink updated to the packaged version.
+- BotD JS is installed under `/etc/haproxy/assets/botd/<version>/botd.esm.js[.lf]` with `/etc/haproxy/assets/botd/active` baked into the package so the challenge page can import `/assets/botd/active/botd.esm.js` immediately.

 After installation, adjust `/etc/cookie-guard-spoa/secret.key` or edit the systemd unit as needed, then `systemctl restart cookie-guard-spoa`.

@@ -123,61 +129,118 @@ After editing, run `systemctl restart cookie-guard-spoa`.

 ## HAProxy integration

-1. **SPOE engine definition** (`/etc/haproxy/cookie-guard.cfg`)
+### SPOE engine definition (`/etc/haproxy/cookie-guard-spoa.cfg`)

-```ini
-[spoe]
-max-frame-size 16384
-max-waiting-frames 2000
+```ini
+[cookie-guard]
+spoe-agent cookie-guard
+option var-prefix cookieguard
+groups issue-token verify-token
+option pipelining
+timeout hello 2s
+timeout idle 30s
+timeout processing 2s
+use-backend cookie_guard_spoa_backend

-agent cookie_guard
-use-backend cookie_guard_backend
-messages issue-token verify-token
-option pipelining
-timeout hello 2s
-timeout idle 30s
-timeout processing 2s
+spoe-message issue-token
+args src-ip=src ua="req.fhdr(User-Agent)"

-message issue-token
-args src-ip=ip.src ua="req.fhdr(User-Agent)"
+spoe-message verify-token
+args src-ip=src ua="req.fhdr(User-Agent)" cookie=req.cook(hb_v2)

-message verify-token
-args src-ip=ip.src ua="req.fhdr(User-Agent)" cookie=req.cook(hb_v2)
-```
+spoe-group issue-token
+messages issue-token

-2. **Backend connection**
+spoe-group verify-token
+messages verify-token
+```

-```haproxy
-backend cookie_guard_backend
-mode tcp
-server spoa1 127.0.0.1:9903 check
-```
+```haproxy
+backend cookie_guard_spoa_backend
+mode tcp
+server spoa1 127.0.0.1:9903 check inter 2s fall 2 rise 1
+```

-3. **Example application backend**
+### Reference frontend/backends

-```haproxy
-backend be_app
-option http-buffer-request
+Below is a compact `public_www` setup that wires Cookie Guard alone in front of a single backend. Swap the binds/hosts for your environment and layer in additional SPOEs (Decision, Coraza, etc.) later once the basic flow works.

-acl chal_safe_meth method GET HEAD
-acl chal_exempt_path path_beg -i /health /status /static/ /favicon.ico
-acl chal_exempt_cookie req.cook(hb_v2) -m found
-acl chal_target chal_safe_meth !chal_exempt_path
+```haproxy
+frontend public_www
+bind :80
+bind :443 ssl crt /etc/haproxy/certs/example.pem alpn h2,http/1.1
+option httplog
+
+# Force HTTPS except for ACME
+acl is_certbot path_beg -i /.well-known/acme-challenge
+http-request redirect scheme https unless { ssl_fc } || is_certbot
+
+# Cookie Guard: verify hb_v2 only when present
+filter spoe engine cookie-guard config /etc/haproxy/cookie-guard-spoa.cfg
+option http-buffer-request
+acl has_cookie req.cook(hb_v2) -m found
+http-request send-spoe-group cookie-guard verify-token if has_cookie
+acl cookie_ok var(txn.cookieguard.valid) -m str 1
+
+# Route challenge assets and BotD reports back to the Cookie Guard HTTP listener
+acl altcha_routes path_beg -i /altcha /altcha- /assets/altcha/
+acl botd_path path -i /botd-report
+acl botd_js path -i /assets/botd/active/botd.esm.js
+use_backend cookie_guard_http_backend if altcha_routes or botd_path or botd_js
+
+use_backend certbot if is_certbot
+default_backend app_backend
+```

-http-request set-spoe-group cookie_guard verify-token if chal_target chal_exempt_cookie
-acl cookie_ok var(txn.cookie_guard.valid) -m str 1
+Backends reuse the same Cookie Guard SPOE engine. The snippet below illustrates challenge orchestration plus silent token issuance when Decision (or another policy component) is not involved yet. Feel free to inline your own exemption ACLs.
+
+```haproxy
+backend app_backend
+option http-buffer-request
+
+# Verify hb_v2 only when present
+filter spoe engine cookie-guard config /etc/haproxy/cookie-guard-spoa.cfg
+acl has_cookie req.cook(hb_v2) -m found
+http-request send-spoe-group cookie-guard verify-token if has_cookie
+acl cookie_ok var(txn.cookieguard.valid) -m str 1
+
+# Simple policy: challenge every request until hb_v2 validates
+acl need_challenge !cookie_ok
+http-request redirect code 302 location /altcha?url=%[url] if need_challenge
+
+# Auto-issue hb_v2 when you prefer a silent token (e.g., authenticated users)
+http-request send-spoe-group cookie-guard issue-token if !cookie_ok !need_challenge
+acl new_token var(txn.cookieguard.token) -m found
+http-response add-header Set-Cookie "hb_v2=%[var(txn.cookieguard.token)]; Max-Age=%[var(txn.cookieguard.max_age)]; Path=/; HttpOnly; Secure; SameSite=Lax" if !need_challenge !has_cookie new_token
+
+# Forward headers to your origin
+http-request set-header X-Real-IP %[src]
+http-request add-header X-Forwarded-Proto https if { ssl_fc }
+option forwarded
+option forwardfor
+server app1 127.0.0.1:8080 check
+```

-http-request set-spoe-group cookie_guard issue-token if chal_target !cookie_ok
+Cookie Guard’s HTTP listener serves the ALTCHA HTML, ALTCHA JS, BotD bundle, and `/botd-report`. Route traffic there using:

-server app1 127.0.0.1:8080 check
-```
+```haproxy
+backend cookie_guard_http_backend
+mode http
+option forwarded
+option forwardfor
+http-request set-header X-Forwarded-For %[src]
+server spoa_http 127.0.0.1:9904 check
+```

-When HAProxy runs the `verify-token` message, the agent populates the following transaction-scoped variables (prefixed via `option var-prefix cookieguard`):
+### What HAProxy gets back

-- `txn.cookieguard.valid`: `"1"` when the hb_v2 cookie validates, otherwise `"0"`.
-- `txn.cookieguard.age_seconds`: age of the accepted cookie (stringified integer seconds).
-- `txn.cookieguard.session_hmac`: HMAC handle derived from the cookie value for downstream session tracking (empty when invalid or missing).
-- `txn.cookieguard.challenge_level`: textual label for the challenge that produced the cookie (currently `"altcha"` for hb_v2).
+When HAProxy runs the `verify-token` message, the agent populates transaction-scoped variables (prefixed by `option var-prefix cookieguard`):
+
+- `txn.cookieguard.valid`: `"1"` when the hb_v2 cookie validates, otherwise `"0"`.
+- `txn.cookieguard.age_seconds`: age of the accepted cookie.
+- `txn.cookieguard.session_hmac`: deterministic handle for downstream correlation.
+- `txn.cookieguard.challenge_level`: label for the challenge that produced the cookie (`"altcha"` today).
+- `txn.cookieguard.botd_*`: BotD verdict metadata (`botd_verdict`, `botd_kind`, `botd_confidence`, `botd_request_id`; `botd_tool` aliases `botd_kind` for legacy rules).

 ## SPOE inputs and outputs

@@ -205,6 +268,7 @@ With `option var-prefix cookieguard`, HAProxy sees the following variables under
 - `age_seconds` (stringified integer, always set): age of the accepted cookie. Remains "0" for invalid/missing cookies. You can rate-limit or log based on freshness.
 - `session_hmac` (hex string, optional): deterministic HMAC derived from the hb_v2 payload. Decision-SPOA uses this value as `cookieguard_session` to correlate sessions without exposing the token itself. Empty when validation fails.
 - `challenge_level` (string, optional): label describing how the cookie originated. Currently always `"altcha"` when verification succeeds; keep space for future challenge types.
+- `botd_verdict`/`botd_kind`/`botd_confidence`/`botd_request_id` (strings, optional): populated when a recent BotD report exists for the same client IP + UA hash. `botd_tool` remains as a backward-compatible alias of `botd_kind`. These let [decision-spoa](https://github.com/artefactual-labs/decision-spoa) or native HAProxy ACLs act on BotD detections without re-running the script.

 By design, `verify-token` always resets every output to a safe default before attempting validation so stale data never leaks between transactions.

@@ -251,7 +315,32 @@ By design, `verify-token` always resets every output to a safe default before at
 - The agent also serves the page at `/altcha` from `-altcha-page` (default `/etc/haproxy/altcha_challenge.html.lf`).
 - Packages enable `-cookie-secure` by default so `hb_v2` ships with the `Secure` attribute. Comment it in `/etc/default/cookie-guard-spoa` if you must disable it.

-
+5. **BotD verdict ingestion (optional)**
+
+When `-botd` is enabled (default), the metrics listener exposes `POST /botd-report`. The shipped challenge page loads FingerprintJS BotD in the browser, detects automation, and POSTs the verdict before ALTCHA begins. The payload includes `verdict`, `botKind` (only set when automation is detected), `confidence`, `requestId`, and `ua_hash`. The agent caches each verdict for `-botd-ttl` (default `5m`) keyed by client IP and UA hash, exposes it via SPOE transaction variables (`botd_verdict`, `botd_kind`, `botd_confidence`, `botd_request_id`; `botd_tool` remains as an alias), and emits Prometheus metrics.
+
+- `botd_confidence` mirrors Fingerprint’s 0–1 confidence score (the bundled OSS detector reports `0` for “no automation observed” and `1` for confirmed bots; the hosted SaaS may emit fractional probabilities).
+- `botd_request_id` surfaces Fingerprint’s request identifier when present, which is useful for correlating detections in their dashboards/logs. Browsers that run entirely locally usually leave it empty.
+
+- Route `/botd-report` to the same backend that serves `/altcha*` so the agent receives reports.
+- Serve the bundled JS from `/assets/botd/active/botd.esm.js`; packages install it under `/etc/haproxy/assets/botd`, and `make botd-assets && sudo make install-botd-assets` refreshes the version.
+- Enable or disable the endpoint with `-botd`; set cache capacity with `-botd-cache-max` (use `0` to disable storage).
+Prometheus metrics:
+
+- `cookie_guard_botd_reports_total{verdict="..."}` counts inbound reports.
+- `cookie_guard_botd_cache_entries` shows live cache cardinality.
+- `cookie_guard_botd_cache_evictions_total` increments when entries expire or capacity forces eviction.
+
+Downstream policy engines (e.g., [decision-spoa](https://github.com/artefactual-labs/decision-spoa)) can read the new SPOE variables to make the final allow/challenge/block decision without changing cookie-guard’s core logic. Cookie Guard focuses on proving “is this a real, interactive browser?” while Decision consumes the resulting `cookieguard.*` and `botd_*` variables (plus GeoIP/session context) to apply richer rules; together they form a layered defense that challenges unknown traffic, fingerprints automation, and then enforces nuanced policies.
+
+## Optional integration with decision-spoa
+
+[decision-spoa](https://github.com/artefactual-labs/decision-spoa) is Artefactual’s policy SPOE for HAProxy. Pairing it with Cookie Guard combines:
+
+- **Cookie Guard** – first-party ALTCHA + BotD challenge, hb_v2 issuance/verification, and BotD verdict caching.
+- **Decision** – GeoIP lookups, session-rate tracking, JA3/UA heuristics, and a rule engine that consumes `cookieguard.*` / `botd_*` variables to choose block/allow/challenge routes.
+
+Together they deliver a layered defense: Cookie Guard proves the visitor is an interactive browser and fingerprints automation; Decision ingests those signals plus its own telemetry to decide whether to serve the origin, throttle, or escalate.

 6. **Frontend**

@@ -333,6 +422,17 @@ Versioning policy and updates:
 systemctl reload haproxy
 ```

+### Update BotD
+
+- JS asset: pinned under `web/assets/botd/<version>` with `web/assets/botd/active` symlinked to the active version. To bump:
+```bash
+echo vX.Y.Z > web/assets/botd/VERSION
+make botd-assets
+sudo make install-botd-assets
+systemctl reload haproxy
+```
+- Browser challenge: `web/altcha_challenge.html.lf` imports `/assets/botd/active/botd.esm.js`. Ensure HAProxy routes that path (and `/botd-report`) to the Cookie Guard HTTP listener so the new version is served immediately.
+
 ---

 ## Security notes
