Commit cac5965

docs(deploy): linesf/ short-circuit in nginx — prevent the 404-flood→auto-ban outage
When linesf/ (the 4.4 GB fuzzy index) isn't deployed, the client still probes many linesf/{bucket}.json shards per fuzzy query, and every probe returns 404. A log-mining auto-blocklist read that burst as a 404-flood scanner and banned real visitors, which caused a full-site 403 outage (2026-06-13).

Add 'location ^~ /data/linesf/ { access_log off; return 404; }' to the deploy template (host-agnostic), and check in nginx.cohenjikan.conf (the actual prod vhost adaptation, sans ngx_brotli) so a future redeploy can't regress it.

Client behaviour is unchanged: load.ts still no-ops on 404, and commit 9254250 now also latches after the first one.
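
To make the failure mode concrete, here is a minimal sketch of how a threshold-based log-miner could mistake one real fuzzy query for a scanner. Everything in it is an assumption for illustration: the source does not describe the blocklist's actual rules, so `find404Flooders`, the threshold of 20, the 60-second window, the shard naming, and the example IP are all hypothetical.

```typescript
// Hypothetical log entry shape; the real access-log format is not shown in this commit.
interface LogEntry {
  ip: string;
  path: string;
  status: number;
  timeMs: number;
}

// Assumed detector: flag any IP with at least `threshold` 404s inside `windowMs`.
function find404Flooders(entries: LogEntry[], threshold: number, windowMs: number): Set<string> {
  const recent404s = new Map<string, number[]>();
  const flagged = new Set<string>();
  const sorted = [...entries].sort((a, b) => a.timeMs - b.timeMs);
  for (const e of sorted) {
    if (e.status !== 404) continue;
    // Keep only this IP's 404 timestamps that are still inside the window.
    const times = (recent404s.get(e.ip) ?? []).filter((t) => e.timeMs - t < windowMs);
    times.push(e.timeMs);
    recent404s.set(e.ip, times);
    if (times.length >= threshold) flagged.add(e.ip);
  }
  return flagged;
}

// One fuzzy query from a real visitor: a burst of shard probes, all answered with 404.
// The numeric shard names are placeholders, not the real bucket scheme.
function fuzzyQueryBurst(ip: string, shardCount: number, startMs: number): LogEntry[] {
  const burst: LogEntry[] = [];
  for (let i = 0; i < shardCount; i++) {
    burst.push({ ip, path: `/data/linesf/${i}.json`, status: 404, timeMs: startMs + i * 10 });
  }
  return burst;
}

// Models `access_log off` in the linesf/ location: those requests never reach the log.
function withoutLinesf(entries: LogEntry[]): LogEntry[] {
  return entries.filter((e) => !e.path.startsWith("/data/linesf/"));
}

const visitorLog = fuzzyQueryBurst("203.0.113.7", 30, 0);
const flaggedBefore = find404Flooders(visitorLog, 20, 60000);
const flaggedAfter = find404Flooders(withoutLinesf(visitorLog), 20, 60000);
```

Under these assumed numbers the visitor is flagged when the probes are logged and is not flagged once the linesf/ requests are kept out of the log, which is the effect the nginx block is meant to have.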
1 parent 9254250 commit cac5965

2 files changed

Lines changed: 70 additions & 0 deletions

deploy/nginx.cohenjikan.conf

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+# 诗云 / shiyun.cohenjikan.com — adapted for cohen #1 (no ngx_brotli module → gzip_static only).
+# Managed-by-hand companion to deploy/nginx.conf. Certbot will add the ssl_certificate lines.
+server {
+  server_name shiyun.cohenjikan.com;
+  root /var/www/shiyun/dist;
+  index index.html;
+
+  gzip_static on; # serve precompressed file.gz when client accepts (precompress.mjs wrote them)
+  gzip on; # else compress on the fly
+  gzip_vary on;
+  gzip_min_length 1024;
+  gzip_types application/json application/javascript text/css image/svg+xml text/plain;
+  charset utf-8;
+
+  add_header X-Content-Type-Options "nosniff";
+  server_tokens off;
+
+  # SPA: 诗云 is a hash-router (#a=… / #p=…), so every path just serves index.html.
+  location / {
+    try_files $uri $uri/ /index.html;
+  }
+  location = /index.html {
+    add_header Cache-Control "no-cache";
+  }
+  # content-hashed bundles → cache forever
+  location /assets/ {
+    add_header Cache-Control "public, max-age=31536000, immutable";
+  }
+
+  # ── Per-poet Range fetch (egress saver) ─────────────────────────────────────
+  # poems/{bucket}.json MUST be served RAW: byte offsets in poems/{bucket}.idx.json index the
+  # UNCOMPRESSED file, so `Range: bytes=off-end` must return exactly those bytes (status 206).
+  # Disable compression here (a compressed Range would slice the wrong bytes). Range on by default.
+  location /data/poems/ {
+    gzip_static off;
+    gzip off;
+    add_header Accept-Ranges bytes;
+    add_header Cache-Control "public, max-age=86400";
+  }
+  # ── linesf/ short-circuit (operational, NOT optional) ───────────────────────
+  # The 4.4GB fuzzy-search index (linesf/) is intentionally NOT deployed; the front-end fetches these
+  # shards and gracefully no-ops on 404 (load.ts loadFzShard). Return 404 cheaply WITHOUT touching disk
+  # AND keep these EXPECTED 404s out of the access log — otherwise every real searcher's burst of
+  # linesf 404s looks like a "404-flood scanner" to a log-mining auto-blocklist and gets the visitor
+  # banned (this caused a full-site 403 outage on 2026-06-13). Front-end behaviour is unchanged.
+  location ^~ /data/linesf/ {
+    access_log off;
+    return 404;
+  }
+  # lines/ + small jsons are fetched whole → compress them normally (big win on lines/).
+  location /data/ {
+    gzip_static on;
+    add_header Cache-Control "public, max-age=86400";
+  }
+
+  listen 80;
+  listen [::]:80;
+}
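
The /data/poems/ block in this vhost depends on the client asking for an exact byte slice of the uncompressed file. A minimal sketch of that client-side arithmetic follows. The index entry shape (`off`, `len`) and both helper names are assumptions, since poems/{bucket}.idx.json and the load.ts code are not part of this commit. The one fixed fact is that an HTTP byte range is inclusive at both ends.

```typescript
// Hypothetical index entry: byte offset and length within the UNCOMPRESSED bucket file.
interface IdxEntry {
  off: number;
  len: number;
}

// Build the Range header value. HTTP byte ranges are inclusive, so the last
// byte position is off + len - 1, not off + len.
function rangeHeader(entry: IdxEntry): string {
  const valid =
    Number.isInteger(entry.off) && Number.isInteger(entry.len) && entry.off >= 0 && entry.len > 0;
  if (!valid) {
    throw new RangeError(`invalid index entry: off=${entry.off}, len=${entry.len}`);
  }
  const lastByte = entry.off + entry.len - 1;
  return `bytes=${entry.off}-${lastByte}`;
}

// Accept a response only if it is a partial response (206) of exactly the expected size.
// A 200 would mean the server ignored the Range and sent the whole file.
function isExpectedSlice(status: number, bodyLength: number, entry: IdxEntry): boolean {
  return status === 206 && bodyLength === entry.len;
}

const poetEntry: IdxEntry = { off: 1024, len: 512 };
const headerValue = rangeHeader(poetEntry); // "bytes=1024-1535"
```

If the server compressed the file first, the same offsets would point into the compressed stream and the slice would not be the intended JSON, which is why the block turns off both `gzip` and `gzip_static`.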

deploy/nginx.conf

Lines changed: 12 additions & 0 deletions
@@ -59,6 +59,18 @@ server {
     add_header Accept-Ranges bytes;
     add_header Cache-Control "public, max-age=86400";
   }
+  # ── linesf/ short-circuit — keep EXPECTED 404s out of the access log (operational, not optional) ────
+  # linesf/ is the ~4.4 GB delete-1 fuzzy index. If you DON'T deploy it (common — it's a search fallback
+  # and load.ts no-ops on 404), the client still probes many linesf/{bucket}.json per fuzzy query, all
+  # 404. Two reasons to short-circuit them here: (1) a log-mining auto-blocklist will read that burst of
+  # 404s as a "404-flood scanner" and ban the REAL visitor — this caused a full-site 403 outage on
+  # 2026-06-13; (2) they're needless disk stat()s + (behind a CDN) origin pulls. `return 404` keeps the
+  # client's graceful-degrade path identical; `access_log off` keeps them out of any security log-miner.
+  # If you DO deploy linesf/, delete this block so the shards are served normally.
+  location ^~ /data/linesf/ {
+    access_log off;
+    return 404;
+  }
   # lines/ + the small jsons are fetched whole → compress them normally (big win on lines/, ~791 MB raw).
   location /data/ {
     add_header Cache-Control "public, max-age=86400";
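
The commit message notes that the client now latches after the first linesf/ 404. The following is only a sketch of what such a latch could look like. It is not the code from load.ts or commit 9254250, which this diff does not show. `ShardLatch`, `probeShards`, and the injected `getStatus` callback are hypothetical names, and the probing is synchronous here purely to keep the example short.

```typescript
// Hypothetical latch: once any shard probe returns 404, treat linesf/ as not
// deployed and stop probing for the rest of the session.
class ShardLatch {
  private missing = false;

  shouldProbe(): boolean {
    return !this.missing;
  }

  record(status: number): void {
    if (status === 404) this.missing = true;
  }
}

// Probe buckets in order and return how many requests were actually sent.
// `getStatus` stands in for the real network fetch.
function probeShards(
  buckets: string[],
  getStatus: (bucket: string) => number,
  latch: ShardLatch,
): number {
  let requests = 0;
  for (const bucket of buckets) {
    if (!latch.shouldProbe()) break;
    requests++;
    latch.record(getStatus(bucket));
  }
  return requests;
}

const shardLatch = new ShardLatch();
const alwaysMissing = (_bucket: string): number => 404;
const firstQueryRequests = probeShards(["a", "b", "c", "d"], alwaysMissing, shardLatch); // 1
const secondQueryRequests = probeShards(["e", "f"], alwaysMissing, shardLatch); // 0
```

With a latch like this, an undeployed index costs a single 404 per session instead of a burst per query. The nginx short-circuit still matters for clients running older bundles and for that first probe.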
