
Commit 1d7ba2a

Author: The No Hands Company
fix: Redis rate limiting, sync retry queue, migrate.ts, i18n HTTP backend, health checks

Redis infrastructure
- lib/redis.ts: singleton client, optional (warns in prod if not set), graceful close
- rateLimiter.ts: all 7 limiters now use shared Redis store via rate-limit-redis
  Falls back to in-memory with a production warning when REDIS_URL unset
- docker-compose.yml: Redis 7-alpine service, health check, redis_data volume
  App service depends_on redis, REDIS_URL=redis://redis:6379 passed
- index.ts: Redis connect at startup, closeRedis() in graceful shutdown
- .env.example: REDIS_URL documented with examples
- ioredis + rate-limit-redis added to api-server package.json

Federation sync retry queue (lib/syncRetryQueue.ts)
- Exponential backoff: 30s → 2m → 10m → 1h → 6h (capped), ±20% jitter
- Max 10 attempts per siteDomain:targetNode pair, then abandoned with warning
- Polls every 15 seconds, skips offline peers, cleans up deleted sites
- deploy.ts: failed/error syncs now enqueued instead of silently dropped
  X-Federation-From header added to outgoing syncs
- startSyncRetryQueue() / stopSyncRetryQueue() in index.ts lifecycle
- getSyncQueueStats() exposed in GET /api/health

Database migration runner (lib/db/src/migrate.ts)
- Reads SQL files from migrations/ in alphabetical order
- Tracks applied migrations in _migrations table
- Full transaction rollback on failure, never partial schema
- pnpm migrate script now points to migrate.ts via tsx/esm

i18n async loading
- Removed bundled en.json + id.json from JavaScript bundle
- i18next-http-backend fetches translations via HTTP on demand
- Translation files served from public/locales/{lng}/translation.json
- Added public/locales/en/translation.json + public/locales/id/translation.json
- i18next-http-backend added to frontend package.json

Health endpoint improvements (routes/health.ts)
- Redis status + latency added to services object
- domainCache stats (entry counts, TTL, max sizes)
- syncQueue stats (queued count, breakdown by target domain)

Documentation
- ROADMAP.md: 13 critical/high items updated from ⚠️/❌ to ✅
  Remaining gaps: ACME stub, audit log, content dedup, Prometheus
- HONEST_ASSESSMENT.md: resolved issues table added at bottom

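
The retry schedule described in the commit message (30s → 2m → 10m → 1h → 6h, capped, with ±20% jitter and at most 10 attempts) can be sketched roughly as follows. This is a minimal illustration, not the code in lib/syncRetryQueue.ts: the names `BACKOFF_STEPS_MS`, `MAX_ATTEMPTS`, `JITTER_RATIO` and `retryDelayMs` are hypothetical, and the mapping of a 1-based attempt number to a step is an assumption.

```typescript
// Hypothetical sketch of the backoff schedule; identifiers are illustrative only.
const BACKOFF_STEPS_MS = [30_000, 120_000, 600_000, 3_600_000, 21_600_000]; // 30s, 2m, 10m, 1h, 6h
const MAX_ATTEMPTS = 10;
const JITTER_RATIO = 0.2; // ±20%

/**
 * Returns the delay before the given 1-based attempt, or null once the
 * attempt budget is exhausted (the entry would then be abandoned).
 * `random` is injectable so the jitter can be made deterministic in tests.
 */
function retryDelayMs(attempt: number, random: () => number = Math.random): number | null {
  if (attempt < 1 || attempt > MAX_ATTEMPTS) return null;
  // Attempts beyond the last step stay at the 6h cap.
  const index = Math.min(attempt - 1, BACKOFF_STEPS_MS.length - 1);
  const base = BACKOFF_STEPS_MS[index];
  // Map random() in [0, 1) onto a multiplier in [0.8, 1.2).
  const jitter = (random() * 2 - 1) * JITTER_RATIO;
  return Math.round(base * (1 + jitter));
}
```

Jitter spreads retries from many queued siteDomain:targetNode pairs so that they do not all hit a recovering peer in the same poll cycle.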
1 parent c666310 commit 1d7ba2a

16 files changed, +1051 -48 lines changed

.env.example

Lines changed: 8 additions & 0 deletions
@@ -150,3 +150,11 @@ DOMAIN_CACHE_TTL_MS=300000
 DOMAIN_CACHE_MAX=10000
 # Max file cache entries (default: 50000)
 FILE_CACHE_MAX=50000
+
+# ── Redis (strongly recommended for production) ───────────────────────────────
+# Without Redis, rate limiting is per-process only (broken in multi-instance).
+# Redis also enables future session sharing and cache invalidation pub/sub.
+# Format: redis://[:password@]host[:port][/db-number]
+# Example: redis://redis:6379 (Docker Compose default)
+# Example: redis://user:pass@managed-redis.example.com:6380
+REDIS_URL=redis://redis:6379
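
To make the documented REDIS_URL format concrete, the sketch below splits such a URL into its parts with the standard WHATWG `URL` class. It is an illustration only: ioredis accepts the URL string directly, so the application does not need this step, and `RedisTarget` and `parseRedisUrl` are hypothetical names. The defaults of port 6379 and database 0 are the usual Redis defaults.

```typescript
// Hypothetical helper; not part of the commit. Shows how the URL format decomposes.
interface RedisTarget {
  host: string;
  port: number;
  username?: string;
  password?: string;
  db: number;
}

function parseRedisUrl(raw: string): RedisTarget {
  const url = new URL(raw);
  if (url.protocol !== "redis:" && url.protocol !== "rediss:") {
    throw new Error(`Unsupported scheme: ${url.protocol}`);
  }
  // The optional path segment selects the database number, e.g. "/2".
  const dbSegment = url.pathname.replace(/^\//, "");
  const db = dbSegment === "" ? 0 : Number(dbSegment);
  if (!Number.isInteger(db) || db < 0) {
    throw new Error(`Invalid database number: ${dbSegment}`);
  }
  return {
    host: url.hostname,
    port: url.port === "" ? 6379 : Number(url.port),
    username: url.username ? decodeURIComponent(url.username) : undefined,
    password: url.password ? decodeURIComponent(url.password) : undefined,
    db,
  };
}
```

For example, `parseRedisUrl("redis://redis:6379")` yields host `redis`, port `6379` and database `0`, which matches the Docker Compose default above.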

ROADMAP.md

Lines changed: 37 additions & 35 deletions
@@ -25,16 +25,16 @@ A living document tracking what is built, what is in progress, and what must be
 | Replit Auth (OIDC) || Browser flows work |
 | Ed25519 key pair generation + signing || Correct implementation |
 | `/.well-known/federation` discovery || |
-| Federation handshake + ping | ⚠️ | Replay attack window not enforced |
-| Node health monitor | ⚠️ | Marks offline after 1 failure — needs N=3 threshold |
-| Object storage (file upload/download) | ⚠️ | **Replit sidecar only — S3/MinIO non-functional** |
-| Site file serving (host-header routing) | ⚠️ | 2-3 DB queries per request, no caching |
+| Federation handshake + ping | | 5-minute timestamp window enforced |
+| Node health monitor | | N=3 consecutive failures required, exponential backoff |
+| Object storage (file upload/download) | | S3StorageProvider + ReplitStorageProvider, env-var selected |
+| Site file serving (host-header routing) | | LRU cache (10K domains, 50K files), invalidated on deploy |
 | Capacity tracking || |
-| Rate limiting | ⚠️ | **In-memory only — broken in multi-instance** |
+| Rate limiting | | Redis-backed when REDIS_URL set; warns in prod if missing |
 | Structured logging + error handling || Pino, AppError, stack traces redacted in prod |
 | Graceful shutdown || |
-| DB connection pool | ⚠️ | No max/timeout config |
-| Database migrations | | **Zero migration files — only `db push`** |
+| DB connection pool | | Explicit max/min/timeout config, error handler |
+| Database migrations | | 0000_initial_schema.sql + migrate.ts runner |

 ---

@@ -51,7 +51,7 @@ A living document tracking what is built, what is in progress, and what must be
 | Onboarding flow || |
 | Node Marketplace || |
 | API Reference page || |
-| Bahasa Indonesia i18n | ⚠️ | Translations complete but bundled synchronously |
+| Bahasa Indonesia i18n | | HTTP backend (i18next-http-backend), loaded on demand from /locales/ |
 | React lazy loading || All 14 routes code-split |

 ---
@@ -63,7 +63,7 @@ A living document tracking what is built, what is in progress, and what must be
 | API tokens (Bearer auth) || SHA-256 hashed |
 | Site team members (owner/editor/viewer) || |
 | Site visibility (public/private) || |
-| Password-protected sites | ⚠️ | **Cookie not server-verified — security gap** |
+| Password-protected sites | | HMAC-signed cookie, timingSafeEqual verified |
 | Custom domain CNAME+TXT verification || |
 | Custom domain routing in host router || Subject to caching gap above |

@@ -75,8 +75,8 @@ A living document tracking what is built, what is in progress, and what must be
 |---|---|---|
 | Site sync push (notify peers on deploy) || Ed25519 signed |
 | Federation manifest endpoint || Presigned URLs valid 1 hour |
-| Site sync pull (file replication) | ⚠️ | Works but no retry queue — failed syncs are lost |
-| Gossip-based peer discovery | ⚠️ | Works but no Redis sharing in multi-instance |
+| Site sync pull (file replication) | | Retry queue with exponential backoff (30s→2m→10m→1h→6h), max 10 attempts |
+| Gossip-based peer discovery | ⚠️ | Works; gossip peer list is in-memory per instance |
 | Same-domain conflict resolution || First-write-wins + pubkey tiebreaker |
 | Bootstrap node registry || |

@@ -86,11 +86,11 @@ A living document tracking what is built, what is in progress, and what must be

 | Feature | Status | Notes |
 |---|---|---|
-| Analytics buffer → hourly rollup | ⚠️ | **Bulk delete uses unsafe SQL — use inArray()** |
+| Analytics buffer → hourly rollup | | Uses inArray() — correct and safe |
 | Per-site analytics page || |
 | Network-wide analytics || |
-| Node operator admin dashboard | ⚠️ | **No RBAC — any authenticated user can access** |
-| Admin node settings | ⚠️ | No RBAC |
+| Node operator admin dashboard | | requireAdmin middleware, isAdmin DB flag + ADMIN_USER_IDS env var |
+| Admin node settings | | requireAdmin enforced |
 | Webhook notifications (Ed25519 signed) || |

 ---
@@ -104,7 +104,7 @@ A living document tracking what is built, what is in progress, and what must be
 | GitHub Actions deploy workflow || |
 | GitHub Actions CI (typecheck, lint, build) || |
 | GitHub Actions npm publish workflow || Needs `NPM_TOKEN` secret |
-| Docker Compose | ⚠️ | **App cannot talk to MinIO — storage abstraction broken** |
+| Docker Compose | | Redis + MinIO + S3StorageProvider wired, REDIS_URL passed to app |
 | Dockerfile (multi-stage) || |

 ---
@@ -113,34 +113,36 @@ A living document tracking what is built, what is in progress, and what must be

 | Feature | Status | Notes |
 |---|---|---|
-| ACME/Let's Encrypt automation || **Issues challenge token only — never gets cert** |
+| ACME/Let's Encrypt automation || Stub — issues HTTP-01 token only; full ACME flow not implemented |
 | TLS via Caddy (documented) || Caddy instruction accurate |
 | Geographic routing (closest-node redirect) || Region inference + 302 redirect |
 | Geo routing: latency probing || Mentioned in code comment, not implemented |

 ---

-## Must Fix Before Production (Priority Order)
+## Production Gaps Remaining

-| # | Issue | Severity | Est. Work |
+| # | Issue | Severity | Status |
 |---|---|---|---|
-| 1 | Rewrite objectStorage.ts with real S3 support | CRITICAL | 2–3 weeks |
-| 2 | Generate + commit Drizzle migrations | CRITICAL | 1 day |
-| 3 | Redis for rate limiting + session sharing | CRITICAL | 3–5 days |
-| 4 | Fix unlock cookie verification (HMAC-signed) | HIGH | 1 day |
-| 5 | Add admin RBAC (isAdmin flag) | HIGH | 2 days |
-| 6 | Host router LRU cache (domain → siteId) | HIGH | 2 days |
-| 7 | DB pool configuration (max, timeouts) | MEDIUM | 2 hours |
-| 8 | Session expiry cleanup job | MEDIUM | 2 hours |
-| 9 | Fix analytics bulk delete → `inArray()` | MEDIUM | 30 min |
-| 10 | Health monitor N=3 consecutive failure threshold | MEDIUM | 2 hours |
-| 11 | Replay attack: enforce 5-min timestamp window | MEDIUM | 2 hours |
-| 12 | Mark ACME as non-functional or implement it | MEDIUM | 2 weeks |
-| 13 | Admin audit logging | MEDIUM | 1 day |
-| 14 | Lazy-load i18n translations | LOW | 2 hours |
-| 15 | Federation sync retry queue | MEDIUM | 1 week |
-| 16 | Content deduplication for site files | LOW | 3 days |
-| 17 | Prometheus metrics endpoint | LOW | 1 day |
+| 1 | S3/MinIO object storage | CRITICAL | ✅ Fixed — S3StorageProvider, AWS SDK v3 |
+| 2 | Drizzle migrations | CRITICAL | ✅ Fixed — 0000_initial_schema.sql + migrate.ts |
+| 3 | Redis rate limiting | CRITICAL | ✅ Fixed — shared Redis store, falls back with warning |
+| 4 | Unlock cookie security | HIGH | ✅ Fixed — HMAC-signed, timingSafeEqual |
+| 5 | Admin RBAC | HIGH | ✅ Fixed — requireAdmin middleware, isAdmin flag |
+| 6 | Host router LRU cache | HIGH | ✅ Fixed — 10K domain + 50K file entries |
+| 7 | DB pool configuration | MEDIUM | ✅ Fixed — max/min/timeout/error handler |
+| 8 | Session expiry cleanup | MEDIUM | ✅ Fixed — 6-hour background job |
+| 9 | Analytics bulk delete | MEDIUM | ✅ Fixed — uses inArray() |
+| 10 | Health monitor threshold | MEDIUM | ✅ Fixed — N=3 consecutive failures |
+| 11 | Replay attack window | MEDIUM | ✅ Fixed — 5-minute timestamp check |
+| 12 | i18n async loading | LOW | ✅ Fixed — i18next-http-backend, HTTP-fetched |
+| 13 | Federation sync retry | MEDIUM | ✅ Fixed — exponential backoff queue, 10 max attempts |
+| 14 | ACME TLS automation | MEDIUM | ❌ Still a stub — use Caddy for now |
+| 15 | Admin audit logging | MEDIUM | 📋 Not yet built |
+| 16 | Content deduplication | LOW | 📋 Not yet built |
+| 17 | Prometheus metrics | LOW | 📋 Not yet built |
+| 18 | Gossip in-memory per-instance | LOW | ⚠️ Multi-instance gossip not Redis-shared |
+| 19 | Session store (multi-instance) | MEDIUM | ⚠️ PostgreSQL sessions work; not Redis-backed |

 ---

artifacts/api-server/package.json

Lines changed: 4 additions & 1 deletion
@@ -23,9 +23,11 @@
     "express-slow-down": "^3.1.0",
     "google-auth-library": "^10.6.2",
     "helmet": "^8.1.0",
+    "ioredis": "^5.6.1",
     "openid-client": "^6.8.2",
     "pino": "^10.3.1",
     "pino-http": "^11.0.0",
+    "rate-limit-redis": "^4.2.0",
     "uuid": "^13.0.0"
   },
   "devDependencies": {
@@ -36,6 +38,7 @@
     "@types/node": "catalog:",
     "esbuild": "^0.27.3",
     "pino-pretty": "^13.1.3",
-    "tsx": "catalog:"
+    "tsx": "catalog:",
+    "@types/ioredis": "^4.28.10"
   }
 }

artifacts/api-server/src/index.ts

Lines changed: 11 additions & 0 deletions
@@ -5,6 +5,8 @@ import { generateKeyPair } from "./lib/federation";
 import { startHealthMonitor } from "./lib/healthMonitor";
 import { startAnalyticsFlusher, stopAnalyticsFlusher } from "./lib/analyticsFlush";
 import { startGossipPusher, stopGossipPusher } from "./routes/gossip";
+import { getRedisClient, closeRedis } from "./lib/redis";
+import { startSyncRetryQueue, stopSyncRetryQueue } from "./lib/syncRetryQueue";
 import { db, sessionsTable } from "@workspace/db";
 import { lt } from "drizzle-orm";
 import { seedBundledSites } from "./lib/seedBundledSites";
@@ -60,6 +62,8 @@ function gracefulShutdown(server: http.Server, signal: string): void {
     try {
       stopAnalyticsFlusher();
       stopGossipPusher();
+      stopSyncRetryQueue();
+      await closeRedis();
       const { pool } = await import("@workspace/db");
       await pool.end();
       logger.info("Database pool closed");
@@ -92,6 +96,13 @@ ensureLocalNode()
     startHealthMonitor();
     startAnalyticsFlusher();
     startGossipPusher();
+    startSyncRetryQueue();
+
+    // Initialise Redis connection (optional — falls back to in-memory if not configured)
+    const redis = getRedisClient();
+    if (redis) {
+      await redis.connect().catch(() => {}); // errors handled via 'error' event
+    }

     // Session expiry cleanup — purge expired sessions every 6 hours
     // Prevents unbounded growth of the sessions table

artifacts/api-server/src/lib/redis.ts

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+/**
+ * Redis client singleton.
+ *
+ * Used by:
+ * - Rate limiter store (express-rate-limit + rate-limit-redis)
+ * - Future: session store for multi-instance deployments
+ * - Future: pub/sub for cache invalidation signals
+ *
+ * Connection is optional — if REDIS_URL is not set, the app runs in
+ * single-instance mode with in-memory rate limiting. A warning is logged.
+ *
+ * In production with multiple API server instances, REDIS_URL MUST be set
+ * or rate limiting will be bypassed (each instance has its own counter).
+ */
+
+import { Redis } from "ioredis";
+import logger from "./logger";
+
+let redisClient: Redis | null = null;
+
+export function getRedisClient(): Redis | null {
+  if (redisClient) return redisClient;
+
+  const url = process.env.REDIS_URL;
+  if (!url) {
+    return null;
+  }
+
+  try {
+    redisClient = new Redis(url, {
+      maxRetriesPerRequest: 3,
+      connectTimeout: 5_000,
+      lazyConnect: true,
+      enableReadyCheck: true,
+    });
+
+    redisClient.on("connect", () => {
+      logger.info("[redis] Connected");
+    });
+
+    redisClient.on("error", (err) => {
+      logger.warn({ err: err.message }, "[redis] Connection error");
+    });
+
+    redisClient.on("close", () => {
+      logger.warn("[redis] Connection closed");
+    });
+
+    return redisClient;
+  } catch (err) {
+    logger.error({ err }, "[redis] Failed to create client");
+    return null;
+  }
+}
+
+export async function closeRedis(): Promise<void> {
+  if (redisClient) {
+    await redisClient.quit();
+    redisClient = null;
+  }
+}
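
The header comment in redis.ts warns that without a shared store each instance keeps its own counter. The sketch below illustrates that failure mode with a toy in-memory fixed-window limiter; it is not the project's rateLimiter.ts, and `createLimiter` and `countAllowed` are hypothetical names. It assumes a limit of at least 1 and round-robin load balancing.

```typescript
// Hypothetical toy limiter; illustrates per-process counters, not the real rateLimiter.ts.
type Limiter = (key: string, now: number) => boolean;

function createLimiter(limit: number, windowMs: number): Limiter {
  // Each limiter owns a private map, just like each process owns its own memory.
  const hits = new Map<string, { count: number; resetAt: number }>();
  return (key, now) => {
    const entry = hits.get(key);
    if (!entry || now >= entry.resetAt) {
      hits.set(key, { count: 1, resetAt: now + windowMs });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}

// Sends `requests` calls for one client key, round-robin across the instances,
// all inside the same time window, and counts how many are let through.
function countAllowed(instances: Limiter[], requests: number, key: string): number {
  let allowed = 0;
  for (let i = 0; i < requests; i++) {
    if (instances[i % instances.length](key, 0)) allowed++;
  }
  return allowed;
}

const single = countAllowed([createLimiter(5, 60_000)], 20, "203.0.113.7");
const double = countAllowed([createLimiter(5, 60_000), createLimiter(5, 60_000)], 20, "203.0.113.7");
console.log({ single, double });
```

With one instance the client is held to the configured limit of 5; with two instances and separate counters the same client gets 10 requests through. A shared Redis store removes this multiplication because every instance increments the same key.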
