All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog 1.1.0, and this project adheres to Semantic Versioning.
1.2.0 - 2026-06-09
Dashboard performance overhaul plus capability-focused security hardening. Cold and warm dashboard loads drop from seconds to sub-second on large services; sustained concurrent load no longer wedges the backend. Read-path I/O is structurally cut by a per-service DuckDB connection pool, a per-minute time-series rollup bundle, size-capped bin-packing local compaction, composite endpoints that collapse multi-card admin pages into one request, and a frontend pre-warm / hover-prefetch pattern that makes navigation feel instant. Security hardening tightens cross-tenant boundaries, closes a ContextVar propagation hole in the s3fs proxy hook, removes a secret-in-URL leak on downloads, and adds strict validation across the destructive-op surface.
Structural:
- Per-minute time-series rollup bundle (backend/core/rollups.py) precomputes an hour-bundled per-minute aggregate for the dashboard chart, eliminating the wide Iceberg scan on chart render. Generated alongside the existing Top-N rollups.
- Per-day compaction tier for rollups: closed days are compacted into per-day parquet files; the reader prefers the per-day file and falls back to hourly only for the current day, cutting file-handle pressure on long-running services.
- Size-capped bin-packing local compaction (backend/core/local_compaction.py) replaces single-file daily/weekly rollups with sequential bin-packing capped at _MAX_PARTITION_BYTES (default 256 MB). Hourly partitions older than 7 days bin-pack into daily files; daily files older than 30 days bin-pack into weekly files. DuckDB query parallelism is preserved on multi-month services where the prior single-file approach degraded to scan-of-one-huge-file.
- DuckDB connection-pool tuning knobs: the DUCKDB_POOL_CONN_MEMORY_LIMIT and DUCKDB_POOL_CONN_THREADS env vars cap per-pool-connection memory and thread count so 8 concurrent queries don't oversubscribe physical cores or balloon RSS. Pool view-binding moved outside the Condition lock to eliminate a deadlock under stale-Iceberg-snapshot reload.
- Composite read endpoints collapse multi-card mounts into single requests:
  - POST /api/scoring/dashboard (8 per-card requests → 1)
  - GET /api/scoring/analytics and GET /api/scoring/config
  - GET /api/network-health now includes shielding analysis
  - POST /api/origin/aggregates (new) batches the origin page's per-card queries

  Per-card endpoints stay mounted for back-compat; the frontend opts into composite where it makes sense.
- Parquet ingest sort key changed to (timestamp, ip) so sessions queries can stream-merge on ip instead of materialising a temp table: ~2× speedup on sessions dashboards.
- ingested_files.file_date column + (source_name, file_date) index added via numbered SQLite migration. The log-accounting fast path uses the index to bucket by day without scanning every row; metadata_db.get_node_count_avg and get_log_accounting_counts split on it.
- Iceberg commit hygiene: buffer files are tombstoned and removed on the next pass instead of unlinked inline at commit time, removing a commit-path stall. optimize_table adds union_by_name + retry-on-CAS-conflict to silence the nightly schema-evolution warning.
- Bootstrap stale-while-revalidate: /api/bootstrap returns cached dir-stats immediately and refreshes in the background; views are folded into the response so the admin page doesn't issue a follow-up.
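The size-capped sequential bin-packing described for local compaction can be sketched as follows. This is a minimal illustration, not the code in backend/core/local_compaction.py: the Partition type, the bin_pack_sequential name, and the choice to give an oversized partition its own bin are assumptions; only the 256 MB default cap comes from the entry above.

```python
from dataclasses import dataclass

# Mirrors the documented 256 MB default of _MAX_PARTITION_BYTES.
MAX_PARTITION_BYTES = 256 * 1024 * 1024


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for one hourly or daily parquet file."""
    name: str
    size_bytes: int


def bin_pack_sequential(parts, cap=MAX_PARTITION_BYTES):
    """Group partitions, in input order, into bins whose total size stays <= cap.

    Assumption: a single partition larger than the cap is kept in a bin of
    its own rather than being split.
    """
    bins, current, current_size = [], [], 0
    for part in parts:
        # Close the current bin when adding this partition would exceed the cap.
        if current and current_size + part.size_bytes > cap:
            bins.append(current)
            current, current_size = [], 0
        current.append(part)
        current_size += part.size_bytes
    if current:
        bins.append(current)
    return bins
```

Because input order is preserved, time-adjacent partitions land in the same output file, and no output file grows past the cap unless a single input already does.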
Tuning:
- Dashboard live-hour TEMP TABLE shared across CTEs; Python-side bot match + memoised ngwaf_top cut DuckDB round-trips.
- Insights coalesce four city/region/country queries into one and four URL-keyed insights into one CTE (Option C pattern).
- Sessions split the monolithic CTE into measurable stages and eliminate the temp-table materialisation on the hot path.
- Origin summary combines two sequential scans into one via GROUPING SETS.
- Cron-runs since_id delta-poll param + frontend wiring on /logs recentCrons so the page only fetches new events.
- Admin usage-log visibility-gates its 30s tick and rewrites the latest-per-task SQL to skip the full join.
- Admin shielding banner endpoint trimmed; share-status staleTime tightened.
- Bot-source cache: 60s TTL on the recursive cache-dir scandir (was 200–1500 ms per /api/bootstrap).
- React-Query: skip 4xx retries; hooks lifted out of insights / ReportLayout render-props so each page mount re-uses one query instance instead of re-mounting on every parent render.
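The 60s TTL on the bot-source directory scan follows a common pattern: keep the last result and re-run the expensive loader only after the TTL has elapsed. The sketch below shows one way to do that; the TtlCached class and its injectable clock are hypothetical names for illustration, not the project's implementation, and the sketch is not thread-safe.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TtlCached(Generic[T]):
    """Caches one loader result for ttl_s seconds (hypothetical helper)."""

    def __init__(self, loader: Callable[[], T], ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock            # injectable so tests can control time
        self._value: Optional[T] = None
        self._expires_at = float("-inf")  # forces a load on first access

    def get(self) -> T:
        now = self._clock()
        if now >= self._expires_at:
            # Expired or never loaded: run the expensive loader once.
            self._value = self._loader()
            self._expires_at = now + self._ttl_s
        return self._value
```

A caller would wrap the recursive scan in the loader, so repeated bootstrap requests inside the TTL window reuse the cached listing.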
Frontend:
- starlette-compress replaces GZipMiddleware: backend now negotiates br / zstd / gzip (was gzip-only). Modern browsers get brotli; rendered-text payloads drop ~25 % on the wire.
- Keep-alive on Next.js http/undici global agents so the proxy reuses TCP connections to the FastAPI backend instead of new-handshake-per-request.
- Pre-warm + lazy-mount pattern: plotly + maplibre-gl + world.geojson are pre-warmed on AppLayout mount via hidden one-point charts; the visible chart hydrates from the warm module cache instead of triggering a fresh import on first render. LazyMount + PlotlyChart start visible=false to avoid the hydration-mismatch warning that came with the prior eager-mount pattern.
- Hover-prefetch sidebar links so the destination's data warms before the click commits.
- Per-insight skeleton cards on first paint; full skeleton rendered from CARD_CATEGORIES on the dashboard.
- Modulepreload for the plotly chunk via a build-time-generated preload manifest (scripts/build-preload-manifest.mjs + lib/preload-manifest.ts); restores plotly's preload without re-introducing the nav-lag the first attempt caused.
- Drop force-dynamic on routes that don't need it; root layout opts out of build-time SSG so the preload manifest is read at request time.
- /geo/* static assets cached aggressively; PlotlyChart dynamic-import on /network.
- SystemHealthCard polling moved to 1 s for live attack/load feedback now that the endpoint is cheap.
- useNowMs reuse: multiple visible-tick components (countdowns, "X seconds ago") share one interval.
- Map style-data listener replaces a 100 ms setTimeout poll.
- Multi-worker login loop fixed: tunnel.py now rehydrates a share session on-demand from SQLite when an in-memory cache miss happens on a different uvicorn worker. Previously, login on worker A would loop because worker B couldn't see the freshly-minted session.
- DuckDB lock conflict resolved between the connection pool and cron writes: get_connection forces read_only=False so pool readers and cron writers no longer trip DuckDB's "different configuration" error on the same file.
- Stale-view self-heal: QueryRunner clears _view_cache before the force=True rebuild on the post-empty recovery path so the next query doesn't see the stale schema.
- Iceberg s3fs proxy hook falls back to the process-global source so the hook always registers, even when the ContextVar is empty (e.g. cold-start LIST before any _get_catalog has fired).
- Top-N current-hour merge: a silent ImportError was dropping the current-hour merge; restored with an explicit fail-loud import.
- Rollup compaction: run_id threaded through the error branch and the compaction step now uses an in-memory DuckDB so a corrupted on-disk catalog can't wedge the cron.
- Dashboard response cache: write to is_cached (not the aliased _is_cached) so Pydantic doesn't drop the flag on serialise.
- Dashboard cache hit rate: disabled the 30 s response-level cache that was masking the rollup wins for fast-changing queries.
- Usage-log rollup drift: reconcile cycle changed from DELETE+INSERT to UPSERT so concurrent flushes can't lose rows.
- Botnet insight investigate link filters only the queried column, not all of them.
- expire_snapshots updated for pyiceberg 0.11.1 API and now emits cron_runs telemetry.
- Proxy compatibility: switched from middleware.ts to proxy.ts for Next.js 16; restored the Caddy-marker middleware that the upgrade broke.
- Telemetry response middleware backstop (backend/utils/telemetry_response_middleware.py) auto-injects _debug_queries / _debug_calls / _is_cached into JSON-dict responses that bypassed BaseResponse.with_telemetry, so newly-added endpoints don't silently blank the Debug Panel.
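The usage-log fix above swaps a DELETE+INSERT reconcile for an UPSERT. A minimal sketch of that pattern with the standard-library sqlite3 module is shown here; the usage_rollup table, its columns, and the reconcile function are hypothetical and only illustrate why a single statement leaves no window in which the row is missing.

```python
import sqlite3

# In-memory database with a hypothetical rollup table keyed by (task, day).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_rollup ("
    " task TEXT NOT NULL, day TEXT NOT NULL, calls INTEGER NOT NULL,"
    " PRIMARY KEY (task, day))"
)


def reconcile(conn, task, day, calls):
    """Insert the row, or update it in place if the key already exists.

    One statement replaces the DELETE followed by INSERT, so there is no
    intermediate state where the row has been deleted but not yet re-inserted.
    """
    conn.execute(
        "INSERT INTO usage_rollup (task, day, calls) VALUES (?, ?, ?) "
        "ON CONFLICT (task, day) DO UPDATE SET calls = excluded.calls",
        (task, day, calls),
    )
    conn.commit()
```

The ON CONFLICT clause requires SQLite 3.24 or newer, which ships with current Python builds.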
Capability-focused hardening across the backend and frontend trust boundaries.
- Cross-tenant ContextVar leak in the s3fs proxy hook closed. PyIceberg writes parquet via a ThreadPoolExecutor; ContextVars don't propagate to executor workers by default, so the prior fix used an endpoint-keyed global registry that was vulnerable to overwrite when two tenants shared an endpoint URL. Replaced with a global ThreadPoolExecutor.submit monkeypatch that runs the callable inside a contextvars.copy_context() snapshot, matching asyncio.to_thread semantics (loop.run_in_executor does not copy the context on its own). Documented in MONKEYPATCHES.md §6.
- Path-param service-scope desync: analyst sessions could supply a service_id path param that didn't match their session scope on a handful of mutation endpoints. Centralised the check via a router-utils helper invoked on every scoped route.
- Secret-in-URL leak on downloads: the download endpoint previously embedded the shared CDN secret in the redirect URL where it could land in browser history / referrer headers. Switched to a signed short-lived bearer that's stripped before the redirect.
- Strict input validation on the destructive-op surface (provision teardown, NGWAF workspace mutations, scoring threshold + enforce-status-code + recv-exclusion-regex changes) runs through length caps, character allowlists, and (where applicable) falco static analysis before any VCL ships.
- CSRF gates: moved GET→POST on logging-settings/update and sibling state-changing endpoints that were addressable via GET.
- Authorisation tightening: share-admin endpoints reject the Caddy-marker header from non-Caddy paths; claim_token path consolidated under a single atomic UPDATE so concurrent claims can't both succeed.
- Cross-tenant cache audit: re-verified that every per-tenant cache key includes service_id; closed two missing entries on insights and origin paths.
- Thread leak fix: the share-login flow was leaking a daemon thread per failed login on multi-worker setups; the new on-demand SQLite rehydration replaces the thread entirely.
- Terms-of-service bypass: share-login /acknowledge now fetches the active TOS version and refuses acknowledgement of a stale one; frontend was sending a hardcoded version.
- Telemetry-proxy diagnostics for silent 400s (Missing X-Fos-Target) and unclassified list_objects_v2 calls; preserve Content-Type so downstream compression always fires; preserve multi-valued response headers.
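The ContextVar propagation fix in this list relies on snapshotting the caller's context and running the submitted callable inside it. The sketch below shows the mechanism with an explicit wrapper function instead of the global ThreadPoolExecutor.submit monkeypatch the entry describes; current_tenant, read_tenant, and submit_with_context are hypothetical names used only for illustration.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-request variable standing in for the tenant/source context.
current_tenant = contextvars.ContextVar("current_tenant", default=None)


def read_tenant():
    """Runs on the worker thread and reports what the ContextVar holds there."""
    return current_tenant.get()


def submit_with_context(executor, fn, *args, **kwargs):
    """Submit fn so it runs inside a copy of the caller's current context."""
    ctx = contextvars.copy_context()          # snapshot taken on the caller thread
    return executor.submit(ctx.run, fn, *args, **kwargs)


current_tenant.set("tenant-a")
with ThreadPoolExecutor(max_workers=1) as pool:
    # A plain submit typically sees the default on a standard CPython build,
    # because the worker thread does not run in the submitting thread's context.
    plain = pool.submit(read_tenant).result()
    # The wrapped submit sees the caller's value via the copied context.
    wrapped = submit_with_context(pool, read_tenant).result()
```

Since each submit takes its own snapshot, two tenants sharing one executor no longer read each other's values, which is the property the endpoint-keyed registry could not guarantee.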
- 3500+ backend tests (+450).
- 290+ frontend vitest tests (+25).
- New coverage: tests/core/test_duckdb_pool.py, test_local_compaction.py, test_rollups_compaction.py, test_rollups_hour_bundling.py, test_iceberg_helpers.py, tests/services/test_service_manager.py, tests/utils/test_sql_validator.py, test_telemetry_response_middleware.py, test_router_utils.py, test_state_sync.py, test_terraform_gen.py, plus router coverage for the new composite endpoints and the destructive-op-auth surface.
- make ci green: lint + format + mypy + pytest + vcl-test + verify-deps + typecheck-frontend + test-frontend + osv + secret-scan.
- Synthetic load generator (scripts/loadtest_generator.py) and read-path probe (scripts/dev/loadtest_probe.sh) for reproducible perf measurement against local Parquet+Iceberg.
- Two-pass next build in the frontend Dockerfile so SSG sees the correct plotly chunk hashes; preload-manifest scanner runs after next build to capture them.
- AGENTS.md: added Key Systems entries for the DuckDB connection pool, the hourly Top-N rollup pipeline, and the response telemetry middleware. Updated the local-compaction section to reflect the bin-packing tiers.
- MONKEYPATCHES.md: documents the new ThreadPoolExecutor.submit patch.
1.1.0 - 2026-06-03
Edge session scoring. Every request is classified in real-time at the edge by a Fastly Compute service that runs an L1 (cookie compliance + timing rules) + L2 (PageRank-trained transition matrix) scorer, returning a combined 0-100 score that lands in DuckDB for analyst review. Operators can label sessions, watch live ROC-AUC, retrain the matrix, roll back to a prior matrix, rotate the AES cookie key, and push a hard enforcement threshold that rejects flagged requests at the edge with an operator-chosen HTTP status code (default 429).
- Edge scoring: Fastly Compute scorer + 6-snippet VCL preflight pattern (recv/pass/fetch/deliver/miss/enforce), AES-GCM-encrypted session cookie carrying rotating sid + transition state, fastly.ddos_detected gate so Compute is bypassed under L7 attack.
- Admin UI at /admin/session-scoring: StatusPanel with live AUC against accumulated labels, ScoringHealthCard with fire rate / score distribution / top reasons / matrix-staleness alert, ThresholdSlider with counterfactual flag/pass preview + precision/recall + commit-threshold persistence, RocPrCurves with ROC + Precision-Recall plots, TopFlaggedTable + LabelsTab with click-to-view-events per sid, RetrainButton (DuckDB traces → train.py → publish matrix to FOS), SinceHoursPicker driving all six cards on one shared time window.
- Labels CRUD: POST/PATCH/DELETE per-sid labels (good/bad/neutral) feed evaluate_from_persisted_scores to compute live ROC-AUC. Min-samples gate (≥3 per class) prevents noisy display.
- ROC + PR curves + per-reason AUC breakdown (split by L1/L2 rule: cookie-missing, impossibly-fast, robotic-consistency, rare-transition, low-transition-prob).
- Composite /scoring/dashboard endpoint collapses the 8 per-card requests into one in-flight-collapsed payload; the existing per-card endpoints stay mounted for back-compat.
- edge_score_reason virtual field: CSV-split via DuckDB unnest(string_split(...)), top-N cards + click-to-filter same as NGWAF signals.
- FOS matrix persistence: enable_scoring publishes the trained matrix to FOS; backend auto-fetches on startup (no more per-host scp).
- Matrix version history + rollback: every publish snapshots the prior matrix to iceberg/meta/scoring_matrix_history/{version}.json; new /scoring/matrix-versions lists them and /scoring/matrix-versions/{v}/restore?confirm=true copies a historical matrix back. AUC reflects the rollback immediately; Wasm at edge keeps the embedded matrix until deploy_wasm.sh re-runs (deploy_hint surfaced).
- Threshold enforcement (live blocking): the operator commits a threshold, the scorer reads it from the scoring_config ConfigStore and emits X-Edge-Score-Enforce: 1 when score ≥ threshold, and the new Session Scoring - Enforce VCL snippet rejects those requests on the post-scoring restart. Effective at the edge within seconds. Confirm-dialog-gated PUT endpoint + LIVE warning chip in the slider UI. The response code defaults to 429 (Too Many Requests) and is operator-overridable per-service via a new Enforce response code selector (403 / 429 / 451 / 503; backend accepts any 4xx/5xx). Picks land via a focused update_enforce_status_code orchestrator that swaps only the enforce snippet (~5–10s end-to-end vs. the full enable_scoring flow). Audit-logged as scoring_enforce_status_code_changed.
- URL exclusion regex override: operator-tunable per-service regex for "which URLs bypass the scorer". Defaults to the built-in static-asset extension list; the new ExcludeRegexCard on the Session Scoring page accepts a custom regex (e.g. exclude /healthz, exclude entire path prefixes, scope scoring to specific traffic). The PUT endpoint validates input through three layers before any VCL ships: (1) input policy: length cap, no quote / control chars, must compile under Python's re; (2) falco static analysis on the assembled recv snippet (catches regex+VCL composition errors that slip past Python's compiler); (3) Fastly's own VCL compiler at activate time. A focused update_recv_exclusion_regex orchestrator clones the active version, swaps only the recv snippet, and activates: ~5–15s end-to-end vs. the full enable_scoring flow. Confirm-dialog-gated. Audit-logged as scoring_exclude_regex_changed. Falco shipped in the backend Docker image; production sets SCORING_REQUIRE_FALCO=1 so a missing binary fails closed instead of degrading to input-policy-only.
- AES key rotation: POST /scoring/rotate-key mints a fresh 32-byte key, moves the prior to previous_key_hex (grace slot: the Rust cookie codec falls back to it so in-flight cookies keep decoding through one rotation cycle).
- Cookie lifecycle bounds: SESSION_IDLE_EXPIRE_S (30 min) + SESSION_HARD_CAP_S (24h) in the Rust scorer mint a fresh sid when either threshold is exceeded. Stolen cookies can't replay beyond their window; long-running sessions stop biasing the L1 variance estimator.
- Per-reason AUC breakdown UI: PerReasonAucCard renders AUC split by which L1/L2 rule fired (cookie-missing, impossibly-fast, robotic-consistency, rare-transition, low-transition-prob).
- Operator audit log: new scoring_audit table + /scoring/audit endpoint records every scoring_enabled, scoring_disabled, threshold_committed/cleared/enforced, matrix_retrained/restored, key_rotated event with actor + timestamp + details. Per-host, never mirrored via state_sync.
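The first validation layer for the URL exclusion regex (length cap, no quote or control characters, must compile under Python's re) can be sketched as below. The check_exclusion_regex name and the MAX_REGEX_LEN value are assumptions made for illustration; the changelog does not state the real cap, and the later falco and Fastly compiler layers are not modelled here.

```python
import re

# Assumed cap for illustration only; the real limit is not stated in the changelog.
MAX_REGEX_LEN = 512


def check_exclusion_regex(pattern: str) -> list[str]:
    """Return a list of policy violations; an empty list means the input passes."""
    problems = []
    if len(pattern) > MAX_REGEX_LEN:
        problems.append("pattern too long")
    # Quotes could terminate the string literal the regex is embedded in.
    if any(ch in "\"'" for ch in pattern):
        problems.append("quote characters are not allowed")
    # ASCII control characters (including DEL) are rejected outright.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in pattern):
        problems.append("control characters are not allowed")
    try:
        re.compile(pattern)
    except re.error as exc:
        problems.append(f"does not compile: {exc}")
    return problems
```

Collecting every violation, rather than stopping at the first, lets a UI show the operator the full list in one round trip.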
- Cron-progress reliability: end_progress auto-emits done when the last event isn't terminal; list_active_runs triple-guards (last-event filter + 5-min staleness + DB-status cross-check via get_cron_run_status); reap_zombie_runs called from every cron-tick cleanup. Fixed a production incident where 382 stale "sync" entries piled up on the System Health card.
- state_sync merge guards: import_admin_state no longer overwrites scoring custom_fields with stale FOS payloads (root cause of a production data-loss incident); sibling fixes in cli.handle_update_logs, provision.write_service_config, and api_service_log_fields_set close every "remote-overwrites-code-managed-state" path.
- Defense-in-depth: enable_scoring rollback + disable_scoring final-save reload cfg right before writing to close the 30-120s race window where concurrent writers got clobbered.
- Per-key in-flight collapse in _cached so the dashboard's 8-card mount no longer queues queries behind one global lock.
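Per-key in-flight collapse, as mentioned in the last fix above, means concurrent callers asking for the same key share one computation while callers for different keys do not block each other. The sketch below is one way to express that idea; InFlightCollapser is a hypothetical class, not the _cached implementation, and it does not cache results once the computation has finished.

```python
import threading
from concurrent.futures import Future


class InFlightCollapser:
    """Shares one running computation among concurrent callers of the same key."""

    def __init__(self):
        self._lock = threading.Lock()   # guards only the dict, never the work itself
        self._in_flight = {}            # key -> Future of the call currently running

    def run(self, key, compute):
        with self._lock:
            fut = self._in_flight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._in_flight[key] = fut
        if not leader:
            # Follower: block until the leader finishes, then reuse its outcome.
            return fut.result()
        try:
            result = compute()          # runs outside the lock
        except BaseException as exc:
            fut.set_exception(exc)      # followers see the same failure
            raise
        else:
            fut.set_result(result)
            return result
        finally:
            with self._lock:
                self._in_flight.pop(key, None)
```

The lock is held only while the dictionary is read or written, so a slow query for one card cannot delay a query for a different key.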
Structural:
- DuckDB connection pool (backend/core/duckdb_pool.py) replaces per-request connection setup; eliminates the per-request DuckDB initialisation cost on hot paths.
- Hourly Top-N rollup pipeline (backend/core/rollups.py + scripts/backfill_rollups.py) precomputes the dashboard's most-asked aggregates; cold-load dashboard scans drop from seconds to tens of ms.
- Bounded cache primitive (backend/utils/bounded_cache.py, 13-test tests/utils/test_bounded_cache.py) replaces several previously-unbounded dict caches across the request path (also referenced under Security → _StaticAssetLimiter and the analytics cache in session_scoring._cached).
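A bounded cache replaces an unbounded dict by evicting entries once a size limit is reached. The changelog does not say which eviction policy backend/utils/bounded_cache.py uses; the sketch below assumes least-recently-used eviction, and the BoundedCache class shown here is a hypothetical illustration, not the project's primitive.

```python
from collections import OrderedDict
from threading import Lock


class BoundedCache:
    """Dict-like cache that never holds more than max_entries items (LRU assumed)."""

    def __init__(self, max_entries: int) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self._max_entries = max_entries
        self._data = OrderedDict()
        self._lock = Lock()

    def get(self, key, default=None):
        with self._lock:
            if key not in self._data:
                return default
            self._data.move_to_end(key)   # mark as most recently used
            return self._data[key]

    def put(self, key, value) -> None:
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            # Evict from the least recently used end until back under the cap.
            while len(self._data) > self._max_entries:
                self._data.popitem(last=False)

    def __len__(self) -> int:
        with self._lock:
            return len(self._data)
```

With a fixed cap, memory use stays proportional to max_entries regardless of how many distinct keys (for example client IPs) the request path sees.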
Tuning:
- security/top-bots consolidated UA + NGWAF onto one temp table (was 2 independent Iceberg scans per dashboard mount).
- dashboard/raw uses get_source_extent for cached steady-state extent.
- usage/prefill cached-status fast path skips DuckDB hop when the sync cron has populated it.
- get_enriched_services 60s TTL cache on the recursive cache-dir scandir (was 200-1500ms per /api/bootstrap).
- loading.tsx Suspense skeletons + dynamic imports (LabelsTab, ChoroplethMap) cut admin-page click lag.
- Dropped dead @daypicker/react dep + dead frontend/components/ui/calendar.tsx.
- Collapsed 7-site cleanup_progress + reap boilerplate into cleanup_progress_and_reap() helper.
- Refactored security.py's ad-hoc temp-table to use the existing QueryRunner.temp_table() context manager.
- Narrowed get_cron_run_status exception scope to sqlite3.Error with DEBUG log so future triage isn't flying blind.
Capability-focused hardening across the FastAPI backend, Fastly VCL, Next.js frontend, and Rust scorer. All changes deployed and verified.
- Trust-boundary normalisation:
  - uvicorn runs with --proxy-headers --forwarded-allow-ips=127.0.0.1 so request.client.host is the real client IP via Caddy's authoritative XFF rewrite.
  - is_request_remote() reads request.client.host instead of the forgeable Host header; in-app leftmost-XFF parsing is gone.
  - Caddyfile gates Fastly-Client-IP → X-Forwarded-For rewrite on remote_ip matching Fastly edge ranges. Startup assertion on TRUSTED_PROXY_IPS / UVICORN_FORWARDED_ALLOW_IPS + integration test prevent silent regression.
  - Next.js /admin middleware gates on the Caddy-injected X-Proxied-By-Caddy: true marker instead of the forgeable Host header.
- Destructive-op auth:
  - /api/provision/teardown validates a caller-supplied Fastly token via /tokens/self for the global scope before any destructive op; never falls back to server-stored credentials. Frontend TeardownDialog prompts admin for the token.
  - /api/provision/ngwaf-workspaces token-gated (constant-time stored-key match OR validated global-scope token); NGWAF workspace mutation enforces analyst-session scope.
- DuckDB user-SQL safety:
  - New backend/utils/sql_validator.py enforces a statement-type whitelist + recursive parse-tree walker with catalog blocklist (duckdb_* / pg_* prefixes, information_schema / pg_catalog / system schemas, non-main catalogs) + function denylist (read_csv / read_parquet / iceberg_scan / glob / lsdir / getenv / current_setting / duckdb_secrets / postgres / sqlite / mysql scanners) + fail-closed parse + audit logging + perf budget. Replaces a regex-based blocklist that missed read_csv_auto, information_schema, duckdb_secrets, INSTALL / LOAD, and getenv.
  - escape_sql_literal helper applied at four ingest call sites; characterisation tests cover the PoC payload + multi-byte UTF-8 + backslash + empty + long-with-many-quotes.
  - time_range validated via dateutil.isoparse before SQL interpolation.
  - get_con / get_meta_con dropped the auto-query-param read_only flag.
- VCL header & cache discipline:
  - vcl_recv preamble unsets every internal x-of-* / x-fos-edge-data / x-is-cluster-fetch / X-Edge-* header on the inbound request.
  - Origin-metric VCL fields: numeric regex gates + json.escape on string values (log-injection).
  - VCL ua/referer keeps its substr cap.
  - Fastly vcl_hash now keys on the full req.url (path + query), not just req.url.path, which closes cross-query cache poisoning. Auth key querystring is already stripped earlier so no secrets leak into cache keys.
- Cross-tenant scope enforcement:
  - /api/alerts/* and /api/views/* enforce analyst-session scope on every read and mutation; pre-flight scope check on PATCH / DELETE via new get_alert_by_id / get_view_by_id helpers so unauthorised mutations never land.
  - /api/sources, /api/log-fields/catalog, NGWAF workspace listing: analyst-scope filtering.
  - Cache-layer audit confirmed every per-tenant cache (session_scoring._cached, iceberg, bot_sources) includes service_id in the key.
- Path-traversal cages:
  - /api/download path traversal: realpath + commonpath cage.
  - Cache cleanup rejects bucket separators + realpath cage.
  - service_id alphanumeric/dash/underscore validation in path helpers.
- Secret & data hygiene:
  - claim_token TOCTOU → atomic UPDATE with rowcount check.
  - share_db quarantine narrowed to actual SQLite corruption signatures (was wiping the DB on transient OperationalError).
  - Email-enumeration timing equalised via dummy scrypt on miss.
  - validate_session re-syncs pii_policy / window / service_ids on every call so admin permission edits take effect immediately.
  - _StaticAssetLimiter bounded at 10 k tracked IPs.
  - logging-settings/update moved GET → POST/PATCH (CSRF).
  - query_errors decorator logs traceback server-side, never in the response body; sweep fixture asserts no trace key leaks from any route.
- SSH host-key pinning: configs/ssh_known_hosts pinned, source-controlled, and gitignore-excepted; tunnel manager refuses to start when the file is missing (fail-safe; no TOFU fallback).
- Scorer signal tightening (Python + Rust parity): L1_SCORE_COOKIE_TAMPERED = 100 (was capped at 75 with missing/expired); L1_ROBOTIC_DWELL_LOW_S 0.5 → 0.20 (closes the 0.20s–0.50s robotic-bot threshold gap). Tracked follow-up: sliding-window mean (needs cookie-schema v3); partial mitigations via SESSION_IDLE_EXPIRE_S=30 min + SESSION_HARD_CAP_S=24h + session-max scoring bound the practical attack window.
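The realpath + commonpath cage listed under the path-traversal items can be sketched as follows. This is an illustration of the technique rather than the /api/download handler itself; resolve_inside and its parameters are hypothetical names.

```python
import os


def resolve_inside(base_dir: str, user_path: str) -> str:
    """Resolve user_path under base_dir, rejecting anything that escapes it.

    realpath collapses ".." segments and follows symlinks; commonpath then
    checks that the resolved path is still rooted at the resolved base.
    """
    base = os.path.realpath(base_dir)
    # An absolute user_path makes join() discard base, which the check below catches.
    candidate = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the allowed directory")
    return candidate
```

Comparing with commonpath instead of a string prefix avoids accepting a sibling such as /data/files-evil when the base is /data/files.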
- 3070 backend tests
- 65 scorer Rust tests (+8)
- 265 frontend vitest tests (+13)
- make ci green: lint + format + mypy + pytest + vcl-test + verify-deps + typecheck-frontend + test-frontend + osv.
- Backend Docker image: python:3.12-slim-bullseye → python:3.12-slim-bookworm (cuts CVE-laden Debian 11 base; remaining 13 high CVEs are deep-dependency / OpenSSL CVEs every major Python base inherits). Frontend image's api-schema stage bumped to match.
- Backend image now ships falco v2.3.0 (Fastly VCL static analyser), required by the scoring-recv-snippet validator.
- Secret scanning: gitleaks v8.30.1 wired in three places: .pre-commit-config.yaml (blocks accidentally-staged credentials at commit time), make secret-scan Makefile target chained into make ci, and a dedicated step in .github/workflows/ci.yml (fails the build on any non-allowlisted finding). Configuration in .gitleaks.toml extends the built-in ruleset and adds path allowlists for tracked test fixtures, Rust lockfile checksums, the public SSH host key, and (for working-tree-only scans) the gitignored real-config / .next/ / data/system/ directories. Verified clean against the full branch history. Policy + suppression playbook documented in AGENTS.md §Secrets.
- CDN cache-key hardening: backend/core/fastly/utils.py vcl_recv now runs querystring.filter_except to drop all non-S3-API query parameters (caller-injected tracking params, marketing UTMs, session IDs) BEFORE the cache lookup, followed by querystring.sort to canonicalise the remaining param order. Composes with the vcl_hash fix: untrusted params can no longer fracture the cache OR leak the auth key into the cache key.
- Dependency freshness sweep on all four ecosystems:
  - Python: aiohttp 3.13.5 → 3.14.0, cfn-lint 1.51.2 → 1.51.4, distlib 0.4.0 → 0.4.1, filelock 3.29.0 → 3.29.1, idna 3.17 → 3.18, joserfc 1.6.8 → 1.7.0.
  - Frontend: @tanstack/react-query 5.100.14 → 5.101.0 (+ devtools), @types/react 19.2.15 → 19.2.16, react / react-dom resolved to 19.2.7 via the existing ^19.2.5 range. next + eslint-config-next stay pinned at 16.2.6.
  - Rust: bitflags 2.11.1 → 2.12.1.
  - Deferred (major bumps reserved for 1.2): TypeScript 5.9 → 6.0 (compiler-API breaking changes); Fastly Rust SDK 0.11 → 0.12 (Compute@Edge API changes); jsdom / eslint / vitest where we're already ahead of the npm "latest" tag.
- Rate limiting at the edge is NOT included. The DDoS gate (fastly.ddos_detected) handles attack-scale traffic by bypassing Compute; sustained-low-rate abuse is left to the operator's existing WAF/NGWAF policies. A future rate-limiting feature is tracked separately.
- When a matrix is rolled back via the UI, the edge Wasm continues to use its embedded matrix until scripts/scoring/deploy_wasm.sh re-runs. The Restore endpoint returns a deploy_hint with the exact command. See docs/session_scoring_runbook.md.
1.0.0 - 2026-06-01
Initial public release. Self-hosted dashboard for searching, filtering, and visualizing request-level Fastly logs streamed to Fastly Object Storage.
- Apache Iceberg data lake in Fastly Object Storage — ACID-compliant log storage, safe for concurrent readers and writers, with automated compaction and snapshot expiration.
- Automated provisioning: guided wizard (and equivalent backend/provision.py CLI) creates the FOS bucket, scoped access key, CDN-fronting Fastly Delivery service, and the logging endpoint on your VCL service. Auto-rollback on failure.
- Crash-safe ingestion: buffered locally, atomically committed; interrupted imports never corrupt the table.
- CDN-accelerated reads — every FOS data read goes through a Fastly Delivery service for free egress and edge caching.
- Multi-source support — analyze logs from multiple Fastly services side by side, each with its own DuckDB engine and Iceberg table.
- Interactive dashboards — traffic over time, global request map, top-N aggregations across every dimension, paginated raw-log viewer with click-to-filter.
- Insights — automated anomaly detection for error spikes, regional traffic surges, new IPs, WAF signal changes, cache efficiency collapses, and latency regressions.
- Usage & Cost — live storage breakdown, FOS Class A / B operation counts, period totals, and an interactive cost estimator pre-filled from your traffic stats.
- Log-line accounting: reconciles Fastly's authoritative /stats/service/{id} counter against locally-ingested rows bucket-by-bucket and surfaces sustained pipeline loss.
- Configurable log fields: thirteen built-in field groups (HTTP, network, geo, TLS, NGWAF, QUIC/HTTP3, origin metrics, etc.) plus arbitrary custom VCL fields with auto-generated Edge Data Capture snippets.
- Alerts — threshold-based, webhook-delivered, with optional comparison-period evaluation and per-status-code scope.
- Two collaboration modes — invite analysts to run an independent copy (durable JSON-config join with read-only FOS credentials), or share your running instance live via three sharing modes: SSH reverse tunnel via localhost.run, your own hostname, or your own public IP. Per-analyst passcode invites, optional IP allowlist, optional expiry, and instant single-invite or sever-all revoke. Per-mode trust-model trade-offs are documented in SECURITY.md.
- Field-size guard — warns when your selected log fields approach Fastly's ~8 KB log-format limit.
See docs/features.md for the full feature reference.