Problem
testing.html renders rich live charts (Chart.js + vis-timeline) over the SSE event stream from go-proxy, but events are ephemeral — once a session ends, the data is gone. We have no way to:
- Replay a past session through the same visualizer.
- Run cross-session analytics (e.g. "compare buffer-depth distributions across all sessions with transient-shock failure injection," "p95 startup time over the last 30 days," "which event sequences correlate with rebuffer events").
- Do ad-hoc SQL exploration over the event corpus.
We want this without adding storage / archival code paths to go-proxy or other existing services. The streaming app stays as it is; analytics is a sidecar.
Proposal
Stand up an analytics tier alongside the existing stack:
- ClickHouse (single-node, Docker): columnar store for events. One wide events table keyed by session_id + ts, with typed columns for hot-path fields (event_type, bitrate, buffer_depth, fps, dropped, etc.) and a JSON column for the long tail.
- SSE→ClickHouse forwarder: small standalone process (Go or Python, ~50–100 lines) that subscribes to /api/sessions/stream and batch-inserts into ClickHouse. Not part of go-proxy. If it dies, live charts keep working; we just lose archival until restart.
- Grafana with the official ClickHouse datasource: ad-hoc dashboards, cross-session aggregates, the "Splunk-equivalent" exploration surface.
- testing.html historical mode: a ?session=<id>&replay=1 mode that fetches the event array from ClickHouse over HTTP (SELECT … FORMAT JSON) instead of subscribing to SSE. Same Chart.js / vis-timeline renderer; only the feeder swaps. Add a scrubber, hide the live/pause toggle.
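To make the forwarder bullet concrete, here is a minimal Go sketch of its core loop: read SSE lines, collect each event's data payload, and post batches to ClickHouse's HTTP interface as a JSONEachRow insert. It is an illustration under assumptions, not the intended implementation: it assumes every SSE event carries one single-line JSON object whose keys match the events columns, and the names sseEvent, batcher, insertURL, and forward, as well as the batch size of 500, are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"strings"
)

// sseEvent accumulates the "data:" lines of the SSE event being read.
type sseEvent struct{ data []string }

// feed consumes one stream line; it returns (payload, true) on the blank line ending an event.
func (e *sseEvent) feed(line string) (string, bool) {
	if strings.HasPrefix(line, "data:") {
		e.data = append(e.data, strings.TrimPrefix(line[len("data:"):], " "))
		return "", false
	}
	if line != "" || len(e.data) == 0 {
		return "", false // comment, other field, or blank line with nothing pending
	}
	payload := strings.Join(e.data, "\n")
	e.data = e.data[:0]
	return payload, true
}

// batcher buffers rows and flushes them as one newline-delimited body.
type batcher struct {
	size  int
	rows  []string
	flush func(body string) error
}

func (b *batcher) add(row string) error {
	b.rows = append(b.rows, row)
	if len(b.rows) < b.size {
		return nil
	}
	return b.flushNow()
}

func (b *batcher) flushNow() error {
	if len(b.rows) == 0 {
		return nil
	}
	body := strings.Join(b.rows, "\n") + "\n"
	b.rows = b.rows[:0]
	return b.flush(body)
}

// insertURL builds the ClickHouse HTTP endpoint for a JSONEachRow insert.
func insertURL(base string) string {
	return base + "/?query=" + url.PathEscape("INSERT INTO events FORMAT JSONEachRow")
}

// forward streams SSE events into ClickHouse until the stream ends.
func forward(sseURL, chURL string, batchSize int) error {
	resp, err := http.Get(sseURL)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	b := &batcher{size: batchSize, flush: func(body string) error {
		r, err := http.Post(insertURL(chURL), "text/plain", strings.NewReader(body))
		if err != nil {
			return err
		}
		defer r.Body.Close()
		if r.StatusCode != http.StatusOK {
			return fmt.Errorf("clickhouse insert failed: %s", r.Status)
		}
		return nil
	}}
	var ev sseEvent
	sc := bufio.NewScanner(resp.Body) // default limit: 64 KiB per line
	for sc.Scan() {
		if payload, ok := ev.feed(sc.Text()); ok {
			if err := b.add(payload); err != nil {
				return err
			}
		}
	}
	if err := sc.Err(); err != nil {
		return err
	}
	return b.flushNow()
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: forwarder <sse-url> <clickhouse-url>")
	} else if err := forward(os.Args[1], os.Args[2], 500); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The sketch flushes only when a batch fills or the stream ends, so on a quiet stream rows can sit in memory for a while. A real forwarder would probably also flush on a timer and reconnect when the stream drops.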
Why ClickHouse over alternatives
| Option | Verdict |
| --- | --- |
| Loki + Grafana | Strong on logs, weak on cross-session math. Skip. |
| OpenSearch / ELK | Heavier ops, DSL instead of SQL. Skip. |
| TimescaleDB | Fine but less efficient for wide event rows + columnar scans. |
| DuckDB + Parquet | Tempting (zero servers) but batch-only; doesn't fit "query yesterday and right now in the same query." |
| ClickHouse | Real-time ingest, columnar SQL at scale, single-node trivial in Docker, first-class Grafana plugin. ✅ |
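The replay mode in the proposal depends on ClickHouse answering plain SQL over HTTP. As a rough sketch of what that request could look like, the Go snippet below builds a replay URL using ClickHouse's HTTP query parameters (a {name:Type} placeholder in the query, filled by a param_name URL argument). It is written in Go to match the stack; the page's JavaScript would build the same URL. The table and column names come from the proposal, while replayQuery, replayURL, the SELECT * projection, and the localhost address are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// replayQuery selects one session's events in time order. The session ID is
// passed as a bound parameter rather than spliced into the SQL text.
const replayQuery = "SELECT * FROM events WHERE session_id = {session_id:String} ORDER BY ts FORMAT JSON"

// replayURL returns the GET URL for fetching a session's events.
func replayURL(base, sessionID string) string {
	q := url.Values{}
	q.Set("query", replayQuery)
	q.Set("param_session_id", sessionID)
	// Encode writes spaces as "+"; switch to "%20" so the URL reads unambiguously.
	return base + "/?" + strings.ReplaceAll(q.Encode(), "+", "%20")
}

func main() {
	// Hypothetical ClickHouse address and session ID.
	fmt.Println(replayURL("http://localhost:8123", "abc-123"))
}
```

With FORMAT JSON the response is a single object and the event rows sit under its data key, so the feeder would hand that array to the existing renderer. Exposing ClickHouse to a browser would also need CORS and a read-only user, which this sketch does not cover.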
Acceptance criteria
- […] docker-compose.yml (and k3s manifests).
- events schema designed; migrations applied at startup.
- testing.html supports replay=1&session=<id>: fetches events from ClickHouse, renders through the existing Chart.js / vis-timeline code, scrubber for time range.
- No changes to go-proxy, go-live, or go-upload beyond what's needed to expose the SSE stream (already exposed).
Out of scope
- Replacing the live-mode SSE path. Live stays SSE-direct.
- Multi-tenant access controls on Grafana.
- Real-time alerting on event patterns (separate issue if we want it).
Open questions
- Forwarder language: Go (matches stack) vs Python (faster to prototype). Lean Go.
- Schema: how aggressively to flatten event payloads vs lean on ClickHouse JSON type.
- Where forwarder runs: same container as go-proxy (sidecar) vs its own service. Lean own service.
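One way to frame the schema question above is to look at what the forwarder would do with each event under a "typed hot path plus catch-all" layout. The Go sketch below splits an incoming JSON event into the hot-path fields named in the proposal and an extra map for everything else. The Go types, the extra column name, and the Hot, Row, and flatten names are assumptions; the actual event payload shape is not specified here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hot holds the typed hot-path columns named in the proposal. The Go types
// are assumptions; the real event payload may differ.
type Hot struct {
	SessionID   string  `json:"session_id"`
	TS          string  `json:"ts"`
	EventType   string  `json:"event_type"`
	Bitrate     float64 `json:"bitrate"`
	BufferDepth float64 `json:"buffer_depth"`
	FPS         float64 `json:"fps"`
	Dropped     int64   `json:"dropped"`
}

// Row is one insert-ready event: typed columns plus a catch-all for the rest.
type Row struct {
	Hot
	Extra map[string]json.RawMessage `json:"extra"`
}

var hotKeys = []string{"session_id", "ts", "event_type", "bitrate", "buffer_depth", "fps", "dropped"}

// flatten decodes a flat JSON event, keeping hot-path keys as typed fields
// and moving every other key into Extra untouched.
func flatten(raw []byte) (Row, error) {
	var row Row
	if err := json.Unmarshal(raw, &row.Hot); err != nil {
		return Row{}, err
	}
	if err := json.Unmarshal(raw, &row.Extra); err != nil {
		return Row{}, err
	}
	for _, k := range hotKeys {
		delete(row.Extra, k)
	}
	return row, nil
}

func main() {
	// Hypothetical event; "codec" stands in for a long-tail field.
	row, err := flatten([]byte(`{"session_id":"s1","event_type":"rebuffer","buffer_depth":1.5,"codec":"h264"}`))
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal(row)
	fmt.Println(string(out))
}
```

Flattening more aggressively would mean moving keys from Extra into Hot (and adding columns); leaning on the JSON type would mean shrinking Hot. Note that fields absent from an event decode to zero values in this sketch, which a real forwarder would need to handle explicitly.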