|
1 | 1 | # Prometheus + Grafana (local observability) |
2 | 2 |
|
3 | | -Docker Compose runs **Prometheus** (scrapes **spectral-mesh** and **spectral-edge** on the host) and **Grafana** (dashboards provisioned from this folder). The **Spectral mesh** dashboard assumes a **Linux** host sensor with live TLS capture; **Windows/macOS** builds expose the same metric names but ringbuf/uprobe series may stay at zero until capture is integrated ([`README.md`](../README.md#platforms)). |
| 3 | +Docker Compose runs **Prometheus** (scrapes **spectral-mesh** and **spectral-edge** on the host) and **Grafana** (dashboards provisioned from this folder). The **Spectral mesh** dashboard’s **cleartext chunk** panels track **`spectral_mesh_ringbuf_events_total`**, which increments on **every** `HandleChunk` call, whether the chunk arrives via **Linux eBPF** or a **pluggable feed** (ingest, hook socket, stdin JSONL). The **ringbuf drop/read-error** panels are **Linux BPF-path** metrics and often stay near zero on **Windows/macOS** or in **ingest-only** tests ([`README.md`](../README.md#platforms)). |
4 | 4 |
|
5 | 5 | ## Ports |
6 | 6 |
|
@@ -44,7 +44,19 @@ Example **Prometheus alert rules**: [`prometheus/spectral-edge-rules.yml`](prome |
44 | 44 |
|
45 | 45 | If you use **`-metrics-listen 127.0.0.1:9092`**, add a second scrape job (or change targets) so Prometheus hits **`:9092`** for **`/metrics`**, not **`-listen`**. |
46 | 46 |
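A second scrape job for the **`-metrics-listen`** case could look like the sketch below. The job name `spectral-edge-metrics` and the `host.docker.internal` target host are assumptions for illustration, not taken from the repository's `prometheus/prometheus.yml`; only the port `9092` and the `/metrics` path come from the text above.

```yaml
scrape_configs:
  # Hypothetical job name; the repository's config may name its jobs differently.
  - job_name: spectral-edge-metrics
    metrics_path: /metrics
    static_configs:
      # Assumes Prometheus runs in a container and reaches the host as
      # host.docker.internal; 9092 matches -metrics-listen 127.0.0.1:9092.
      - targets: ["host.docker.internal:9092"]
```

Whether a listener bound to `127.0.0.1` is reachable from the Prometheus container depends on your Docker networking, so the bind address or the target host may need adjusting.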
|
47 | | - For a quick smoke test without the full mesh, drive edge only (see **[`scripts/simulate_edge.sh`](../scripts/simulate_edge.sh)**). Optional load: **[`scripts/load_edge_smoke.sh`](../scripts/load_edge_smoke.sh)**. For **Grafana** time series, **[`scripts/simulate_edge_grafana.sh`](../scripts/simulate_edge_grafana.sh)** sends **proxy-path** traffic (not only **`/v1/scan`**) so **`spectral_edge_http_requests_total`** updates. |
| 47 | + **Simulation scripts** (repo root `scripts/`): |
| 48 | + |
| 49 | + | Script | Purpose | |
| 50 | + |--------|---------| |
| 51 | + | [`simulate_mesh.sh`](../scripts/simulate_mesh.sh) | Linux **eBPF** path: HTTPS via **curl** → OpenSSL `SSL_write` | |
| 52 | + | [`simulate_mesh_ingest.sh`](../scripts/simulate_mesh_ingest.sh) | **POST `/v1/ingest/chunk`** (any OS); bumps same chunk counter as hooks | |
| 53 | + | [`simulate_mesh_grafana.sh`](../scripts/simulate_mesh_grafana.sh) | Ingest burst + optional **split-chunk** demo for **Grafana** | |
| 54 | + | [`simulate_capture_demo.sh`](../scripts/simulate_capture_demo.sh) | **`go run ./cmd/spectral-capture-demo`** wrapper | |
| 55 | + | [`simulate_edge.sh`](../scripts/simulate_edge.sh) | Edge **proxy-style** POSTs | |
| 56 | + | [`simulate_edge_scan.sh`](../scripts/simulate_edge_scan.sh) | Edge **`/v1/scan`** only (**handler=scan** latency) | |
| 57 | + | [`simulate_edge_grafana.sh`](../scripts/simulate_edge_grafana.sh) | Mixed proxy + scan load for dashboards | |
| 58 | + |
| 59 | + Optional load script: **[`scripts/load_edge_smoke.sh`](../scripts/load_edge_smoke.sh)**. Mesh **ingest** example: start `spectral-mesh -metrics-addr :9090 -capture-ingest-addr 127.0.0.1:9091`, then run **`simulate_mesh_ingest.sh`** or **`simulate_mesh_grafana.sh`**. |
48 | 60 |
|
49 | 61 | If you use another port, TLS, or a **GHCR** image from **[`release-edge`](../.github/workflows/release-edge.yml)**, edit **[`prometheus/prometheus.yml`](prometheus/prometheus.yml)** under the `spectral-edge` job (`targets`, `scheme`, `tls_config`, etc.). |
50 | 62 |
|
@@ -79,9 +91,9 @@ If scraping stays **DOWN**: |
79 | 91 |
|
80 | 92 | [`grafana/dashboards/spectral-mesh.json`](grafana/dashboards/spectral-mesh.json) (host sensor) and [`grafana/dashboards/spectral-edge.json`](grafana/dashboards/spectral-edge.json) (edge, including latency panels) are provisioned automatically. Re-import manually after edits: **Dashboards → New → Import → Upload JSON**. |
81 | 93 |
|
82 | | -The **Spectral mesh** dashboard includes **Job** and **Instance** template variables (multi-select, **All** = `.*`), row groupings (**Health**, **TLS capture & ringbuf**, **Uprobes & policy**, **BPF maps & rolling buffer**), a **health** stat row (aggregated error/drop rates and chunk throughput), **dashboard links** to **Spectral edge**, **Prometheus annotation** queries for firing **`SpectralMesh*`** alerts, **5m rates** on ringbuf drops (plus cumulative on the same chart), and **1h increase** helpers on policy reload alongside 5m rates. |
| 94 | +The **Spectral mesh** dashboard includes **Job** and **Instance** template variables (multi-select, **All** = `.*`), row groupings (**Health**, **Cleartext ingress & chunk throughput**, **Uprobes & policy**, **BPF maps & rolling buffer**, **Metric semantics**), a **health** stat row (aggregated error/drop rates and chunk throughput), **dashboard links** to **Spectral edge**, **Prometheus annotation** queries for firing **`SpectralMesh*`** alerts, **5m rates** on ringbuf drops (plus cumulative on the same chart), **1h increase** helpers on policy reload alongside 5m rates, and a **Capture paths vs counters** note (ingest/hook vs BPF-only panels). |
83 | 95 |
|
84 | | -**Edge metrics** include counters (requests, alerts, truncations, allowlist/dedupe/rate-limit suppressions, policy reloads, upstream errors, gzip decode, generated **X-Request-Id**, response-scan alerts when enabled) and histograms (request duration by handler, policy scan duration, upstream round-trip). The **Spectral edge** Grafana dashboard charts throughput, **optional features** (gunzip, request IDs, response policy alerts), p50/p95 latency, suppression rates, upstream errors, and SIGHUP reload activity. See **[`docs/EDGE.md`](../docs/EDGE.md)**. |
| 96 | +**Edge metrics** include counters (requests, alerts, truncations, **HTTP 429** rate-limit path, allowlist/dedupe/rate-limit suppressions, policy reloads, upstream errors, gzip decode, generated **X-Request-Id**, response-scan alerts when enabled) and histograms (request duration by handler, policy scan duration, upstream round-trip). The **Spectral edge** Grafana dashboard adds **Job** / **Instance** variables, charts throughput (including **HTTP rate limited /s** when **`-http-ratelimit-rps`** is used), **optional features** (gunzip, request IDs, response policy alerts), p50/p95 latency, suppression rates, upstream errors, SIGHUP reload activity, and a **Simulation scripts** note. See **[`docs/EDGE.md`](../docs/EDGE.md)**. |
85 | 97 |
|
86 | 98 | ## Stopping |
87 | 99 |
|
|