@@ -13,24 +13,25 @@ SpiceBench (OTel instruments)

## Metric Checklist

- | # | Metric | OTel Instrument | Source | Emitted to telemetry | Status |
- | --- | --- | --- | --- | --- | --- |
- | 1 | **Data Size** (total bytes ingested) | `ingestion_bytes_total` (Gauge\<u64\>) | SUT adapter `metrics` → `ingestion.bytes_ingested` | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 2 | **Ingestion records/s** | `ingestion_rows_per_sec` (Gauge\<f64\>) | SUT adapter `metrics` → `ingestion.rows_per_sec` | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 3 | **Ingestion rows total** | `ingestion_rows_total` (Gauge\<u64\>) | SUT adapter `metrics` → `ingestion.rows_ingested` | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 4 | **Connections / Clients** | `active_connections` (Gauge\<u64\>) | CLI `--concurrency` + SUT adapter `metrics` → `ingestion.active_connections` | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 5 | **Queries/s, Requests/s** | `queries_per_sec` (Gauge\<f64\>), `queries_total` (Counter\<u64\>) | Computed from total iterations / test duration | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 6 | **Query Latency (p50)** | `median_duration_ms` (Gauge\<u64\>) | Query driver per-query statistics | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 7 | **Query Latency (p99)** | `p99_duration_ms` (Gauge\<u64\>) | Query driver per-query statistics | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 8 | **Efficiency (cores)** | `efficiency_queries_per_core` (Gauge\<f64\>) | Computed: `queries_per_sec / cpu_cores` | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 9 | **Resource Usage – CPU** | `sut_cpu_usage_percent` (Gauge\<f64\>) | SUT adapter `metrics` → `resource.cpu_usage_percent` | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 10 | **Resource Usage – Memory** | `peak_memory_usage_mb` / `median_memory_usage_mb` (Gauge\<f64\>), `sut_memory_usage_bytes` (Gauge\<u64\>) | Local process via `sysinfo` + SUT adapter `metrics` | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 11 | **Resource Usage – Disk** | `sut_disk_read_bytes` / `sut_disk_write_bytes` (Gauge\<u64\>) | SUT adapter `metrics` → `resource.disk_read_bytes` / `disk_write_bytes` | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 12 | **Resource Usage – IOPS** | `sut_disk_read_iops` / `sut_disk_write_iops` (Gauge\<u64\>) | SUT adapter `metrics` → `resource.disk_read_iops` / `disk_write_iops` | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 13 | **E2E Latency** | `e2e_latency_ms` (Histogram\<f64\>) | **Instrument defined; not yet recorded** — requires timestamped events + query-back verification | ⚠️ Instrument only | 🔲 Not yet wired |
- | 14 | **E2E Duration** | `test_duration_ms` (Gauge\<u64\>) | Wall-clock time of benchmark phase | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 15 | **Query Queue Length** | `query_queue_length` (Gauge\<u64\>) | Query worker queue depth at query execution start (attributes: `query_name`, `client_id`) | ✅ via `Telemetry.emit()` | ✅ Implemented |
- | 16 | **Query Queue Duration** | `query_queue_duration_ms` (Histogram\<f64\>) | Query worker queue wait time before execution (attributes: `query_name`, `client_id`) | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | # | Metric | OTel Instrument | Source | Emitted to telemetry | Status |
+ | --- | --- | --- | --- | --- | --- |
+ | 1 | **Data Size** (total bytes ingested) | `ingestion_bytes_total` (Gauge\<u64\>) | SUT adapter `metrics` → `ingestion.bytes_ingested` | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 2 | **Ingestion records/s** | `ingestion_rows_per_sec` (Gauge\<f64\>) | SUT adapter `metrics` → `ingestion.rows_per_sec` | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 3 | **Ingestion rows total** | `ingestion_rows_total` (Gauge\<u64\>) | SUT adapter `metrics` → `ingestion.rows_ingested` | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 4 | **Connections / Clients** | `active_connections` (Gauge\<u64\>) | CLI `--concurrency` + SUT adapter `metrics` → `ingestion.active_connections` | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 5 | **Queries/s, Requests/s** | `queries_per_sec` (Gauge\<f64\>), `queries_total` (Counter\<u64\>) | Computed from total iterations / test duration | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 6 | **Query Latency (p50)** | `median_duration_ms` (Gauge\<u64\>) | Query driver per-query statistics | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 7 | **Query Latency (p99)** | `p99_duration_ms` (Gauge\<u64\>) | Query driver per-query statistics | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 8 | **Efficiency (cores)** | `efficiency_queries_per_core` (Gauge\<f64\>) | Computed: `queries_per_sec / cpu_cores` | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 9 | **Resource Usage – CPU** | `sut_cpu_usage_percent` (Gauge\<f64\>) | SUT adapter `metrics` → `resource.cpu_usage_percent` | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 10 | **Resource Usage – Memory** | `peak_memory_usage_mb` / `median_memory_usage_mb` (Gauge\<f64\>), `sut_memory_usage_bytes` (Gauge\<u64\>) | Local process via `sysinfo` + SUT adapter `metrics` | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 11 | **Resource Usage – Disk** | `sut_disk_read_bytes` / `sut_disk_write_bytes` (Gauge\<u64\>) | SUT adapter `metrics` → `resource.disk_read_bytes` / `disk_write_bytes` | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 12 | **Resource Usage – IOPS** | `sut_disk_read_iops` / `sut_disk_write_iops` (Gauge\<u64\>) | SUT adapter `metrics` → `resource.disk_read_iops` / `disk_write_iops` | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 13 | **E2E Latency** | `e2e_latency_ms` (Histogram\<f64\>) | Raw freshness scraper samples (`MAX(__created_at)` deltas); percentiles are computed in dashboard queries | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 14 | **E2E Duration** | `test_duration_ms` (Gauge\<u64\>) | Wall-clock time of benchmark phase | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 15 | **Query Queue Length** | `query_queue_length` (Gauge\<u64\>) | Query worker queue depth at query execution start (attributes: `query_name`, `client_id`) | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 16 | **Query Queue Duration** | `query_queue_duration_ms` (Histogram\<f64\>) | Query worker queue wait time before execution (attributes: `query_name`, `client_id`) | ✅ via `Telemetry.emit()` | ✅ Implemented |
+ | 17 | **Checkpoint In-flight Queries** | `checkpoint_in_flight_queries` (Gauge\<u64\>) | Active in-flight query count while checkpoint validation windows are enabled (`client_id`) | ✅ via `Telemetry.emit()` | ✅ Implemented |

## Streaming Metrics (real-time, optional)

@@ -89,7 +90,4 @@ The default `Handler::metrics()` implementation returns empty metrics, so existi

## Remaining Work

- - [ ] **E2E Latency**: Implement event-creation-to-queryable latency measurement. This requires:
-   1. Timestamping generated events at creation time
-   2. Querying the SUT for those events after ingestion
-   3. Recording the delta as `e2e_latency_ms` histogram observations
+ - [ ] **E2E Latency dashboard expansion**: Add optional additional percentile panels (e.g., p50/p90/p99.9) computed from `e2e_latency_ms` in Flux.
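
The updated row 13 describes `e2e_latency_ms` as raw freshness samples derived from `MAX(__created_at)` deltas, with percentiles left to dashboard queries. The Rust sketch below illustrates that split under stated assumptions; it is not taken from the SpiceBench code. It assumes a sample is the scrape time minus the newest `__created_at` value, both in epoch milliseconds, and it uses the nearest-rank method for percentiles, which may differ from what the Flux dashboards compute. The function names `freshness_delta_ms` and `percentile_nearest_rank` are hypothetical.

```rust
/// Hypothetical helper: one freshness sample in milliseconds.
/// Assumes the sample is `scrape time - MAX(__created_at)`, both as Unix
/// epoch milliseconds. Returns None when the table has no rows yet or the
/// newest row appears to be in the future (clock skew).
fn freshness_delta_ms(scrape_time_ms: u64, max_created_at_ms: Option<u64>) -> Option<f64> {
    let newest = max_created_at_ms?;
    scrape_time_ms.checked_sub(newest).map(|delta| delta as f64)
}

/// Hypothetical helper: nearest-rank percentile over raw samples.
/// `p` must be in (0, 100]; returns None for empty input or invalid `p`.
fn percentile_nearest_rank(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    // Sort a copy so the caller's sample order is left untouched.
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: smallest 1-based rank covering at least p% of samples.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```

Each value returned by `freshness_delta_ms` would be recorded as one histogram observation. Keeping the samples raw is what allows additional percentile panels (p50, p90, p99.9) to be added later without changing what the benchmark emits.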