This epic uses llm-d/llm-d-workload-variant-autoscaler#911 as a reference.
@vMaroon, could you please take a look and comment? Thanks
Summary
The kv-cache library currently emits 9 Prometheus metrics covering the index Add/Evict/Lookup paths (`kvcache_index_*`) and tokenization latency (`kvcache_tokenization_*`). Three areas have no metrics at all: the kvevents/ZMQ subscriber layer, storage backend connectivity, and error conditions. OTel tracing covers `Lookup`, `ScoreTokens`, and `Scorer.Score`, but `Add`, `Evict`, and the entire ZMQ event-processing path remain untraced. There are no alerting rules and no operational dashboard for the core index library.
This epic defines a phased improvement across metrics, tracing, alerting, and dashboards, broken into independently deliverable sub-issues.
Current State

| Area | Status | Notes |
|---|---|---|
| Prometheus metrics | 9 collectors | Index: `admissions_total`, `evictions_total`, `lookup_*`; tokenization: 3 metrics (when the pool is enabled). Gated by `enableMetrics`. |
| Error metrics | None | Lookup, tokenization, and backend errors are not counted |
| kvevents layer | None | ZMQ subscriber and pool operate silently; no metrics, no traces |
| Backend metrics | None | Redis / Valkey connectivity is unmonitored |
| OpenTelemetry tracing | Partial | Covers `Lookup`, `ScoreTokens`, `Scorer.Score`; `Add`, `Evict`, and the ZMQ path are not traced |
| HTTP endpoints | None in library | Host application exposes `/metrics`; core library has no default HTTP listener, no `/healthz` or `/readyz` |
| Alerting rules | None | No `PrometheusRule` CRDs |
| Dashboards | `llmd_fs_backend` connector only | No operational dashboard for the core index library |

Components with No Observability Today
These files operate silently, with no metrics and no traces, apart from the partial tracing in `traced_index.go`:
- `pkg/kvevents/subscriber_manager.go`: no metrics for active subscriber count or reconnection events
- `pkg/kvevents/zmq_subscriber.go`: no metrics for message receive rate or ZMQ errors
- `pkg/kvevents/pool.go`: no metrics for pool utilization or capacity
- `pkg/kvcache/backend.go`: no metrics for backend health or connection errors
- `pkg/kvcache/kvblock/redis.go` / `in_memory.go` / `cost_aware_memory.go`: no per-backend cache size or eviction-rate metrics
- `pkg/kvcache/kvblock/traced_index.go`: the `Add` and `Evict` paths are not traced
Deployment Boundary

| Scenario | Who owns it |
|---|---|
| Embedded in EPP (`precise-prefix-cache-scorer`, `enableMetrics: true`) | `kvcache_*` metrics appear on EPP's `/metrics`; no duplicate work needed here |
| Standalone service | This epic: library-level metrics, health probes, scrape docs |
| EPP-side scrape configuration | Stays in the llm-d-router observability issue |

When embedded in EPP, `kvcache_*` metrics share the same controller-runtime Registry and appear unchanged at EPP's `:9090/metrics` endpoint. This epic does not add a second scrape endpoint for the embedded case.
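Because the standalone and EPP-embedded cases share one registry, registration has to be idempotent: Prometheus registries reject a second collector with the same descriptors. A minimal sketch of a `sync.Once` registration guard, assuming a stand-in `registry` type in place of the controller-runtime `metrics.Registry`; the `kvcache_kvevents_*` names are hypothetical examples of sub-issue 1 metrics, not agreed names.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a stand-in for a shared Prometheus registry (for example the
// controller-runtime metrics.Registry): it rejects duplicate metric names.
type registry struct {
	mu    sync.Mutex
	names map[string]bool
}

func newRegistry() *registry { return &registry{names: map[string]bool{}} }

func (r *registry) Register(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.names[name] {
		return fmt.Errorf("metric %q already registered", name)
	}
	r.names[name] = true
	return nil
}

// Hypothetical collectors for sub-issue 1; names are illustrative only.
var collectorNames = []string{
	"kvcache_kvevents_active_subscribers",
	"kvcache_kvevents_messages_received_total",
}

var (
	registerOnce sync.Once
	registerErr  error
)

// Register adds the library's collectors to reg at most once per process,
// so repeated calls (host application plus library init) are no-ops.
func Register(reg *registry) error {
	registerOnce.Do(func() {
		for _, n := range collectorNames {
			if err := reg.Register(n); err != nil {
				registerErr = err
				return
			}
		}
	})
	return registerErr
}
```

With the real client library the guard would wrap `MustRegister`, which panics on a duplicate; the `sync.Once` turns a second call into a no-op instead of a crash.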
Sub-Issues
- Sub-issue 1: kvevents Subscriber Health Metrics: active subscriber gauge, message receive rate, ZMQ error counter, and reconnection counter per `pod_identifier`
- Sub-issue 2: Error Tracking Metrics: lookup error counter, tokenization error counter, and backend read/write error counter by `backend_type`
- Sub-issue 3: Cache Efficiency Metrics Enhancement: `lookup_hits_total` / `lookup_requests_total` already exist; add an explicit `hit_rate` gauge, an index entry count gauge, and per-backend utilization labels on the existing eviction/admission counters
- Sub-issue 4: Backend Connectivity Metrics: backend scrape latency histogram, error rate by `backend_type`, and connection pool size gauge; covers Redis, Valkey, in-memory, and cost-aware-memory
- Sub-issue 5: Extend OTel Tracing Coverage: `Add` and `Evict` in the traced index wrapper, ZMQ event-processing spans, and stronger tokenization span propagation; follow the existing `tracedIndex` wrapper pattern
- Sub-issue 6: PrometheusRule Alerting Rules: high lookup error rate, ZMQ subscriber down, abnormal eviction spike, and backend unreachable; targeting `kvcache_*` core library metrics
- Sub-issue 7: Operational Grafana Dashboard for Core Library: index throughput, hit rate, tokenization latency, subscriber health, and backend health; separate from the existing `llmd_fs_backend` connector dashboard
- Sub-issue 8: Health / Readiness Endpoints for Standalone Deployments: HTTP `/healthz` and `/readyz` for processes that embed the library directly; not needed for the EPP-embedded case
- Sub-issue 9: Observability Documentation: metric catalog, tracing setup guide, and scrape config examples for standalone and EPP-embedded deployments; EPP-side scrape config links to the llm-d-router docs
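For sub-issue 8, a minimal sketch of the two probes using only `net/http`. What counts as "ready" (index built, subscribers connected, backend reachable) is an assumption left to the sub-issue; here readiness is simply a flag the host process flips with `SetReady`. The `probes` type and the `statusOf` helper are hypothetical.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// probes holds readiness state for a standalone process embedding the library.
type probes struct {
	ready atomic.Bool
}

// Handler serves /healthz (liveness: the process answers HTTP) and
// /readyz (readiness: 503 until SetReady(true) has been called).
func (p *probes) Handler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("ok"))
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		if !p.ready.Load() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("ok"))
	})
	return mux
}

// SetReady flips the readiness flag; safe to call from any goroutine.
func (p *probes) SetReady(v bool) { p.ready.Store(v) }

// statusOf sends an in-process GET to h and returns the status code,
// which is convenient for unit-testing the probe wiring.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}
```

In a standalone binary, `p.Handler()` would be served by the same HTTP listener as `/metrics`; keeping liveness independent of readiness means a slow backend marks the pod unready without getting it restarted.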
Implementation Order
- Phase 1 (P0): Sub-issues 1, 2, 3 — Core metrics gaps (can be done in parallel)
- Phase 2 (P1): Sub-issues 4, 5, 8 — Backend metrics, tracing coverage, health probes
- Phase 3 (P2): Sub-issues 6, 7, 9 — Alerts, dashboard, docs
Implementation Notes
- New metrics must follow the existing `kvcache_<subsystem>_<name>` naming convention (existing subsystems: `index`, `tokenization`) and register through `pkg/kvcache/metrics/` via `metrics.Register()` / `sync.Once` using the controller-runtime `metrics.Registry`
- When embedded in EPP, the same Registry is used; metric names are unchanged and no second registration is needed
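- Histogram buckets for index ops: `prometheus.DefBuckets` (its lowest bound is 5 ms, so sub-millisecond ops all land in the first bucket; revisit if finer resolution is needed); for ZMQ message processing latency: `{0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0}`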
- Tracing instrumentation must follow the `tracedIndex` wrapper pattern: instrument at the interface boundary; do not inline spans into business logic
- Each sub-issue should be a separate PR for reviewability
- Sub-issues within the same phase can be developed in parallel
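The histogram bucket note can be checked with a small sketch. `prometheus.DefBuckets` starts at 5 ms, so sub-millisecond observations all fall into its first bucket, while the proposed ZMQ bucket set keeps them apart. The helpers below are illustrative only: they compute Prometheus-style cumulative (`le`) counts with the standard library, and `defBuckets` copies the `DefBuckets` values from client_golang.

```go
package main

import "sort"

// defBuckets copies the values of prometheus.DefBuckets (seconds) so the
// sketch has no external dependency.
var defBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}

// zmqBuckets is the bucket set proposed for ZMQ message processing latency.
var zmqBuckets = []float64{0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0}

// bucketIndex returns the first bucket whose upper bound is >= v ("le"
// semantics), or len(upper) if v only fits the implicit +Inf bucket.
func bucketIndex(upper []float64, v float64) int {
	return sort.SearchFloat64s(upper, v)
}

// cumulativeCounts returns Prometheus-style cumulative counts: entry i counts
// observations <= upper[i]. The +Inf bucket is not included.
func cumulativeCounts(upper []float64, observations []float64) []int {
	counts := make([]int, len(upper))
	for _, v := range observations {
		if i := bucketIndex(upper, v); i < len(upper) {
			counts[i]++
		}
	}
	for i := 1; i < len(counts); i++ {
		counts[i] += counts[i-1]
	}
	return counts
}

// distinctBuckets reports how many different finite buckets the observations
// land in: a rough measure of how much latency shape a layout preserves.
func distinctBuckets(upper []float64, observations []float64) int {
	seen := map[int]bool{}
	for _, v := range observations {
		if i := bucketIndex(upper, v); i < len(upper) {
			seen[i] = true
		}
	}
	return len(seen)
}
```

For sample latencies of 50 µs, 200 µs, 800 µs, and 3 ms, `distinctBuckets` returns 1 for `defBuckets` and 4 for `zmqBuckets`, which is the resolution difference the note is about.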