What
Add the vllm:prefix_cache_hits and vllm:prefix_cache_queries Prometheus counters to match vLLM's prefix cache observability surface.
Both are token-granularity counters: `queries` increments by the total number of prompt tokens on each request, and `hits` increments by the number of those tokens found already cached. This matches vLLM v1's semantics in kv_cache_manager.py, which calls `prefix_cache_stats.record(num_tokens=request.num_tokens, num_hits=num_new_computed_tokens)`.
Why
The simulator currently tracks KV cache utilization (vllm:kv_cache_usage_perc) but has no metric for cache effectiveness. When benchmarking prefix-cache-aware scorer strategies (e.g., precise-prefix-cache-scorer vs prefix-cache-scorer), there's no way to measure whether routing decisions actually result in higher cache reuse without scraping these counters.
Both counters are needed: hits alone is uninterpretable without the queries denominator. Together they give a rolling hit rate via `rate(vllm:prefix_cache_hits[5m]) / rate(vllm:prefix_cache_queries[5m])`.
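As a worked illustration of the ratio, here is a minimal Go sketch (the `hitRate` helper is hypothetical, not part of the simulator); note the zero-denominator guard for the case where no queries have been recorded yet:

```go
package main

import "fmt"

// hitRate returns hits/queries as a fraction in [0, 1].
// Returns 0 when no queries have been recorded, analogous to a
// PromQL ratio over an empty range producing no sample.
func hitRate(hits, queries float64) float64 {
	if queries == 0 {
		return 0
	}
	return hits / queries
}

func main() {
	fmt.Println(hitRate(750, 1000)) // 0.75
}
```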
Implementation notes
The data is already computed in pkg/kv-cache/kv_cache.go:OnRequestStart():
- `len(tokens)` → maps to the `queries` increment
- `nBlocksAlreadyInCache * blockSize` → maps to the `hits` increment
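The mapping above can be sketched as a pure function; this is illustrative only, and `blockSize` / `nBlocksAlreadyInCache` here are stand-ins for the simulator's internal names, not the exact implementation in kv_cache.go:

```go
package main

import "fmt"

// blockSize is an illustrative KV block size; the real value comes
// from the simulator's configuration.
const blockSize = 16

// cacheCounterIncrements derives the two per-request counter deltas:
// queries gets the full prompt length in tokens, hits gets the number
// of tokens covered by blocks already present in the cache.
func cacheCounterIncrements(tokens []int, nBlocksAlreadyInCache int) (queries, hits int) {
	queries = len(tokens)                    // → vllm:prefix_cache_queries
	hits = nBlocksAlreadyInCache * blockSize // → vllm:prefix_cache_hits
	return queries, hits
}

func main() {
	tokens := make([]int, 48) // a 48-token prompt
	q, h := cacheCounterIncrements(tokens, 2)
	fmt.Println(q, h) // 48 32
}
```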
Wiring follows the existing pattern: add a channel + async updater goroutine in metrics.go, same as kvCacheUsageChan → kvCacheUsageUpdater().
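The channel-plus-updater pattern can be sketched as below. This is a hedged approximation: the event type and updater names are hypothetical, and plain atomic int64s stand in for the real Prometheus counters so the sketch is self-contained:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// prefixCacheEvent carries one request's counter increments, analogous
// to the values sent on the existing kvCacheUsageChan.
type prefixCacheEvent struct {
	queries int64 // total prompt tokens
	hits    int64 // tokens already cached
}

var (
	prefixCacheQueries atomic.Int64 // stand-in for vllm:prefix_cache_queries
	prefixCacheHits    atomic.Int64 // stand-in for vllm:prefix_cache_hits
)

// prefixCacheUpdater drains the channel and applies increments, so the
// request path never blocks on metrics bookkeeping. It signals done
// once the channel is closed and fully drained.
func prefixCacheUpdater(ch <-chan prefixCacheEvent, done chan<- struct{}) {
	for ev := range ch {
		prefixCacheQueries.Add(ev.queries)
		prefixCacheHits.Add(ev.hits)
	}
	close(done)
}

func main() {
	ch := make(chan prefixCacheEvent, 8)
	done := make(chan struct{})
	go prefixCacheUpdater(ch, done)

	ch <- prefixCacheEvent{queries: 48, hits: 32}
	ch <- prefixCacheEvent{queries: 16, hits: 16}
	close(ch)
	<-done

	fmt.Println(prefixCacheQueries.Load(), prefixCacheHits.Load()) // 64 48
}
```

In the real wiring the updater would call `Add` on Prometheus counters instead of atomics; the structural point is that increments flow through a buffered channel to a single goroutine.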
Only applies when --enable-kvcache is set — counters stay at zero otherwise (matching vLLM behavior when prefix caching is disabled).
Ref
- Discussed in Block-Level KV Cache Tracking for Prefix-Aware Scorer Validation #347 (comment by @mayabar)
- vLLM source: vllm/v1/metrics/loggers.py:509-516, vllm/v1/metrics/stats.py:115-142