
Add vllm:prefix_cache_hits and vllm:prefix_cache_queries counters #358

Open
InfraWhisperer wants to merge 2 commits into llm-d:main from InfraWhisperer:prefix-cache-metrics

Conversation


@InfraWhisperer InfraWhisperer commented Feb 23, 2026

Summary

  • Add vllm:prefix_cache_hits and vllm:prefix_cache_queries Prometheus counters matching vLLM v1 token-level semantics
  • queries increments by total prompt tokens per request, hits by cached tokens — enables rate(hits) / rate(queries) for cache effectiveness measurement
  • Follows existing channel + async updater goroutine pattern (kvCacheUsageChan → kvCacheUsageUpdater)
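The channel + async updater goroutine pattern described above can be sketched in isolation. This is a minimal stdlib-only Go sketch, not the PR's actual code: the real implementation increments Prometheus counters, and the type and field names below (prefixCacheEvent, metrics) are illustrative stand-ins for the pattern that kvCacheUsageChan / kvCacheUsageUpdater already use.

```go
package main

import (
	"fmt"
	"sync"
)

// prefixCacheEvent carries one request's token counts to the updater goroutine.
type prefixCacheEvent struct {
	queries int // total prompt tokens in the request
	hits    int // prompt tokens found already cached
}

// metrics stands in for the simulator's Prometheus counters; the real code
// would increment prometheus Counter values instead of plain ints.
type metrics struct {
	mu            sync.Mutex
	prefixQueries int
	prefixHits    int
}

// updater drains the channel and applies increments asynchronously, so the
// request path never blocks on metrics bookkeeping.
func (m *metrics) updater(ch <-chan prefixCacheEvent, done chan<- struct{}) {
	for ev := range ch {
		m.mu.Lock()
		m.prefixQueries += ev.queries
		m.prefixHits += ev.hits
		m.mu.Unlock()
	}
	close(done)
}

func main() {
	m := &metrics{}
	ch := make(chan prefixCacheEvent, 16)
	done := make(chan struct{})
	go m.updater(ch, done)

	// Two requests sharing a 100-token prefix: the first misses,
	// the second reuses the cached prefix tokens.
	ch <- prefixCacheEvent{queries: 120, hits: 0}
	ch <- prefixCacheEvent{queries: 130, hits: 100}
	close(ch)
	<-done

	fmt.Printf("hits=%d queries=%d\n", m.prefixHits, m.prefixQueries) // hits=100 queries=250
}
```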

Closes #356

Test plan

  • Verify go build ./... passes
  • Run simulator with --enable-kvcache and confirm both counters appear on /metrics
  • Send repeated prompts with shared prefixes, verify prefix_cache_hits increments on subsequent requests
  • Confirm counters stay at zero when --enable-kvcache is not set
  • Validate rate(vllm:prefix_cache_hits[5m]) / rate(vllm:prefix_cache_queries[5m]) produces expected hit rate in Prometheus
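For the manual /metrics check in the steps above, the token-level hit rate can also be eyeballed directly from a single scrape. A sketch, assuming the simulator listens on localhost:8000 (port and sample values are illustrative; in practice the first line would come from `curl -s localhost:8000/metrics`, and Prometheus's rate() performs the same division over time windows):

```shell
# Simulated scrape output; replace with: curl -s localhost:8000/metrics | grep vllm:prefix_cache
metrics='vllm:prefix_cache_queries 250
vllm:prefix_cache_hits 100'

# Divide hits by queries for the cumulative token-level hit rate.
echo "$metrics" | awk '/queries/{q=$2} /hits/{h=$2} END{printf "hit_rate=%.2f\n", h/q}'
```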

@github-actions

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

@InfraWhisperer InfraWhisperer force-pushed the prefix-cache-metrics branch 2 times, most recently from 06409c6 to c2cb138 on February 23, 2026 at 15:46
Expose token-level prefix cache metrics matching vLLM v1 semantics.
Both counters increment per-request: queries by total prompt tokens,
hits by tokens found already cached. Enables computing cache hit rate
via rate(hits) / rate(queries) for scorer strategy benchmarking.

Closes llm-d#356

Signed-off-by: InfraWhisperer <raghav.potluri21@gmail.com>
Collaborator

@mayabar mayabar left a comment


@InfraWhisperer thanks for your PR
Looks good, some general comments:

  • please add support for fake metrics: check the new values' validity on initialization (e.g. that they are non-negative), and initialize the Prometheus values from the fake values given in the configuration, ...
  • tests for the new metrics are missing: the calculation needs to be tested in both scenarios, real and fake metrics

Add PrefixCacheHits and PrefixCacheQueries fields to the fake metrics
config struct with validation (non-negative, must be specified together,
hits <= queries). Initialize Prometheus counters from fake values in
setInitialPrometheusMetrics. Add integration tests covering real prefix
cache metrics (sequential requests with shared prefixes), fake prefix
cache metrics (values appear on /metrics), and fake value immutability
(real requests don't mutate fake counters). Add config validation tests
for all error paths including the partial specification guard.
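The validation rules listed in the commit message (non-negative, must be specified together, hits <= queries) can be sketched as follows. This is a hedged illustration, not the PR's actual code: the struct and field names (fakePrefixCacheConfig, Hits, Queries) and the error wording are assumptions for the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// fakePrefixCacheConfig is an illustrative stand-in for the fake-metrics
// config fields added in this PR. Pointers distinguish "unset" from zero.
type fakePrefixCacheConfig struct {
	Hits    *int64
	Queries *int64
}

// validate enforces the rules described in the commit message: both fields
// set together, both non-negative, and hits never exceeding queries.
func (c *fakePrefixCacheConfig) validate() error {
	if c.Hits == nil && c.Queries == nil {
		return nil // fake prefix-cache metrics not configured
	}
	if c.Hits == nil || c.Queries == nil {
		return errors.New("prefix cache hits and queries must be specified together")
	}
	if *c.Hits < 0 || *c.Queries < 0 {
		return errors.New("prefix cache values must be non-negative")
	}
	if *c.Hits > *c.Queries {
		return errors.New("prefix cache hits cannot exceed queries")
	}
	return nil
}

func main() {
	h, q := int64(80), int64(100)
	fmt.Println((&fakePrefixCacheConfig{Hits: &h, Queries: &q}).validate()) // <nil>

	bad := int64(200)
	fmt.Println((&fakePrefixCacheConfig{Hits: &bad, Queries: &q}).validate())
}
```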

Signed-off-by: Raghav Potluri <raghav.potluri21@gmail.com>
@InfraWhisperer
Author

@InfraWhisperer thanks for your PR. Looks good, some general comments:

  • please add support for fake metrics: check the new values' validity on initialization (e.g. that they are non-negative), and initialize the Prometheus values from the fake values given in the configuration, ...
  • tests for the new metrics are missing: the calculation needs to be tested in both scenarios, real and fake metrics

Hi @mayabar I have added tests as per your comments.



Development

Successfully merging this pull request may close these issues.

Add vllm:prefix_cache_hits and vllm:prefix_cache_queries Prometheus counters
