
@zetxqx (Contributor) commented Nov 4, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:
This PR introduces a new histogram metric, inference_objective_prompt_cached_tokens, to provide visibility into the effectiveness of the vLLM prefix caching feature.
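A Prometheus histogram does not store individual values. It exposes cumulative buckets (each labeled `le`, counting observations less than or equal to its upper bound) plus `_sum` and `_count` series. The stdlib-only Go sketch below mimics those semantics so the shape of the new metric is easier to picture. The `cumulativeHistogram` type and the bucket bounds are hypothetical and are not the PR's code, which registers a real Prometheus histogram.

```go
package main

import (
	"fmt"
	"sort"
)

// cumulativeHistogram is a hypothetical, stdlib-only stand-in for a Prometheus
// histogram: each bucket counts observations <= its upper bound ("le"), and
// the histogram also tracks the running sum and count of all observations.
type cumulativeHistogram struct {
	upperBounds []float64 // sorted bucket upper bounds, excluding +Inf
	counts      []uint64  // counts[i] = observations <= upperBounds[i]
	sum         float64
	count       uint64 // total observations, i.e. the implicit +Inf bucket
}

func newCumulativeHistogram(bounds []float64) *cumulativeHistogram {
	b := append([]float64(nil), bounds...) // copy so the caller's slice is untouched
	sort.Float64s(b)
	return &cumulativeHistogram{upperBounds: b, counts: make([]uint64, len(b))}
}

// Observe records one value, incrementing every bucket whose bound is >= v.
func (h *cumulativeHistogram) Observe(v float64) {
	for i, ub := range h.upperBounds {
		if v <= ub {
			h.counts[i]++
		}
	}
	h.sum += v
	h.count++
}

func main() {
	// Hypothetical bucket bounds for cached-token counts; not taken from the PR.
	h := newCumulativeHistogram([]float64{0, 16, 128, 1024})
	for _, cached := range []float64{0, 0, 64, 512, 2048} {
		h.Observe(cached)
	}
	for i, ub := range h.upperBounds {
		fmt.Printf("le=%g count=%d\n", ub, h.counts[i])
	}
	fmt.Printf("le=+Inf count=%d sum=%g\n", h.count, h.sum)
}
```

Including a bucket at 0 (a hypothetical choice here) keeps requests with no prefix-cache hit visible on their own.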

The key changes include:

  • New Metric: A new Prometheus histogram inference_objective_prompt_cached_tokens is registered to track the distribution of cached token counts.
  • Response Handling: The response handlers in pkg/epp/handlers/ are updated to parse the cached prompt token count from both
    non-streaming and streaming responses. If the field is absent from a response, no metric is recorded for that request,
    so a missing value is never reported as zero cached tokens.
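The "record only when the field is present" rule from the Response Handling bullet can be sketched with the Go standard library. This is a minimal sketch under stated assumptions, not the code in pkg/epp/handlers/: it assumes an OpenAI-compatible body where the count lives at `usage.prompt_tokens_details.cached_tokens`, and that streaming responses are server-sent events whose usage chunk arrives before `data: [DONE]`. The function names are hypothetical. Pointer fields let the decoder distinguish a missing field from an explicit 0.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// usageEnvelope models only the part of the response body we need.
// ASSUMPTION: the field path usage.prompt_tokens_details.cached_tokens.
// Pointers let us tell "field absent" apart from "field present with 0".
type usageEnvelope struct {
	Usage *struct {
		PromptTokensDetails *struct {
			CachedTokens *int `json:"cached_tokens"`
		} `json:"prompt_tokens_details"`
	} `json:"usage"`
}

// cachedTokensFromJSON returns the cached-token count and true only when the
// field is present; malformed JSON or a missing field yields (0, false).
func cachedTokensFromJSON(body []byte) (int, bool) {
	var env usageEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return 0, false
	}
	if env.Usage == nil || env.Usage.PromptTokensDetails == nil ||
		env.Usage.PromptTokensDetails.CachedTokens == nil {
		return 0, false
	}
	return *env.Usage.PromptTokensDetails.CachedTokens, true
}

// cachedTokensFromSSE scans "data: {...}" lines of a server-sent-events stream
// and keeps the last chunk that reports the field, stopping at "data: [DONE]".
func cachedTokensFromSSE(stream string) (int, bool) {
	val, found := 0, false
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "data:") {
			continue // skip blank separators and non-data SSE fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		if n, ok := cachedTokensFromJSON([]byte(payload)); ok {
			val, found = n, true
		}
	}
	return val, found
}

func main() {
	// Stand-in for recording the metric: only observe when the field was found.
	record := func(src string, n int, ok bool) {
		if !ok {
			fmt.Printf("%s: field absent, nothing recorded\n", src)
			return
		}
		fmt.Printf("%s: observe %d cached tokens\n", src, n)
	}
	n, ok := cachedTokensFromJSON([]byte(`{"usage":{"prompt_tokens":40,"prompt_tokens_details":{"cached_tokens":32}}}`))
	record("non-streaming", n, ok)
	n, ok = cachedTokensFromJSON([]byte(`{"usage":{"prompt_tokens":40}}`))
	record("non-streaming, no details", n, ok)
	stream := "data: {\"choices\":[]}\n\ndata: {\"usage\":{\"prompt_tokens_details\":{\"cached_tokens\":0}}}\n\ndata: [DONE]\n"
	n, ok = cachedTokensFromSSE(stream)
	record("streaming", n, ok)
}
```

Returning `(value, ok)` instead of a bare `int` keeps "absent" and "zero" distinct, which matters for the zero-value question raised in review.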

Which issue(s) this PR fixes:

Partially fixes #1304

Does this PR introduce a user-facing change?:

Prompt Cached Tokens Metric: Introduced a new metric (inference_objective_prompt_cached_tokens) to track the number of cached prompt tokens utilized by vLLM. This provides better visibility into the efficiency of prefix caching and helps in optimizing model serving.

@k8s-ci-robot added the kind/feature label Nov 4, 2025
netlify bot commented Nov 4, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: b35d816
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/690c002f0e185000070e732b
😎 Deploy Preview: https://deploy-preview-1814--gateway-api-inference-extension.netlify.app

@k8s-ci-robot added the cncf-cla: yes label (the PR's author has signed the CNCF CLA) Nov 4, 2025
@k8s-ci-robot added the size/L label (changes 100-499 lines, ignoring generated files) Nov 4, 2025
@zetxqx (Contributor, Author) commented Nov 4, 2025

/assign @liu-cong

@kfswain (Collaborator) commented Nov 5, 2025

/approve

@k8s-ci-robot added the approved label (approved by an approver from all required OWNERS files) Nov 5, 2025

// RecordPromptCachedTokens records prompt cached tokens count.
func RecordPromptCachedTokens(modelName, targetModelName string, size int) {
	if size > 0 {
Contributor commented:

Why check this? Isn't 0 a valid value?

Contributor (Author) replied:

Good call. It now emits zero as well; please take another look.
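The review exchange above changed the recorder so that a count of 0 is observed instead of skipped: a present 0 means the request got no prefix-cache hit, which is a meaningful sample. A minimal Go sketch of that rule follows. `observer`, `recordingObserver`, and `recordPromptCachedTokens` are hypothetical names; the model-name labels are omitted for brevity, and dropping negative values is an assumption of this sketch rather than something stated in the PR.

```go
package main

import "fmt"

// observer is a hypothetical stand-in for a labeled histogram child
// (for example, what a HistogramVec returns for one label combination).
type observer interface {
	Observe(float64)
}

// recordingObserver collects observations so the behavior is easy to inspect.
type recordingObserver struct{ values []float64 }

func (r *recordingObserver) Observe(v float64) { r.values = append(r.values, v) }

// recordPromptCachedTokens is a hypothetical sketch of the revised logic:
// a present count of 0 (no prefix-cache hit) is a real sample and is observed.
// ASSUMPTION: negative values are treated as invalid and dropped.
// A missing field is handled upstream by not calling this function at all.
func recordPromptCachedTokens(h observer, size int) {
	if size < 0 {
		return
	}
	h.Observe(float64(size))
}

func main() {
	h := &recordingObserver{}
	for _, size := range []int{0, 128, -1} {
		recordPromptCachedTokens(h, size)
	}
	fmt.Println(h.values) // [0 128]
}
```

With the original `size > 0` guard, every request with no cache hit would be silently excluded, skewing the distribution toward requests that did hit the cache.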

@liu-cong (Contributor) left a comment

/lgtm

@k8s-ci-robot added the lgtm label Nov 6, 2025
@k8s-ci-robot commented:
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, liu-cong, zetxqx


@k8s-ci-robot merged commit 7488c2b into kubernetes-sigs:main Nov 6, 2025
11 checks passed