
Make PredictedLatencyScorer PD Aware #2145

Closed
RishabhSaini wants to merge 14 commits into kubernetes-sigs:main from RishabhSaini:modularSLOScorer


@RishabhSaini RishabhSaini commented Jan 13, 2026

  1. Added a configurable EndpointRoleLabel parameter to the EPP config to enable role-aware predictions for any deployment (e.g., "llm-d.ai/role" for prefill/decode in llm-d).

       - type: predicted-latency-scorer
         parameters:
           endpointRoleLabel: "llm-d.ai/role"
           sloBufferFactor: 1.0
           headroomSelectionStrategy: "least"

  2. Created helper functions buildPredictionRequest() and buildTrainingEntry() that populate PodType from the endpoint's labels when a role label is configured, so that models can be trained per role (prefill vs. decode).

  3. Changed runningRequestLists from a plain map to a sync.Map for thread safety, and updated all access patterns accordingly.

  4. Implemented prefill pod tracking across the PreRequest (runningRequest tracking), ResponseReceived (training data collection), and ResponseComplete (runningRequest cleanup) hooks. Prefill TTFT is calculated as the ResponseReceived timestamp minus the RequestReceived timestamp, which captures scheduling, queuing, prefill KV cache processing, and the network hop for the KV cache transfer to the decode pod.

The corresponding PR in llm-d-inference-scheduler is no longer needed:
Refer to llm-d/llm-d-inference-scheduler#564 (comment)

@netlify

netlify bot commented Jan 13, 2026

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit aa4549a
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/698fad8e93791b0008d48906
😎 Deploy Preview https://deploy-preview-2145--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 13, 2026
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 13, 2026
@k8s-ci-robot

Hi @RishabhSaini. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 13, 2026
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 13, 2026
@ahg-g

ahg-g commented Jan 14, 2026

/ok-to-test

I am concerned that we are still not cleaning up all the naming and terminology around the predicted-latency work (see #2032). Some of it is not aligned with the terminology we have adopted so far (like using the term "router" or "slo"), and it is causing confusion in discussions with the community.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 14, 2026
@RishabhSaini RishabhSaini force-pushed the modularSLOScorer branch 3 times, most recently from a126548 to d074ce6 Compare January 14, 2026 16:28
@RishabhSaini RishabhSaini changed the title from "Make slo_aware_router modular" to "Make PredictedLatencyScorer modular" Jan 14, 2026
@RishabhSaini RishabhSaini force-pushed the modularSLOScorer branch 2 times, most recently from 1f239c3 to a3e3409 Compare January 15, 2026 15:49
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 21, 2026
@kaushikmitr

kaushikmitr commented Feb 2, 2026

> /ok-to-test
>
> I am concerned that we are still not cleaning up all the naming and terminology around the predicted-latency work (see #2032). Some of it is not aligned with the terminology we have adopted so far (like using the term "router" or "slo"), and it is causing confusion in discussions with the community.

The naming of the router is resolved now.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 2, 2026
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 2, 2026
@RishabhSaini RishabhSaini force-pushed the modularSLOScorer branch 2 times, most recently from f5dcf0c to 7923353 Compare February 3, 2026 17:55
@RishabhSaini RishabhSaini force-pushed the modularSLOScorer branch 2 times, most recently from 8d1ea0c to 3be0cb9 Compare February 11, 2026 15:20
…orer

Get rid of the RequestBuilderStruct fashion and use helper funcs instead
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 11, 2026
@RishabhSaini RishabhSaini changed the title from "Make PredictedLatencyScorer modular" to "Make PredictedLatencyScorer PD Aware" Feb 11, 2026
@kaushikmitr

/retest

@kaushikmitr

@RishabhSaini lgtm. One question: we recently updated the latencypredictor plugin to move the prediction step into the preparedata step, which happens before scheduling, and introduced an admission plugin. Just confirming that your change still works with those changes. https://github.com/kubernetes-sigs/gateway-api-inference-extension/pulls?q=is%3Apr+is%3Aclosed+kaushikmitr

@RishabhSaini RishabhSaini deleted the modularSLOScorer branch February 17, 2026 20:18

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.


4 participants