Make PredictedLatencyScorer PD Aware #2145
RishabhSaini wants to merge 14 commits into kubernetes-sigs:main
Conversation
/ok-to-test

I am concerned that we are still not cleaning up all the naming and terminology around the predicted-latency work (see #2032); some of it is not aligned with the terminology we have adopted so far (like using the terms "router" or "slo"), and it is causing confusion in discussions with the community.
The naming of the router is resolved now.
Protect all map accesses in the PreRequest, ResponseComplete, and scoring methods. Add a public API for adding to and removing from the runningRequestList (also requires getAvgTPOTSLO and GetSchedulingResult).
…e in inference-sched
…hooks to llm-d-inference-scheduler and handle it using experimental prefill profile type
…orer Get rid of the RequestBuilderStruct pattern and use helper funcs instead
pkg/epp/framework/plugins/scheduling/scorer/predictedlatency/selection.go
/retest
@RishabhSaini lgtm. One question: we recently updated the latencypredictor plugin to move the prediction step into the preparedata step, which happens before scheduling, and introduced an admission plugin. Just confirming your change still works with those changes. https://github.com/kubernetes-sigs/gateway-api-inference-extension/pulls?q=is%3Apr+is%3Aclosed+kaushikmitr
- Added `EndpointRoleLabel` parameter to EPP config to enable role-aware predictions for any deployment (e.g., `"llm-d.ai/role"` for prefill/decode in llm-d).
- Created helper functions `buildPredictionRequest()` and `buildTrainingEntry()` that conditionally populate `PodType` from endpoint labels for role-specific model training (prefill vs decode).
- Changed `runningRequestLists` from a map to `sync.Map` for thread safety, and updated all access patterns.
- Implemented prefill pod tracking across the PreRequest (runningRequest tracking), ResponseReceived (training data collection), and ResponseComplete (runningRequest cleanup) hooks. Prefill TTFT is calculated as the `ResponseReceived - RequestReceived` timestamp delta, capturing scheduling, queuing, prefill KV cache processing time, and the network hop for KV cache transfer to the decode pod.
- The corresponding PR in `llm-d-inference-scheduler` is no longer needed: refer to llm-d/llm-d-inference-scheduler#564 (comment).