Make PredictedLatencyScorer PD Aware #2145
RishabhSaini wants to merge 14 commits into kubernetes-sigs:main
Conversation
/ok-to-test

I am concerned that we are still not cleaning up all the naming and terminology around the predicted-latency work (see #2032); some of it is not aligned with the terminology we have adopted so far (like using the terms "router" or "slo"), and it is causing confusion in discussions with the community.
The naming of the router is resolved now.
Protect all map accesses in the PreRequest, ResponseComplete, and scoring methods. Add a public API for adding to and removing from the runningRequestList (also requires getAvgTPOTSLO and GetSchedulingResult).
…e in inference-sched
…hooks to llm-d-inference-scheduler and handle it using experimental prefill profile type
…orer Get rid of the RequestBuilderStruct pattern and use helper funcs instead
pkg/epp/framework/plugins/scheduling/scorer/predictedlatency/selection.go
/retest
@RishabhSaini lgtm. One question: we recently updated the latencypredictor plugin to move the prediction step into the preparedata step, which happens before scheduling, and introduced an admission plugin. Just confirming your change still works with those changes. https://github.com/kubernetes-sigs/gateway-api-inference-extension/pulls?q=is%3Apr+is%3Aclosed+kaushikmitr
- Added `EndpointRoleLabel` parameter to EPP config to enable role-aware predictions for any deployment (e.g., `"llm-d.ai/role"` for prefill/decode in llm-d).
- Created helper functions `buildPredictionRequest()` and `buildTrainingEntry()` that conditionally populate `PodType` from endpoint labels for role-specific model training (prefill vs decode).
- Changed `runningRequestLists` from a map to `sync.Map` for thread safety, and updated all access patterns.
- Implemented prefill pod tracking across the PreRequest (runningRequest tracking), ResponseReceived (training data collection), and ResponseComplete (runningRequest cleanup) hooks. Prefill TTFT is calculated as the `ResponseReceived - RequestReceived` timestamp delta, capturing scheduling, queuing, prefill KV cache processing time, and the network hop for KV cache transfer to the decode pod.
- The corresponding PR in `llm-d-inference-scheduler` is no longer needed: refer to llm-d/llm-d-inference-scheduler#564 (comment).