Add configurable prefill port support to prefill-header-handler plugin #329
pallavijaini0525 wants to merge 4 commits into llm-d:main
Conversation
Force-pushed from 547e356 to 0f3b836
/rebase
@pallavijaini0525 the PR is failing CI (linting) - kindly rebase and fix before we can move it forward.
Signed-off-by: pallavi jaini <pallavi.jaini@intel.com>
Force-pushed from 0f3b836 to d51298c
Signed-off-by: Pallavi Jaini <pallavi.jaini@intel.com>
@elevran - Lint issues are fixed. Thanks
Thanks @pallavijaini0525! I'm trying to further understand the motivation behind this requested change. The target port (for both P and D instances) is inferred from the InferencePool configuration. P and D variants might have different hardware configurations (e.g., memory vs. processing), but are otherwise running the same code. This would typically imply that the Pods originate from different deployments (so P and D can be scaled differently). Under what circumstances and use cases would you expect P and D to use different serving ports?
@elevran - I extended this code for two reasons:
I thought this fix would be useful for overriding the ports, so I left the PR open. |
After reviewing what you are trying to do here, I don't understand your comment:
The EPP is simply getting the port that was defined in the InferencePool. If your vLLM pods couldn't listen on port 8000 due to some networking issue, the InferencePool object should have been updated. Which field in ms-pd/values.yaml did you change? I looked at the Helm charts I think you used. There may also be an issue with the HTTPRoute that is generated, as it has port 8000 hard-coded.
@elevran @shmuelk - I looked into the latest code https://github.com/llm-d/llm-d-inference-scheduler/blob/5176573e4df75b9868b1da8bc6421cf5198e7aac/pkg/plugins/pre-request/pd_prerequest.go#L80C2-L82C88 and we are using the port of the pod. The port defined in the InferencePool is good enough for this. I am closing this PR as it is no longer needed.
@pallavijaini0525 thanks for verifying, much appreciated! 🙏
In prefill/decode (PD) disaggregation deployments, the prefill-header-handler plugin used the same targetPort for both the routing sidecar and the prefill nodes. This resulted in incorrect x-prefiller-host-port headers being generated when prefill runs on a different port, preventing proper communication between decode and prefill workers.
This PR adds an optional prefillTargetPort parameter to the prefill-header-handler plugin configuration. When specified, this parameter overrides the generic targetPort when constructing the x-prefiller-host-port header.