Add configurable prefill port support to prefill-header-handler plugin #329
pallavijaini0525 wants to merge 4 commits into llm-d:main
Conversation
Force-pushed from 547e356 to 0f3b836
/rebase
@pallavijaini0525 the PR is failing CI (linting) - kindly rebase and fix before we can move it forward.
Signed-off-by: pallavi jaini <pallavi.jaini@intel.com>
Force-pushed from 0f3b836 to d51298c
Signed-off-by: Pallavi Jaini <pallavi.jaini@intel.com>
@elevran - Lint issues are fixed. Thanks
Thanks @pallavijaini0525! I'm trying to further understand the motivation behind this requested change. The target port (for both P and D instances) is inferred from the InferencePool configuration. P and D variants might have different hardware configurations (e.g., memory vs. processing), but are otherwise running the same code. This would typically imply that the Pods originate from different deployments (so P and D can be scaled differently). Under what circumstances and use cases would you expect P and D to use different serving ports?
@elevran - I extended this code for two reasons:
I thought this fix would be useful for overriding the ports, so I left the PR open. |
After reviewing what you are trying to do here, I don't understand your comment:
The EPP is simply getting the port that was defined in the InferencePool. If your vLLM pods couldn't listen on port 8000 due to some networking issue, the InferencePool object should have been updated. Which field in ms-pd/values.yaml did you change? I looked at the Helm charts I think you used. There may also be an issue with the HTTPRoute that is generated, as it has port 8000 hard-coded.
@elevran @shmuelk - I looked into the latest code https://github.com/llm-d/llm-d-inference-scheduler/blob/5176573e4df75b9868b1da8bc6421cf5198e7aac/pkg/plugins/pre-request/pd_prerequest.go#L80C2-L82C88 and we are using the port of the pod. The port defined in the InferencePool is good enough for this. I am closing this PR as it is no longer needed.
@pallavijaini0525 thanks for verifying, much appreciated! 🙏
In prefill/decode (PD) disaggregation deployments, the prefill-header-handler plugin used the same targetPort for both the routing sidecar and the prefill nodes. This resulted in incorrect x-prefiller-host-port headers being generated when prefill runs on a different port, preventing proper communication between decode and prefill workers.
This PR adds an optional prefillTargetPort parameter to the prefill-header-handler plugin configuration. When specified, this parameter overrides the generic targetPort when constructing the x-prefiller-host-port header.