
Add configurable prefill port support to prefill-header-handler plugin#329

Closed
pallavijaini0525 wants to merge 4 commits into llm-d:main from pallavijaini0525:prefillheader-port

Conversation

@pallavijaini0525

In prefill/decode (PD) disaggregation deployments, the prefill-header-handler plugin used the same targetPort for both the routing sidecar and the prefill nodes. This resulted in incorrect x-prefiller-host-port headers being generated when prefill runs on a different port, preventing proper communication between decode and prefill workers.

Added an optional prefillTargetPort parameter to the prefill-header-handler plugin configuration. When specified, this parameter overrides the generic targetPort when constructing the x-prefiller-host-port header.
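A minimal Go sketch of the proposed override logic. The type and function names below are illustrative only (the real plugin code lives in llm-d-inference-scheduler and differs); the sketch just shows how an optional prefillTargetPort would take precedence over the generic targetPort when building the header value:

```go
package main

import (
	"fmt"
	"strconv"
)

// prefillHeaderConfig is a hypothetical stand-in for the plugin's
// configuration; field names are illustrative.
type prefillHeaderConfig struct {
	TargetPort        int  // generic target port (from the InferencePool)
	PrefillTargetPort *int // optional prefill-specific override proposed in this PR
}

// prefillerHostPort builds the value for the x-prefiller-host-port
// header, preferring the prefill-specific port when it is set.
func prefillerHostPort(cfg prefillHeaderConfig, host string) string {
	port := cfg.TargetPort
	if cfg.PrefillTargetPort != nil {
		port = *cfg.PrefillTargetPort
	}
	return host + ":" + strconv.Itoa(port)
}

func main() {
	p := 8200
	// Without the override, the generic port is used.
	fmt.Println(prefillerHostPort(prefillHeaderConfig{TargetPort: 8000}, "prefill-0")) // prefill-0:8000
	// With the override set, the prefill-specific port wins.
	fmt.Println(prefillerHostPort(prefillHeaderConfig{TargetPort: 8000, PrefillTargetPort: &p}, "prefill-0")) // prefill-0:8200
}
```

Using a pointer (nil means "not configured") keeps the parameter optional and fully backward compatible: existing configurations that omit it behave exactly as before.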

@shmuelk
Collaborator

shmuelk commented Sep 25, 2025

/rebase

@elevran elevran moved this to In progress in llm-d-inference-scheduler Oct 19, 2025
@elevran elevran moved this from In progress to In review in llm-d-inference-scheduler Oct 20, 2025
@elevran
Collaborator

elevran commented Oct 20, 2025

@pallavijaini0525 the PR is failing CI (linting) - kindly rebase and fix before we can move it forward.

@elevran elevran moved this from In review to In progress in llm-d-inference-scheduler Oct 20, 2025
Signed-off-by: pallavi jaini <pallavi.jaini@intel.com>
@pallavijaini0525
Author

@elevran -> Lint issues are fixed. Thanks

@elevran
Collaborator

elevran commented Oct 26, 2025

Thanks @pallavijaini0525!

I'm trying to further understand the motivation behind this requested change.

The target port (for both P and D instances) is inferred from the InferencePool configuration. P and D variants might have different hardware configurations (e.g., memory vs. processing), but otherwise run the same code. This would typically imply that the Pods originate from different Deployments (so P and D can be scaled independently).

Under what circumstances and use cases would you expect P and D to use different serving ports?

@pallavijaini0525
Author

@elevran - I extended this code for two reasons:

  1. For the prefill configuration, we can use a different port by updating ms-pd/values.yaml, but the epp/inference scheduler is currently fixed to port 8000, with no option to specify a different port in the epp config.

  2. About a month ago, we faced some networking issues that required setting hostNetwork to true. This caused a conflict, preventing the use of port 8000 for both the sidecar and prefill. Since then, we've fixed the networking issues with RDMA devices, and there is no longer a need for hostNetwork to be true.

I thought this fix would be useful for overriding the ports, so I left the PR open.

@shmuelk
Collaborator

shmuelk commented Oct 27, 2025

After reviewing what you are trying to do here, I don't understand your comment:

For the prefill configuration, we can use a different port by updating ms-pd/values.yaml, but the epp/inference scheduler is currently fixed to port 8000, with no option to specify a different port in the epp config.

The epp is simply getting the port that was defined in the InferencePool. If your vLLM pods couldn't listen on port 8000 due to some networking issues, the InferencePool object should have been updated.

What field in ms-pd/values.yaml did you change? I looked at the Helm charts I think you used. There may also be an issue with the generated HTTPRoute, as it has port 8000 hard-coded.

@pallavijaini0525
Author

@elevran @shmuelk -> I looked into the latest code https://github.com/llm-d/llm-d-inference-scheduler/blob/5176573e4df75b9868b1da8bc6421cf5198e7aac/pkg/plugins/pre-request/pd_prerequest.go#L80C2-L82C88 ; it uses the port of the Pod, so the port defined in the InferencePool is sufficient for this.

I am closing this PR as it is no longer needed.

@github-project-automation github-project-automation bot moved this from In progress to Done in llm-d-inference-scheduler Oct 27, 2025
@elevran
Collaborator

elevran commented Oct 27, 2025

@pallavijaini0525 thanks for verifying, much appreciated! 🙏
