feat: Add vLLM Data Parallel support to llm-d-inference-scheduler #392
github-actions[bot] merged 19 commits into llm-d:main
Conversation
```diff
  resources:
-   - https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd?ref=v1.0.0
+   - https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd?ref=v1.1.0-rc.1
```
v1.1.0 is out. Let's use the formal release:
```diff
-   - https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd?ref=v1.1.0-rc.1
+   - https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd?ref=v1.1.0
```
The go.mod file still refers to v1.1.0-rc.1; I'd prefer to keep things in sync. The move to GIE v1.1.0 should be in a separate PR.
```yaml
    - "watch"
    - "list"
  - apiGroups:
    - "inference.networking.k8s.io"
```
Can we remove InferencePools from the inference.networking.x-k8s.io group?
I left it in, in case someone wants to use the v1alpha2 version...
The kgateway deployment files still use the v1alpha2 InferencePool
```yaml
  verbs:
    - update
    - patch
  - apiGroups:
```
Ditto: remove it from the inference.networking.x-k8s.io group.
The kgateway deployment files still use the v1alpha2 InferencePool
```yaml
  - "--zmq-endpoint=tcp://${EPP_NAME}.default.svc.cluster.local:5557"
  - "--event-batch-size=16"
  - "--tokenizers-cache-dir=/tokenizer-cache"
  - "--data-parallel-size=${VLLM_DATA_PARALLEL_SIZE}"
```
I'm a bit confused. The --data-parallel-size flag reads from the VLLM_DATA_PARALLEL_SIZE environment variable, which makes it configurable (which is great), but the ports list below is hardcoded.
The ports just need to be exposed; there can't be more than eight of them. The flag simply says how many we want to use.
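A minimal sketch of what the hardcoded ports list amounts to. The base port 8200 comes from the `--vllm-port=8200` flag elsewhere in this PR; the exact extra port numbers and field layout are assumptions for illustration:

```yaml
# Hypothetical container spec fragment: all eight DP ports are always
# exposed, and --data-parallel-size controls how many are actually used.
ports:
  - containerPort: 8200   # rank 0 (base port)
  - containerPort: 8201   # rank 1
  - containerPort: 8202   # rank 2
  - containerPort: 8203   # rank 3
  - containerPort: 8204   # rank 4
  - containerPort: 8205   # rank 5
  - containerPort: 8206   # rank 6
  - containerPort: 8207   # rank 7
```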
```yaml
  - "--vllm-port=8200"
  - "--connector=lmcache"
  - "--secure-proxy=false"
  - "--data-parallel-size=${VLLM_DATA_PARALLEL_SIZE}"
```
I thought DP is not used together with PD (this file is vllm-sim-pd). What am I missing?
I have updated the non-PD deployment files to include the sidecar. The script has been changed to launch using the non-PD deployment files even if VLLM_DATA_PARALLEL_SIZE is greater than one. A subsequent PR will add DP support for PD deployments; for now we'll leave this change here.
The primary use for DP will be with PD deployments; I think supporting it is a feature blocker. Today's hot deployments are TP+DP+EP with PD 😆.
```go
for _, target := range profileResult.TargetPods {
	newPodInfo := target.GetPod().Clone()
	newPodInfo.Port = h.targetPort
	targetPod := &types.PodMetrics{Pod: newPodInfo, MetricsState: target.GetMetrics().Clone()}
	newResult.TargetPods = append(newResult.TargetPods, targetPod)
}
modifiedResults := map[string]*types.ProfileRunResult{singleProfileName: &newResult}
```
I'm not sure I get why we need newResult or modifiedResults. Can't we iterate over profileResult.TargetPods and update only the port to PrimaryPort?
I don't want to depend on the fact that, somewhere up the stack, the PodInfo the plugin received was a clone...
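To make the point concrete, here is a self-contained sketch of the clone-before-mutate pattern the snippet above follows. `PodInfo`, `Clone`, and `rewritePorts` are simplified stand-ins for the scheduler's real types, not the actual API:

```go
package main

import "fmt"

// PodInfo is a hypothetical stand-in for the scheduler's pod type.
type PodInfo struct {
	Name string
	Port int
}

// Clone returns a copy so mutations never leak into shared state.
func (p *PodInfo) Clone() *PodInfo {
	c := *p
	return &c
}

// rewritePorts builds a new slice whose pods all use basePort,
// cloning each pod rather than mutating the caller's data in place.
func rewritePorts(targets []*PodInfo, basePort int) []*PodInfo {
	out := make([]*PodInfo, 0, len(targets))
	for _, t := range targets {
		c := t.Clone()
		c.Port = basePort
		out = append(out, c)
	}
	return out
}

func main() {
	orig := []*PodInfo{{Name: "vllm-decode-0", Port: 8203}}
	rewritten := rewritePorts(orig, 8200)
	// The original still carries its old port; only the clone was changed.
	fmt.Println(orig[0].Port, rewritten[0].Port)
}
```

Mutating `profileResult.TargetPods` in place would only be safe if every caller up the stack had already handed the plugin a clone, which is exactly the assumption the author wants to avoid.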
```yaml
kind: EndpointPickerConfig
plugins:
  - type: prefix-cache-scorer
  - type: decode-filter
```
This is a bit confusing, because DP is currently used without PD. I understand why it's used, but maybe there is a less confusing alternative.
That filter is in all of the configuration files here. I actually don't know why....
For proxies that support multi-port, will we be able to use the canonical approach of connecting to the individual ports directly?
@ahg-g wrote:
Yes. The code in this PR is basically a workaround for Gateways, like Istio, that for now don't support routing to multiple ports in the same vLLM pod. If your Gateway, like kgateway, supports multi-port, then simply don't use the new data-parallel-profile-handler plugin. There will be a follow-up PR that adds PD support for Data Parallel in the sidecar and in the profile handler. I will make sure that any changes there are optional.
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
/lgtm
This PR adds support to the llm-d-inference-scheduler for vLLM's Data Parallel (DP) feature. This vLLM feature enables running multiple HTTP servers in the same container.

In particular, this PR attempts to work around Istio issue 57638.

This PR adds a new profile handler plugin, data-parallel-profile-handler, which needs to be used to enable DP when Disaggregated Prefill/Decode (PD) isn't used. The plugin changes the scheduling result to always schedule the request on the "base" port of the vLLM decode pod, and adds a header for the sidecar.

Additionally, the sidecar has been extended to work with the header added by the new data-parallel-profile-handler plugin. This is a workaround for Istio issue 57638, in which we can only route to the "base" port of vLLM pods. It also proxies all direct HTTP calls to the "extra" HTTP servers started by vLLM.

Ref #380
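A sketch of how the new plugin might be enabled in an EndpointPickerConfig. The kind and the prefix-cache-scorer/decode-filter plugin types appear in this PR's configuration snippets; listing data-parallel-profile-handler alongside them, and the exact schema, are assumptions:

```yaml
# Hypothetical EPP configuration enabling DP without PD.
kind: EndpointPickerConfig
plugins:
  - type: prefix-cache-scorer
  - type: decode-filter
  # Rewrites the scheduling result to the "base" vLLM port and adds
  # the header the sidecar uses to proxy to the chosen DP rank.
  - type: data-parallel-profile-handler
```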