Commit b750ef4

Merge pull request #261 from zdtsw-forking/sync/upstream-f7e88524
[sync] upstream llm-d/llm-d-router f7e8852 [2026-06-22]
2 parents bb5f430 + c085bd7 commit b750ef4

23 files changed

Lines changed: 547 additions & 217 deletions

.github/workflows/ci-pr-checks.yaml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -450,4 +450,4 @@ jobs:
450450
E2E_LABEL_FILTER: ${{ matrix.suite.label-filter }}
451451
LOAD_VLLM_RENDER_IMAGE: ${{ matrix.suite.needs-renderer }}
452452
PULL_VLLM_RENDER_IMAGE: "false"
453-
run: make test-e2e-scheduler-run
453+
run: make test-e2e-router-run

Makefile

Lines changed: 8 additions & 8 deletions
@@ -304,24 +304,24 @@ test-e2e-gaie-run: image-pull ## Ensure images are present, then run GAIE e2e te
304304
$(CONTAINER_RUNTIME) run $(BUILDER_RUN_FLAGS) $(BUILDER_E2E_FLAGS) \
305305
-e EPP_IMAGE=$(GAIE_E2E_IMAGE) \
306306
-e USE_KIND=true \
307-
$(BUILDER_IMAGE) ./hack/test-e2e.sh
307+
$(BUILDER_IMAGE) ./test/scripts/test-e2e-gaie.sh
308308

309309
.PHONY: test-e2e-gaie
310310
test-e2e-gaie: image-build-builder image-build ## Build images and run GAIE e2e tests
311311
$(MAKE) test-e2e-gaie-run
312312

313-
.PHONY: test-e2e-scheduler-run
314-
test-e2e-scheduler-run: image-pull ## Ensure images are present, then run scheduler e2e tests
313+
.PHONY: test-e2e-router-run
314+
test-e2e-router-run: image-pull ## Ensure images are present, then run router e2e tests
315315
@printf "\033[33;1m==== Running End to End Tests ====\033[0m\n"
316316
$(CONTAINER_RUNTIME) run $(BUILDER_RUN_FLAGS) $(BUILDER_E2E_FLAGS) \
317-
$(BUILDER_IMAGE) ./test/scripts/run_e2e.sh
317+
$(BUILDER_IMAGE) ./test/scripts/test-e2e-router.sh
318318

319-
.PHONY: test-e2e-scheduler
320-
test-e2e-scheduler: image-build-builder image-build ## Build images and run scheduler e2e tests
321-
$(MAKE) test-e2e-scheduler-run
319+
.PHONY: test-e2e-router
320+
test-e2e-router: image-build-builder image-build ## Build images and run router e2e tests
321+
$(MAKE) test-e2e-router-run
322322

323323
.PHONY: test-e2e
324-
test-e2e: test-e2e-gaie test-e2e-scheduler ## Run all end-to-end tests sequentially
324+
test-e2e: test-e2e-gaie test-e2e-router ## Run all end-to-end tests sequentially
325325

326326

327327
.PHONY: bench-tokenizer
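
The hunk above renames every `test-e2e-scheduler*` target to `test-e2e-router*`, and the CI workflow now calls `make test-e2e-router-run`. For local scripts or CI jobs that still pass the old names, the mapping can be sketched as follows. The function name `translate_e2e_target` is hypothetical and is not part of the repository.

```bash
# Hypothetical helper (not in the repository): map a pre-rename Makefile
# target to the name introduced by this commit. Other targets pass through.
translate_e2e_target() {
  case "$1" in
    test-e2e-scheduler)     printf '%s\n' "test-e2e-router" ;;
    test-e2e-scheduler-run) printf '%s\n' "test-e2e-router-run" ;;
    *)                      printf '%s\n' "$1" ;;
  esac
}
```

A caller could then run `make "$(translate_e2e_target test-e2e-scheduler-run)"` until its own invocation is updated.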

README.md

Lines changed: 2 additions & 1 deletion
@@ -39,7 +39,7 @@ A lightweight deployment where a self-managed Envoy proxy runs alongside the EPP
3939
### 2. Gateway Mode (Inference Gateway)
4040
The recommended mode for production environments, leveraging the official [Gateway API]. In this mode, the EPP acts as a backend for an `InferencePool`, which is referenced by an `HTTPRoute` on a shared `Gateway`. This enables advanced traffic management, multi-cluster load balancing, and shared infrastructure for both inference and traditional workloads.
4141

42-
For more details on the router architecture, routing logic, and different plugins (filters and scorers), see the [Architecture Documentation].
42+
For more details on the router architecture, routing logic, and different plugins (filters and scorers), see the [Architecture Documentation]. For resource provisioning and container sizing recommendations under heavy or long-context workloads, see the [EPP Container Sizing Guide].
4343

4444
---
4545

@@ -61,6 +61,7 @@ To ensure clarity across the project, we use the following standard terminology:
6161
[Kubernetes Gateway API]:https://gateway-api.sigs.k8s.io/
6262
[Architecture Documentation]:docs/architecture.md
6363
[Disaggregation Documentation]:docs/disaggregation.md
64+
[EPP Container Sizing Guide]:docs/operations.md
6465
[InferencePool]:https://github.com/kubernetes-sigs/gateway-api-inference-extension
6566
[Gateway API Inference Extension (GIE)]:https://github.com/kubernetes-sigs/gateway-api-inference-extension
6667
[Kubernetes Gateway API Inference Extensions]:https://github.com/kubernetes-sigs/gateway-api-inference-extension

config/charts/README.md

Lines changed: 11 additions & 17 deletions
@@ -32,15 +32,14 @@ helm install my-standalone-router ./config/charts/llm-d-router-standalone \
3232
--set router.modelServers.matchLabels.app=my-vllm-service
3333
```
3434

35-
#### Standalone with Agentgateway Proxy (Service-Backed)
36-
Deploys EPP with an Agentgateway proxy. This mode requires disabling the `InferencePool` resource creation (`create=false`) and routes traffic to an existing Kubernetes Service:
35+
#### Standalone with Agentgateway Proxy
36+
Deploys EPP with an Agentgateway proxy. This mode requires disabling the `InferencePool` resource creation (`create=false`) and routes traffic directly to model servers:
3737

3838
```bash
3939
helm install my-standalone-router ./config/charts/llm-d-router-standalone \
4040
--set router.inferencePool.create=false \
4141
--set router.proxy.proxyType=agentgateway \
42-
--set router.proxy.agentgateway.service.name=my-model-service \
43-
--set router.proxy.agentgateway.service.ports="8000"
42+
--set router.modelServers.matchLabels.app=my-model-service
4443
```
4544

4645
#### Standalone with a Separate Proxy Service
@@ -538,20 +537,21 @@ Configures EPP to run with a proxy (Envoy proxy or Agentgateway proxy) that inte
538537
| `router.proxy.volumeMounts` | Sidecar container volume mounts. | `[]` |
539538
| `router.proxy.volumes` | Sidecar container volumes. | `[]` |
540539
| `router.proxy.configMapData` | Key-value pairs to include in a ConfigMap created for the sidecar. | `{}` |
541-
| `router.proxy.agentgateway.service.create` | **Agentgateway only**. Create a dedicated model Service for the Agentgateway proxy. | `true` |
542-
| `router.proxy.agentgateway.service.name` | **Agentgateway only**. Name of the model Service to route to. | `""` |
543-
| `router.proxy.agentgateway.service.namespace` | **Agentgateway only**. Namespace of the model Service. Defaults to release namespace. | `""` |
544-
| `router.proxy.agentgateway.service.ports` | **Agentgateway only**. Port list for the model Service (must match `modelServers.targetPorts`). | `[]` |
540+
#### Complete Standalone Example with Agentgateway Proxy
545541

546-
#### Complete Proxy Sidecar Example (Agentgateway Service-Backed)
547-
548-
To deploy EPP in standalone mode with an Agentgateway sidecar routing traffic directly to an existing model Service `my-model-service` (bypassing `InferencePool` creation):
542+
To deploy EPP in standalone mode with an Agentgateway sidecar routing traffic directly to model servers matching the label `app=my-model-service` (bypassing `InferencePool` creation):
549543

550544
```yaml
551545
router:
552546
inferencePool:
553547
create: false # Disable InferencePool creation
554548
549+
modelServers:
550+
matchLabels:
551+
app: "my-model-service"
552+
targetPorts:
553+
- number: 8000
554+
555555
proxy:
556556
enabled: true
557557
proxyType: agentgateway
@@ -561,10 +561,4 @@ router:
561561
memory: 4Gi
562562
limits:
563563
memory: 8Gi
564-
agentgateway:
565-
service:
566-
create: true # Create a Service to route client traffic to EPP
567-
name: "my-model-service"
568-
ports:
569-
- 8000 # Intercept traffic on port 8000
570564
```

config/charts/llm-d-router-standalone/templates/_validations.tpl

Lines changed: 10 additions & 18 deletions
@@ -18,6 +18,13 @@ standalone validations
1818
{{- if not (or (eq $proxyMode "sidecar") (eq $proxyMode "service")) -}}
1919
{{- fail (printf ".Values.router.proxy.mode must be one of [sidecar, service], got %q" $proxyMode) -}}
2020
{{- end -}}
21+
{{- /* Without an InferencePool the EPP --endpoint-selector is rendered from modelServers.matchLabels; an empty selector is rejected by EPP at startup, so require it here. */ -}}
22+
{{- $useInferencePool := ne .Values.router.inferencePool.create false -}}
23+
{{- if not $useInferencePool -}}
24+
{{- if or (empty .Values.router.modelServers) (not .Values.router.modelServers.matchLabels) -}}
25+
{{- fail ".Values.router.modelServers.matchLabels is required when .Values.router.inferencePool.create=false: standalone mode renders the EPP --endpoint-selector from matchLabels and cannot start with an empty selector" -}}
26+
{{- end -}}
27+
{{- end -}}
2128
{{- $failOpen := index $proxy "failOpen" -}}
2229
{{- if and (not (kindIs "invalid" $failOpen)) (not (kindIs "bool" $failOpen)) -}}
2330
{{- fail (printf ".Values.router.proxy.failOpen must be a boolean, got %q" (toString $failOpen)) -}}
@@ -46,32 +53,17 @@ standalone validations
4653
{{- fail (printf ".Values.router.proxy.proxyType must be one of [envoy, agentgateway], got %q" $proxyType) -}}
4754
{{- end -}}
4855
{{- if eq $proxyType "agentgateway" -}}
56+
{{- if hasKey $proxy "agentgateway" -}}
57+
{{- fail ".Values.router.proxy.agentgateway is no longer supported; standalone agentgateway uses EPP endpoint discovery with a logical service backend" -}}
58+
{{- end -}}
4959
{{- if ne .Values.router.inferencePool.create false -}}
5060
{{- fail ".Values.router.inferencePool.create=false is required when proxyType=agentgateway; standalone agentgateway currently supports only service-backed routing" -}}
5161
{{- end -}}
52-
{{- $agentgateway := index $proxy "agentgateway" | default dict -}}
53-
{{- $service := index $agentgateway "service" | default dict -}}
54-
{{- $serviceName := index $service "name" | default "" -}}
55-
{{- $serviceCreate := index $service "create" | default true -}}
56-
{{- if hasKey $service "port" -}}
57-
{{- fail ".Values.router.proxy.agentgateway.service.port has been replaced by .Values.router.proxy.agentgateway.service.ports" -}}
58-
{{- end -}}
59-
{{- if empty $serviceName -}}
60-
{{- fail ".Values.router.proxy.agentgateway.service.name is required when proxyType=agentgateway" -}}
61-
{{- end -}}
62-
{{- $targetPorts := include "llm-d-router.standaloneEndpointTargetPorts" . -}}
63-
{{- $servicePorts := include "llm-d-router.agentgateway.modelServicePorts" . -}}
64-
{{- if ne $targetPorts $servicePorts -}}
65-
{{- fail (printf ".Values.router.proxy.agentgateway.service.ports must match .Values.router.modelServers.targetPorts when proxyType=agentgateway, got service ports %q and target ports %q" $servicePorts $targetPorts) -}}
66-
{{- end -}}
6762
{{- $listenerPort := include "llm-d-router.standaloneProxyListenerPort" . -}}
6863
{{- $flags := .Values.router.epp.flags | default dict -}}
6964
{{- if and (hasKey $flags "secure-serving") (ne (toString (index $flags "secure-serving")) "false") -}}
7065
{{- fail ".Values.router.epp.flags.secure-serving must be false when proxyType=agentgateway; standalone agentgateway uses plaintext gRPC to EPP over localhost" -}}
7166
{{- end -}}
72-
{{- if $serviceCreate -}}
73-
{{- $selectorLabels := include "llm-d-router.agentgateway.modelServiceSelectorLabels" . -}}
74-
{{- end -}}
7567
{{- end -}}
7668
{{- end -}}
7769
{{- end -}}
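
The validation changes above reduce to three decisions, which can be sketched in shell. This is an illustration, not the template itself: the function name and its four arguments are hypothetical stand-ins for chart values, the messages are shortened, and the hunks do not show which outer condition encloses the `matchLabels` check, so the sketch treats it as unconditional.

```bash
# Sketch only: mirrors the checks visible in the hunks above.
# Arguments (hypothetical stand-ins for chart values):
#   $1  router.inferencePool.create ("false", or any other value / empty)
#   $2  non-empty if router.modelServers.matchLabels has entries
#   $3  router.proxy.proxyType ("envoy" or "agentgateway")
#   $4  "yes" if the router.proxy.agentgateway key is present
validate_standalone() {
  create=$1
  match_labels=$2
  proxy_type=$3
  has_agentgateway_key=$4

  # Without an InferencePool, the EPP --endpoint-selector is rendered
  # from matchLabels, so an empty selector is rejected up front.
  if [ "$create" = "false" ] && [ -z "$match_labels" ]; then
    echo "matchLabels is required when inferencePool.create=false"
    return 1
  fi

  if [ "$proxy_type" = "agentgateway" ]; then
    # The removed router.proxy.agentgateway block is now a hard error.
    if [ "$has_agentgateway_key" = "yes" ]; then
      echo "router.proxy.agentgateway is no longer supported"
      return 1
    fi
    # agentgateway still requires InferencePool creation to be disabled.
    if [ "$create" != "false" ]; then
      echo "inferencePool.create=false is required when proxyType=agentgateway"
      return 1
    fi
  fi

  echo "ok"
}
```

For example, `validate_standalone false "app=my-model-service" agentgateway no` prints `ok`, while passing `yes` as the last argument fails, which corresponds to a values file that still carries the old `agentgateway.service` settings.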

config/charts/llm-d-router-standalone/templates/agentgateway-service.yaml

Lines changed: 0 additions & 32 deletions
This file was deleted.

config/charts/llm-d-router-standalone/values.yaml

Lines changed: 1 addition & 10 deletions
@@ -33,16 +33,7 @@ router:
3333
limits:
3434
memory: 16Gi
3535

36-
# Agentgateway-specific settings used by the built-in preset when
37-
# proxyType=agentgateway. service.name is required.
38-
agentgateway:
39-
service:
40-
create: true
41-
name: ""
42-
namespace: ""
43-
# Must match inferencePool.targetPorts.
44-
ports:
45-
- 8000
36+
4637

4738
# Built-in standalone proxy presets. The selected preset is merged with the
4839
# top-level proxy.* fields below, so explicit user overrides still win.

config/charts/routerlib/templates/_helpers.tpl

Lines changed: 17 additions & 25 deletions
@@ -269,13 +269,20 @@ Return the standalone EPP model-server target ports.
269269
{{- end -}}
270270

271271
{{/*
272-
Return the agentgateway model Service ports.
272+
Return the agentgateway standalone logical backend service name.
273+
Derives the name from .Values.router.modelServers.matchLabels.app,
274+
falling back to .Release.Name if not set.
273275
*/}}
274-
{{- define "llm-d-router.agentgateway.modelServicePorts" -}}
275-
{{- $proxyValues := .Values.router.proxy | default dict -}}
276-
{{- $agentgateway := index $proxyValues "agentgateway" | default dict -}}
277-
{{- $service := index $agentgateway "service" | default dict -}}
278-
{{- include "llm-d-router.normalizedPortList" (dict "path" ".Values.router.proxy.agentgateway.service.ports" "value" (index $service "ports")) -}}
276+
{{- define "llm-d-router.agentgateway.logicalBackendName" -}}
277+
{{- $appLabel := "" -}}
278+
{{- if and .Values.router.modelServers .Values.router.modelServers.matchLabels -}}
279+
{{- $appLabel = index .Values.router.modelServers.matchLabels "app" | default "" -}}
280+
{{- end -}}
281+
{{- if not (empty $appLabel) -}}
282+
{{- $appLabel -}}
283+
{{- else -}}
284+
{{- .Release.Name -}}
285+
{{- end -}}
279286
{{- end -}}
280287

281288
{{/*
@@ -329,30 +336,15 @@ Return the rendered proxy ConfigMap data.
329336
{{- toYaml $data -}}
330337
{{- end -}}
331338

332-
{{/*
333-
Render labels from the standalone endpoint selector for the generated model Service.
334-
Only equality-based selectors are supported because Service selectors are a map.
335-
*/}}
336-
{{- define "llm-d-router.agentgateway.modelServiceSelectorLabels" -}}
337-
{{- if and .Values.router.modelServers .Values.router.modelServers.matchLabels -}}
338-
{{- range $key, $value := .Values.router.modelServers.matchLabels -}}
339-
{{- printf "%s: %s\n" ($key | quote) ($value | quote) -}}
340-
{{- end -}}
341-
{{- else -}}
342-
{{- fail ".Values.modelServers.matchLabels is required when creating an agentgateway model Service" -}}
343-
{{- end -}}
344-
{{- end -}}
339+
345340

346341
{{/*
347342
Render the default standalone agentgateway proxy config template.
348343
*/}}
349344
{{- define "llm-d-router.proxy.agentgatewayConfig" -}}
350-
{{- $proxyValues := .Values.router.proxy | default dict -}}
351-
{{- $agentgateway := index $proxyValues "agentgateway" | default dict -}}
352-
{{- $service := index $agentgateway "service" | default dict -}}
353-
{{- $serviceName := index $service "name" | default "" -}}
354-
{{- $serviceNamespace := index $service "namespace" | default .Release.Namespace -}}
355-
{{- $servicePorts := splitList "," (include "llm-d-router.agentgateway.modelServicePorts" .) -}}
345+
{{- $serviceName := include "llm-d-router.agentgateway.logicalBackendName" . -}}
346+
{{- $serviceNamespace := .Release.Namespace -}}
347+
{{- $servicePorts := splitList "," (include "llm-d-router.standaloneEndpointTargetPorts" .) -}}
356348
{{- $backendPort := index $servicePorts 0 -}}
357349
{{- $listenerPort := include "llm-d-router.standaloneProxyListenerPort" . | int -}}
358350
config:
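
The new `llm-d-router.agentgateway.logicalBackendName` helper and the backend port lookup above follow a simple fallback pattern. A minimal shell sketch of that logic follows; the function names are hypothetical, and the inputs stand in for `modelServers.matchLabels.app`, `.Release.Name`, and the comma-separated output of `llm-d-router.standaloneEndpointTargetPorts`.

```bash
# Sketch of the fallback in "llm-d-router.agentgateway.logicalBackendName":
# use the app label when it is non-empty, otherwise the release name.
logical_backend_name() {
  app_label=$1
  release_name=$2
  printf '%s\n' "${app_label:-$release_name}"
}

# The agentgateway config takes the first entry of the comma-separated
# target-port list as its backend port.
first_target_port() {
  printf '%s\n' "${1%%,*}"
}
```

With `app=my-model-service` set, the logical backend is named `my-model-service`. If `matchLabels` has no `app` key, the release name is used instead.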

docs/architecture.md

Lines changed: 1 addition & 0 deletions
@@ -306,3 +306,4 @@ Enable chunked decode via the pd-sidecar flag:
306306

307307
- [GIE Spec](../README.md#relation-to-gie-igw)
308308
- [Envoy External Processing](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter)
309+
- [EPP Container Sizing Guide](./operations.md)
