README.md: 2 additions & 1 deletion
@@ -39,7 +39,7 @@ A lightweight deployment where a self-managed Envoy proxy runs alongside the EPP
 ### 2. Gateway Mode (Inference Gateway)
 The recommended mode for production environments, leveraging the official [Gateway API]. In this mode, the EPP acts as a backend for an `InferencePool`, which is referenced by an `HTTPRoute` on a shared `Gateway`. This enables advanced traffic management, multi-cluster load balancing, and shared infrastructure for both inference and traditional workloads.

-For more details on the router architecture, routing logic, and different plugins (filters and scorers), see the [Architecture Documentation].
+For more details on the router architecture, routing logic, and different plugins (filters and scorers), see the [Architecture Documentation]. For resource provisioning and container sizing recommendations under heavy or long-context workloads, see the [EPP Container Sizing Guide].

 ---
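For orientation, the Gateway Mode wiring described in this hunk can be sketched as an `HTTPRoute` whose backend is an `InferencePool`. This is a minimal sketch and not part of the change itself: the resource names (`llm-route`, `shared-gateway`, `my-inference-pool`) are hypothetical, and the `InferencePool` API group shown is an assumption that depends on which version of the Inference Extension CRDs is installed in the cluster.

```yaml
# Hypothetical HTTPRoute that sends traffic to an InferencePool.
# All names below are placeholders, not values taken from this repository.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
    - name: shared-gateway            # the shared Gateway the route attaches to
  rules:
    - backendRefs:
        # Assumption: the group differs between CRD versions; check the
        # InferencePool CRD installed in your cluster before using this.
        - group: inference.networking.k8s.io
          kind: InferencePool
          name: my-inference-pool     # the pool that the EPP serves as backend for
```

The same `Gateway` could also carry ordinary `HTTPRoute` objects that point at regular Services, which is the shared-infrastructure arrangement the README paragraph refers to.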
@@ -61,6 +61,7 @@ To ensure clarity across the project, we use the following standard terminology:
-#### Standalone with Agentgateway Proxy (Service-Backed)
-Deploys EPP with an Agentgateway proxy. This mode requires disabling the `InferencePool` resource creation (`create=false`) and routes traffic to an existing Kubernetes Service:
+#### Standalone with Agentgateway Proxy
+Deploys EPP with an Agentgateway proxy. This mode requires disabling the `InferencePool` resource creation (`create=false`) and routes traffic directly to model servers:
| `router.proxy.configMapData` | Key-value pairs to include in a ConfigMap created for the sidecar. | `{}` |
-| `router.proxy.agentgateway.service.create` | **Agentgateway only**. Create a dedicated model Service for the Agentgateway proxy. | `true` |
-| `router.proxy.agentgateway.service.name` | **Agentgateway only**. Name of the model Service to route to. | `""` |
-| `router.proxy.agentgateway.service.namespace` | **Agentgateway only**. Namespace of the model Service. Defaults to release namespace. | `""` |
-| `router.proxy.agentgateway.service.ports` | **Agentgateway only**. Port list for the model Service (must match `modelServers.targetPorts`). | `[]` |
+#### Complete Standalone Example with Agentgateway Proxy

-#### Complete Proxy Sidecar Example (Agentgateway Service-Backed)
-
-To deploy EPP in standalone mode with an Agentgateway sidecar routing traffic directly to an existing model Service `my-model-service` (bypassing `InferencePool` creation):
+To deploy EPP in standalone mode with an Agentgateway sidecar routing traffic directly to model servers matching the label `app=my-model-service` (bypassing `InferencePool` creation):

 ```yaml
 router:
   inferencePool:
     create: false # Disable InferencePool creation

+  modelServers:
+    matchLabels:
+      app: "my-model-service"
+    targetPorts:
+      - number: 8000
+
   proxy:
     enabled: true
     proxyType: agentgateway
@@ -561,10 +561,4 @@ router:
         memory: 4Gi
       limits:
         memory: 8Gi
-    agentgateway:
-      service:
-        create: true # Create a Service to route client traffic to EPP
{{- fail (printf ".Values.router.proxy.mode must be one of [sidecar, service], got %q" $proxyMode) -}}
 {{- end -}}
+{{- /* Without an InferencePool the EPP --endpoint-selector is rendered from modelServers.matchLabels; an empty selector is rejected by EPP at startup, so require it here. */ -}}
{{- fail ".Values.router.modelServers.matchLabels is required when .Values.router.inferencePool.create=false: standalone mode renders the EPP --endpoint-selector from matchLabels and cannot start with an empty selector" -}}
{{- fail (printf ".Values.router.proxy.failOpen must be a boolean, got %q" (toString $failOpen)) -}}
@@ -46,32 +53,17 @@ standalone validations
 {{- fail (printf ".Values.router.proxy.proxyType must be one of [envoy, agentgateway], got %q" $proxyType) -}}
 {{- end -}}
 {{- if eq $proxyType "agentgateway" -}}
+{{- if hasKey $proxy "agentgateway" -}}
+{{- fail ".Values.router.proxy.agentgateway is no longer supported; standalone agentgateway uses EPP endpoint discovery with a logical service backend" -}}
{{- fail ".Values.router.inferencePool.create=false is required when proxyType=agentgateway; standalone agentgateway currently supports only service-backed routing" -}}
{{- fail ".Values.router.proxy.agentgateway.service.port has been replaced by .Values.router.proxy.agentgateway.service.ports" -}}
-{{- end -}}
-{{- if empty $serviceName -}}
-{{- fail ".Values.router.proxy.agentgateway.service.name is required when proxyType=agentgateway" -}}
-{{- end -}}
-{{- $targetPorts := include "llm-d-router.standaloneEndpointTargetPorts" . -}}
-{{- $servicePorts := include "llm-d-router.agentgateway.modelServicePorts" . -}}
-{{- if ne $targetPorts $servicePorts -}}
-{{- fail (printf ".Values.router.proxy.agentgateway.service.ports must match .Values.router.modelServers.targetPorts when proxyType=agentgateway, got service ports %q and target ports %q" $servicePorts $targetPorts) -}}
-{{- end -}}
 {{- $listenerPort := include "llm-d-router.standaloneProxyListenerPort" . -}}
{{- fail ".Values.router.epp.flags.secure-serving must be false when proxyType=agentgateway; standalone agentgateway uses plaintext gRPC to EPP over localhost" -}}
 {{- end -}}
-{{- if $serviceCreate -}}
-{{- $selectorLabels := include "llm-d-router.agentgateway.modelServiceSelectorLabels" . -}}