Commit ff97818

move inferenceObjective to top level and cleanup template
Signed-off-by: greg pereira <[email protected]>
1 parent 17a159c commit ff97818

3 files changed: +13 -15 lines changed

config/charts/inferencepool/README.md

Lines changed: 1 addition & 1 deletion
@@ -225,7 +225,6 @@ The following table list the configurable parameters of the chart.
 | `inferencePool.targetPortNumber` | Target port number for the vllm backends, will be used to scrape metrics by the inference extension. Defaults to 8000. |
 | `inferencePool.modelServerType` | Type of the model servers in the pool, valid options are [vllm, triton-tensorrt-llm], default is vllm. |
 | `inferencePool.modelServers.matchLabels` | Label selector to match vllm backends managed by the inference pool. |
-| `inferencePool.priority` | A priority that will be applied to the inferencepool through an inferenceobjective. |
 | `inferenceExtension.replicas` | Number of replicas for the endpoint picker extension service. If More than one replica is used, EPP will run in HA active-passive mode. Defaults to `1`. |
 | `inferenceExtension.image.name` | Name of the container image used for the endpoint picker. |
 | `inferenceExtension.image.hub` | Registry URL where the endpoint picker image is hosted. |
@@ -264,6 +263,7 @@ The following table list the configurable parameters of the chart.
 | `inferenceExtension.sidecar.volumeMounts` | List of volume mounts for the sidecar container. Optional. |
 | `inferenceExtension.sidecar.volumes` | List of volumes for the sidecar container. Optional. |
 | `inferenceExtension.sidecar.configMapData` | Custom key-value pairs to be included in a ConfigMap created for the sidecar container. Only used when `inferenceExtension.sidecar.enabled` is `true`. Optional. |
+| `inferenceObjectives` | A list of names and priorities to create InferenceObjectives from that will be assigned to the inference pool |
 | `provider.name` | Name of the Inference Gateway implementation being used. Possible values: [`none`, `gke`, or `istio`]. Defaults to `none`. |
 | `provider.gke.autopilot` | Set to `true` if the cluster is a GKE Autopilot cluster. This is only used if `provider.name` is `gke`. Defaults to `false`. |

config/charts/inferencepool/templates/inferenceobjective.yaml renamed to config/charts/inferencepool/templates/inferenceobjectives.yaml

Lines changed: 1 addition & 5 deletions
@@ -1,8 +1,4 @@
-{{- range .Values.inferencePool.inferenceObjectives }}
-{{- $group := "inference.networking.k8s.io" -}}
-{{- if eq $.Values.inferencePool.apiVersion "inference.networking.x-k8s.io/v1alpha2" -}}
-{{- $group = "inference.networking.x-k8s.io" -}}
-{{- end -}}
+{{- range .Values.inferenceObjectives }}
 ---
 apiVersion: inference.networking.x-k8s.io/v1alpha2
 kind: InferenceObjective
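
As a rough illustration of what the simplified template could emit for one entry of `inferenceObjectives`, here is a sketch of a rendered manifest. Only the `apiVersion` and `kind` lines appear in this diff. The `metadata` and `spec` fields are assumptions based on the InferenceObjective API, and the names `high-priority` and `my-pool` are hypothetical.

```yaml
# Sketch of one rendered document, not the chart's verified output.
---
apiVersion: inference.networking.x-k8s.io/v1alpha2   # hard-coded in the template, as shown in the diff
kind: InferenceObjective
metadata:
  name: high-priority        # assumed to come from the entry's `name`
spec:
  priority: 1                # assumed to come from the entry's `priority`
  poolRef:
    name: my-pool            # hypothetical; assumed to reference the chart's InferencePool
```

With the `$group` variable removed, the template's only loop input is the top-level `.Values.inferenceObjectives` list.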

config/charts/inferencepool/values.yaml

Lines changed: 11 additions & 9 deletions
@@ -167,15 +167,7 @@ inferencePool:
 # This will soon be deprecated when upstream GW providers support v1, just doing something simple for now.
 targetPortNumber: 8000
 
-# Optional: Define multiple InferenceObjectives for this InferencePool.
-# Each InferenceObjective associates a name and priority with this InferencePool.
-# Users reference these objectives by name in their request headers.
-# inferenceObjectives:
-# - name: high-priority
-#   priority: 1
-# - name: low-priority
-#   priority: 5
-inferenceObjectives: []
+
 
 # Options: ["gke", "istio", "none"]
 provider:
@@ -209,3 +201,13 @@ istio:
 # connectionPool:
 #   http:
 #     maxRequestsPerConnection: 256000
+
+
+# Optional: Define multiple InferenceObjectives for this InferencePool.
+# Each InferenceObjective associates a name and priority with this InferencePool.
+# Users reference these objectives by name in their request headers.
+inferenceObjectives: []
+# - name: high-priority
+#   priority: 1
+# - name: low-priority
+#   priority: 5
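
For readers updating their overrides, a minimal values sketch using the relocated key might look like the following. The file name `my-values.yaml` is hypothetical. The entries mirror the commented example in `values.yaml`.

```yaml
# my-values.yaml (hypothetical override file)
inferencePool:
  targetPortNumber: 8000

# After this commit the list lives at the top level,
# no longer under `inferencePool`.
inferenceObjectives:
  - name: high-priority
    priority: 1
  - name: low-priority
    priority: 5
```

Because the template now ranges over `.Values.inferenceObjectives`, a list left under `inferencePool.inferenceObjectives` would presumably no longer produce any InferenceObjective resources.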
