diff --git a/config/charts/inferencepool/README.md b/config/charts/inferencepool/README.md
index 013a1f2b3..5a6fa6d11 100644
--- a/config/charts/inferencepool/README.md
+++ b/config/charts/inferencepool/README.md
@@ -233,7 +233,7 @@ The following table list the configurable parameters of the chart.
 | `inferenceExtension.env` | List of environment variables to set in the endpoint picker container as free-form YAML. Defaults to `[]`. |
 | `inferenceExtension.extraContainerPorts` | List of additional container ports to expose. Defaults to `[]`. |
 | `inferenceExtension.extraServicePorts` | List of additional service ports to expose. Defaults to `[]`. |
-| `inferenceExtension.flags` | map of flags which are passed through to endpoint picker. Example flags, enable-pprof, grpc-port etc. Refer [runner.go](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/runner.go) for complete list. |
+| `inferenceExtension.flags` | Map of flags passed through to the endpoint picker, e.g. `enable-pprof`, `grpc-port`. Refer to [runner.go](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/runner.go) for the complete list. |
 | `inferenceExtension.affinity` | Affinity for the endpoint picker. Defaults to `{}`. |
 | `inferenceExtension.tolerations` | Tolerations for the endpoint picker. Defaults to `[]`. |
 | `inferenceExtension.monitoring.interval` | Metrics scraping interval for monitoring. Defaults to `10s`. |
@@ -247,22 +247,23 @@ The following table list the configurable parameters of the chart.
 | `inferenceExtension.tracing.otelExporterEndpoint` | OpenTelemetry collector endpoint. |
 | `inferenceExtension.tracing.sampling.sampler` | The trace sampler to use. Currently, only `parentbased_traceidratio` is supported. This sampler respects the parent span’s sampling decision when present, and applies the configured ratio for root spans. |
 | `inferenceExtension.tracing.sampling.samplerArg` | Sampler-specific argument. For `parentbased_traceidratio`, this defines the base sampling rate for new traces (root spans), as a float string in the range [0.0, 1.0]. For example, "0.1" enables 10% sampling. |
-| `inferenceExtension.volumes` | List of volumes to mount in the EPP deployment as free-form YAML. Optional. |
-| `inferenceExtension.volumeMounts` | List of volume mounts for the EPP container as free-form YAML. Optional. |
-| `inferenceExtension.sidecar.enabled` | Enables or disables the sidecar container in the EPP deployment. Defaults to `false`. |
-| `inferenceExtension.sidecar.name` | Name of the sidecar container. Required when the sidecar is enabled. |
-| `inferenceExtension.sidecar.image` | Image for the sidecar container. Required when the sidecar is enabled. |
-| `inferenceExtension.sidecar.imagePullPolicy` | Image pull policy for the sidecar container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `IfNotPresent`. |
-| `inferenceExtension.sidecar.command` | Command to run in the sidecar container as a single string. Optional. |
-| `inferenceExtension.sidecar.args` | Arguments to pass to the command in the sidecar container as a list of strings. Optional. |
-| `inferenceExtension.sidecar.env` | Environment variables to set in the sidecar container as free-form YAML. Optional. |
-| `inferenceExtension.sidecar.ports` | List of ports to expose for the sidecar container. Optional. |
-| `inferenceExtension.sidecar.livenessProbe` | Liveness probe configuration for the sidecar container. Optional. |
-| `inferenceExtension.sidecar.readinessProbe` | Readiness probe configuration for the sidecar container. Optional. |
-| `inferenceExtension.sidecar.resources` | Resource limits and requests for the sidecar container. Optional. |
-| `inferenceExtension.sidecar.volumeMounts` | List of volume mounts for the sidecar container. Optional. |
-| `inferenceExtension.sidecar.volumes` | List of volumes for the sidecar container. Optional. |
-| `inferenceExtension.sidecar.configMapData` | Custom key-value pairs to be included in a ConfigMap created for the sidecar container. Only used when `inferenceExtension.sidecar.enabled` is `true`. Optional. |
+| `inferenceExtension.volumes` | List of volumes to mount in the EPP deployment as free-form YAML. Optional. |
+| `inferenceExtension.volumeMounts` | List of volume mounts for the EPP container as free-form YAML. Optional. |
+| `inferenceExtension.sidecar.enabled` | Enables or disables the sidecar container in the EPP deployment. Defaults to `false`. |
+| `inferenceExtension.sidecar.name` | Name of the sidecar container. Required when the sidecar is enabled. |
+| `inferenceExtension.sidecar.image` | Image for the sidecar container. Required when the sidecar is enabled. |
+| `inferenceExtension.sidecar.imagePullPolicy` | Image pull policy for the sidecar container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `IfNotPresent`. |
+| `inferenceExtension.sidecar.command` | Command to run in the sidecar container as a single string. Optional. |
+| `inferenceExtension.sidecar.args` | Arguments to pass to the command in the sidecar container as a list of strings. Optional. |
+| `inferenceExtension.sidecar.env` | Environment variables to set in the sidecar container as free-form YAML. Optional. |
+| `inferenceExtension.sidecar.ports` | List of ports to expose for the sidecar container. Optional. |
+| `inferenceExtension.sidecar.livenessProbe` | Liveness probe configuration for the sidecar container. Optional. |
+| `inferenceExtension.sidecar.readinessProbe` | Readiness probe configuration for the sidecar container. Optional. |
+| `inferenceExtension.sidecar.resources` | Resource limits and requests for the sidecar container. Optional. |
+| `inferenceExtension.sidecar.volumeMounts` | List of volume mounts for the sidecar container. Optional. |
+| `inferenceExtension.sidecar.volumes` | List of volumes for the sidecar container. Optional. |
+| `inferenceExtension.sidecar.configMapData` | Custom key-value pairs to be included in a ConfigMap created for the sidecar container. Only used when `inferenceExtension.sidecar.enabled` is `true`. Optional. |
+| `inferenceObjectives` | List of `name`/`priority` pairs used to create InferenceObjectives that reference this InferencePool. Defaults to `[]`. |
 | `provider.name` | Name of the Inference Gateway implementation being used. Possible values: [`none`, `gke`, or `istio`]. Defaults to `none`. |
 | `provider.gke.autopilot` | Set to `true` if the cluster is a GKE Autopilot cluster. This is only used if `provider.name` is `gke`. Defaults to `false`. |
diff --git a/config/charts/inferencepool/templates/inferenceobjectives.yaml b/config/charts/inferencepool/templates/inferenceobjectives.yaml
new file mode 100644
index 000000000..1203435e9
--- /dev/null
+++ b/config/charts/inferencepool/templates/inferenceobjectives.yaml
@@ -0,0 +1,15 @@
+{{- range .Values.inferenceObjectives }}
+---
+apiVersion: inference.networking.x-k8s.io/v1alpha2
+kind: InferenceObjective
+metadata:
+  name: {{ .name }}
+  namespace: {{ $.Release.Namespace }}
+  labels:
+    {{- include "gateway-api-inference-extension.labels" $ | nindent 4 }}
+spec:
+  priority: {{ .priority }}
+  poolRef:
+    group: {{ (split "/" $.Values.inferencePool.apiVersion)._0 }}
+    name: {{ $.Release.Name }}
+{{- end }}
diff --git a/config/charts/inferencepool/values.yaml b/config/charts/inferencepool/values.yaml
index aba6bbfda..b2b597236 100644
--- a/config/charts/inferencepool/values.yaml
+++ b/config/charts/inferencepool/values.yaml
@@ -158,15 +158,17 @@ inferencePool:
   targetPorts:
   - number: 8000
   modelServerType: vllm # vllm, triton-tensorrt-llm
-  apiVersion: inference.networking.k8s.io/v1
+  apiVersion: inference.networking.k8s.io/v1
   # modelServers: # REQUIRED
   #   matchLabels:
   #     app: vllm-llama3-8b-instruct
-  # Should only used if apiVersion is inference.networking.x-k8s.io/v1alpha2,
+  # Should only be used if apiVersion is inference.networking.x-k8s.io/v1alpha2.
   # This will soon be deprecated when upstream GW providers support v1, just doing something simple for now.
   targetPortNumber: 8000
+
+
 # Options: ["gke", "istio", "none"]
 provider:
   name: none
@@ -199,3 +201,13 @@ istio:
 #       connectionPool:
 #         http:
 #           maxRequestsPerConnection: 256000
+
+
+# Optional: Define multiple InferenceObjectives for this InferencePool.
+# Each InferenceObjective associates a name and priority with this InferencePool.
+# Users reference these objectives by name in their request headers.
+inferenceObjectives: []
+# - name: high-priority
+#   priority: 5
+# - name: low-priority
+#   priority: 1
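
As a usage sketch of the new value (the release name `my-pool` and the override filename below are hypothetical, not from the chart):

```yaml
# values-override.yaml -- hypothetical override exercising the new template;
# each entry becomes one InferenceObjective rendered by inferenceobjectives.yaml
inferenceObjectives:
  - name: high-priority
    priority: 5
  - name: low-priority
    priority: 1
```

Rendering with something like `helm template my-pool config/charts/inferencepool -f values-override.yaml` should then emit one `InferenceObjective` per entry (e.g. one named `high-priority` with `spec.priority: 5`), each with a `poolRef` pointing at the chart's InferencePool; with the default `inferenceObjectives: []`, the template emits nothing.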