Add optional inference objective #1995
Base: main. Commits: 857ed6b, d876b40, 17a159c, ff97818, db76251
New template file (hunk `@@ -0,0 +1,15 @@`):

```yaml
{{- range .Values.inferenceObjectives }}
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: {{ .name }}
  namespace: {{ $.Release.Namespace }}
```

Contributor: OOC, why do we need the `namespace: {{ .Release.Namespace }}` here?

Contributor: Oh, actually, forget about that.

```yaml
  labels:
    {{- include "gateway-api-inference-extension.labels" $ | nindent 4 }}
spec:
  priority: {{ .priority }}
  poolRef:
    group: {{ .Values.inferenceExtension.apiVersion }}
    name: {{ .name }}
```

Contributor: Another miss? (attached a suggested change)

```yaml
{{- end }}
```
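Inside a Helm `range` block, `.` is rebound to the current list item, so chart-level values and release metadata must be reached through the root context `$`, as the template already does for `$.Release.Namespace` and the labels helper. Read that way, `.Values.inferenceExtension.apiVersion` would typically fail to render with a nil-pointer error, and `poolRef.name: {{ .name }}` points at the objective's own name rather than the pool. A minimal sketch of the `spec` with root-scoped references follows. It assumes three things the PR does not confirm: the intended key is the `apiVersion` shown under `inferencePool:` in the values diff, `poolRef.group` should hold only the API group (no `/v1` suffix), and the InferencePool is named after the release (`$.Release.Name`).

```yaml
spec:
  priority: {{ .priority }}   # .priority comes from the current list item
  poolRef:
    # Assumption: keep only the API group by splitting off the version ("/v1").
    # Sprig's split returns a map keyed _0, _1, ...
    group: {{ (split "/" $.Values.inferencePool.apiVersion)._0 }}
    # Assumption: the InferencePool shares the release name; substitute the chart's actual pool name.
    name: {{ $.Release.Name }}
```

With `apiVersion: inference.networking.k8s.io/v1`, the `group` line renders as `inference.networking.k8s.io`.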
Values file, hunk `@@ -158,15 +158,17 @@ inferencePool:`:

```yaml
  targetPorts:
    - number: 8000
  modelServerType: vllm # vllm, triton-tensorrt-llm
  apiVersion: inference.networking.k8s.io/v1
  # modelServers: # REQUIRED
  #   matchLabels:
  #     app: vllm-llama3-8b-instruct

  # Should only be used if apiVersion is inference.networking.x-k8s.io/v1alpha2.
  # This will soon be deprecated when upstream GW providers support v1, just doing something simple for now.
  targetPortNumber: 8000

# Options: ["gke", "istio", "none"]
provider:
  name: none
```

Values file, hunk `@@ -199,3 +201,13 @@ istio:`:

```yaml
# connectionPool:
#   http:
#     maxRequestsPerConnection: 256000
```
```yaml
# Optional: Define multiple InferenceObjectives for this InferencePool.
```

Collaborator: I think https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1995/changes#r2620118777 would apply here also.

```yaml
# Each InferenceObjective associates a name and priority with this InferencePool.
# Users reference these objectives by name in their request headers.
inferenceObjectives: []
# - name: high-priority
#   priority: 5
# - name: low-priority
#   priority: 1
```
Recommend documenting that this is for the case where the objectives are known in advance and mostly static, and that the user can still add/update/delete objectives later.
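To illustrate the point that objectives can still be managed outside the chart, here is a standalone manifest that could be applied after install. It is a minimal sketch using the same `InferenceObjective` fields the template renders; the objective name `batch-priority`, the pool name `my-pool`, the namespace, and the group value are hypothetical placeholders, not values from this PR.

```yaml
# Hypothetical objective added after the chart is installed.
# Create or update:  kubectl apply -f batch-objective.yaml
# Delete:            kubectl delete -f batch-objective.yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: batch-priority                 # hypothetical name
  namespace: default                   # assumption: same namespace as the InferencePool
spec:
  priority: 0
  poolRef:
    group: inference.networking.k8s.io # assumption: API group of the v1 InferencePool
    name: my-pool                      # hypothetical pool name
```

Objectives created this way are not part of the Helm release, so `helm upgrade` neither overwrites nor removes them; only names that collide with chart-managed objectives would conflict.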