-
Notifications
You must be signed in to change notification settings - Fork 207
Add optional inference objective #1995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add optional inference objective #1995
Conversation
Signed-off-by: greg pereira <[email protected]>
Signed-off-by: greg pereira <[email protected]>
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Gregory-Pereira The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @Gregory-Pereira. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
This PR looks ok, but somehow I think it's missing something. It is creating a single InferenceObjective with a name that matches the Helm Release Name. As I understand things the InferenceObjective is referenced by the header x-gateway-inference-objective sent with the request. This is a request related thing. I would expect the ability to create several InferenceObjectives each with a different name and different priority. |
|
Good point, I will update the implementation so that users could define all the inference objectives they wish to relate to the inference pool |
Signed-off-by: greg pereira <[email protected]>
|
/ok-to-test |
Signed-off-by: greg pereira <[email protected]>
ca76a99 to
ff97818
Compare
|
Can you please discuss the motivation for this? I see some value, but infObj are a resource that will be created/updated/deleted after creating the infPool; meaning likely new objectives will be added/deleted later. |
e000098 to
6751dd6
Compare
… over inference objectives Signed-off-by: greg pereira <[email protected]>
6751dd6 to
db76251
Compare
I saw the value as automating the creation / deletion of them. In this way they get created and cleaned up with the helm chart. Not to say that others cannot add more out of band. I started on this in preparation for the Flow Control integration work with regard to an LLM-D guide that could showcase the work. |
| priority: {{ .priority }} | ||
| poolRef: | ||
| group: {{ .Values.inferenceExtension.apiVersion }} | ||
| name: {{ .name }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another miss?
| name: {{ .name }} | |
| name: {{ .Release.Name }} |
|
ok, I can see value in cases where for the most part the objectives are known in advance and mostly static |
| kind: InferenceObjective | ||
| metadata: | ||
| name: {{ .name }} | ||
| namespace: {{ $.Release.Namespace }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOC - why do we need here the $?
shouldn't we use
| namespace: {{ $.Release.Namespace }} | |
| namespace: {{ .Release.Namespace }} |
?
| | `inferenceExtension.sidecar.volumeMounts` | List of volume mounts for the sidecar container. Optional. | | ||
| | `inferenceExtension.sidecar.volumes` | List of volumes for the sidecar container. Optional. | | ||
| | `inferenceExtension.sidecar.configMapData` | Custom key-value pairs to be included in a ConfigMap created for the sidecar container. Only used when `inferenceExtension.sidecar.enabled` is `true`. Optional. | | ||
| | `inferenceObjectives` | A list of names and priorities to create InferenceObjectives from that will be assigned to the inference pool | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend documenting that this is for the case where the objectives are known in advance and mostly static, and that the user can still add/update/delete objectives later.
| # maxRequestsPerConnection: 256000 | ||
|
|
||
|
|
||
| # Optional: Define multiple InferenceObjectives for this InferencePool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think: https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1995/changes#r2620118777 would apply here also
|
Agreed with the other comments here. As long as we communicate clearly that there isn't a need to correlate the infObjectives at Pool creation, this all seems reasonable to me |
What type of PR is this?
/kind cleanup
/kind feature
What this PR does / why we need it:
Enable utilization of the InferenceObjective CR we already have
Does this PR introduce a user-facing change?:
NONE, simply exposes the inferencepool objective in the helm charts