-
Notifications
You must be signed in to change notification settings - Fork 269
feat: Add GKE Inference Gateway support #4699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add GKE Inference Gateway support #4699
Conversation
|
Hi @SinaChavoshi - would you mind rebasing your changes on top of the current upstream |
abde98d to
0455bdf
Compare
Done. |
samskillman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly it looks good, thank you for this contribution! I've added a few suggestions that we should discuss/fix up before merging.
cboneti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, sorry for the delay reviewing this.
The PR appears to be well-implemented and achieves its goal. The module changes are correct, the example blueprint is functional, and the documentation is clear.
You are however missing an entry in the examples/README.md TOC (lines 18-72). Please add that.
Nit: Consider adding a note in the modules/scheduler/gke-cluster/README.md about the new enable_inference_gateway variable. This note should mention the requirement of having a subnet with purpose: "REGIONAL_MANAGED_PROXY" in the VPC for this feature to work (or point to the relevant networking documentation).
Good catch! Thank you for the feedback. I updated the PR to address both issues raised. |
cboneti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks
samskillman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving - let's make sure we run the relevant tests on it.
6776d01
add read me fix for loop fix http load balancing install crd from http change logic to only set value when flag is set test pre-commit verify precomit remove extra white space remove hard copy of the manifest fix pre-commit fix secondary ip range fix sub network details fix the subnet mask overlap issue remove secondary range from proxy only subnetwork this is not needed here. fix the subnetwork config add subnet_ip enable inference gateway in the cluster fix gateway installation fix gateway_api_config setting move gateway_api_config under networking config fix pre-commit fails. remove extra line in readme add a default cpu nodepool switch to use 192.168.0.0/16 update based on commetns ( use atuo scaling and remove jobset) fix reservation type add explict values for reservation adn set spot to false. add comments to show how to use spot vm and reservations update read me based on review feedback remove extra varialbe introduced by accident during merge. remove extra bracked from read me file.
94c7f32 to
47fe953
Compare
|
/gcbrun |
835bb97
|
/gcbrun |
cboneti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, pending passing all relevant tests.
|
Thank you so much for reviews, I noticed that the failure PR-test-gke-a3-highgpu (hpc-toolkit-dev) seem to have been failing on all executions since mid Aug, is that a correct understanding ? is there a recomended way for me to proceed to unblock this PR? |
Yes, I think we will ignore that and merge this shortly. |
This PR introduces support for the GKE Inference Gateway by adding a new feature flag to the
gke-clustermodule.Key changes:
Added a new boolean variable,
enable_inference_gateway, to thegke-clustermodule. When set totrue, this flag:HttpLoadBalancingadd-on in the GKE cluster, a prerequisite for the Gateway API.Created a new example blueprint,
gke-a3-highgpu-inference-gateway.yaml, to demonstrate how to enable this feature. This blueprint also includes the requiredREGIONAL_MANAGED_PROXYsubnet.
Updated the
examples/README.mdto include documentation for the new blueprint, guiding users on how to deploy a sample workload after the cluster is provisioned.Submission Checklist
Please take the following actions before submitting this pull request.
Fork your PR branch from the Toolkit "develop" branch (not main)