@SinaChavoshi (Member) commented Sep 25, 2025

This PR introduces support for the GKE Inference Gateway by adding a new feature flag to the gke-cluster module.
Key changes:

  • Added a new boolean variable, enable_inference_gateway, to the gke-cluster module. When set to true, this flag:

    • Enables the HttpLoadBalancing add-on in the GKE cluster, a prerequisite for the Gateway API.
    • Deploys the necessary Inference Gateway Custom Resource Definitions (CRDs) directly from the official Kubernetes SIGs repository.
  • Created a new example blueprint, gke-a3-highgpu-inference-gateway.yaml, to demonstrate how to enable this feature. This blueprint also includes the required proxy-only subnet (purpose REGIONAL_MANAGED_PROXY).

  • Updated the examples/README.md to include documentation for the new blueprint, guiding users on how to deploy a sample workload after the cluster is provisioned.
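A minimal blueprint fragment, sketched under assumptions, showing how the new flag might be switched on from a Cluster Toolkit blueprint. The module ids `network1` and `gke_cluster` are hypothetical, and the fragment deliberately omits settings a working deployment needs (pod and service secondary ranges, node pools, and the proxy-only subnet), so treat it as an illustration rather than a copy of the PR's blueprint:

```yaml
# Hypothetical fragment: module ids are illustrative, not taken from
# gke-a3-highgpu-inference-gateway.yaml.
deployment_groups:
- group: primary
  modules:
  - id: network1                 # hypothetical VPC module id
    source: modules/network/vpc

  - id: gke_cluster              # hypothetical cluster module id
    source: modules/scheduler/gke-cluster
    use: [network1]
    settings:
      # Per this PR: enables the HttpLoadBalancing add-on and
      # installs the Inference Gateway CRDs.
      enable_inference_gateway: true
```

Because the flag defaults to off, existing blueprints that use gke-cluster are unaffected unless they opt in.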

Submission Checklist

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow the Cluster Toolkit contribution guidelines

@SinaChavoshi requested review from a team and samskillman as code owners September 25, 2025 23:35
@SinaChavoshi changed the base branch from main to develop September 25, 2025 23:35
@samskillman (Collaborator)

Hi @SinaChavoshi, would you mind rebasing your changes onto the current upstream develop branch and targeting this PR at the develop branch as well? That follows our development pattern. Thanks!

@SinaChavoshi force-pushed the gke-cluster-inference-gateway branch from abde98d to 0455bdf September 26, 2025 21:21
@SinaChavoshi (Member, Author)

Done.

@samskillman added the release-key-new-features label (Added to release notes under the "Key New Features" heading) Sep 30, 2025
@samskillman (Collaborator) left a comment

Mostly it looks good, thank you for this contribution! I've added a few suggestions that we should discuss/fix up before merging.

@cboneti (Member) left a comment

Hi, sorry for the delay reviewing this.

The PR appears to be well-implemented and achieves its goal. The module changes are correct, the example blueprint is functional, and the documentation is clear.

However, the new blueprint is missing an entry in the examples/README.md table of contents (lines 18-72). Please add one.

Nit: Consider adding a note in modules/scheduler/gke-cluster/README.md about the new enable_inference_gateway variable. The note should mention that this feature requires a subnet with purpose: "REGIONAL_MANAGED_PROXY" in the VPC (or point to the relevant networking documentation).
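As a sketch of the requirement this nit refers to, a proxy-only subnet could be declared on the cluster's network module. This assumes the Toolkit's vpc module accepts an `additional_subnetworks` list whose entries are forwarded to the upstream terraform-google-network subnets module, which accepts `purpose` and `role` keys; the module id, subnet name, and CIDR range are hypothetical, and the exact field names should be checked against the module README:

```yaml
# Hypothetical fragment: assumes additional_subnetworks entries are passed
# through to terraform-google-network's subnets module.
- id: network1                          # hypothetical module id
  source: modules/network/vpc
  settings:
    additional_subnetworks:
    - subnet_name: proxy-only-subnet    # illustrative name
      subnet_region: $(vars.region)
      subnet_ip: 10.129.0.0/23          # illustrative; must not overlap other ranges
      purpose: REGIONAL_MANAGED_PROXY   # marks the subnet as proxy-only
      role: ACTIVE                      # proxy-only subnets are typically ACTIVE
```

A proxy-only subnet is used only by the Envoy-based regional load balancer proxies, so it does not need secondary ranges for pods or services.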

@cboneti assigned SinaChavoshi and unassigned cboneti Oct 14, 2025
@SinaChavoshi (Member, Author)

Good catch! Thank you for the feedback. I updated the PR to address both issues raised.

@SinaChavoshi requested a review from cboneti October 15, 2025 19:08
@cboneti previously approved these changes Oct 15, 2025
@cboneti (Member) left a comment

lgtm, thanks

@cboneti assigned samskillman and unassigned SinaChavoshi Oct 15, 2025
@samskillman previously approved these changes Oct 15, 2025
@samskillman (Collaborator) left a comment

Approving - let's make sure we run the relevant tests on it.

  • add README
  • fix for loop
  • fix HTTP load balancing
  • install CRD from HTTP
  • change logic to only set value when flag is set
  • test pre-commit
  • verify pre-commit
  • remove extra whitespace
  • remove hard copy of the manifest
  • fix pre-commit
  • fix secondary IP range
  • fix subnetwork details
  • fix the subnet mask overlap issue
  • remove secondary range from proxy-only subnetwork; it is not needed here
  • fix the subnetwork config
  • add subnet_ip
  • enable inference gateway in the cluster
  • fix gateway installation
  • fix gateway_api_config setting
  • move gateway_api_config under networking config
  • fix pre-commit failures
  • remove extra line in README
  • add a default CPU node pool
  • switch to use 192.168.0.0/16
  • update based on comments (use autoscaling and remove JobSet)
  • fix reservation type
  • add explicit values for reservation and set spot to false
  • add comments to show how to use Spot VMs and reservations
  • update README based on review feedback
  • remove extra variable introduced by accident during merge
  • remove extra bracket from README file

@SinaChavoshi force-pushed the gke-cluster-inference-gateway branch from 94c7f32 to 47fe953 October 15, 2025 22:18
@samskillman previously approved these changes Oct 15, 2025
@cboneti previously approved these changes Oct 16, 2025
@samskillman (Collaborator)

/gcbrun

@kadupoornima (Contributor)

/gcbrun

@cboneti (Member) left a comment

Approved, pending passing all relevant tests.

@SinaChavoshi (Member, Author)

Thank you so much for the reviews. I noticed that the failing check PR-test-gke-a3-highgpu (hpc-toolkit-dev) seems to have failed on every execution since mid-August. Is that correct? Is there a recommended way for me to proceed to unblock this PR?

@cboneti enabled auto-merge November 6, 2025 09:20
@cboneti (Member) commented Nov 6, 2025

Yes, I think we will ignore that and merge this shortly.

@cboneti merged commit 5b7b8fa into GoogleCloudPlatform:develop Nov 11, 2025
23 of 67 checks passed

Labels

release-key-new-features Added to release notes under the "Key New Features" heading.

6 participants