
gcloud: Fallback zone support (WIP)#4536

Draft
angelcerveraroldan wants to merge 2 commits into coreos:main from angelcerveraroldan:gcp-fallback-zone

Conversation

@angelcerveraroldan
Member

We seem to be hitting a lot of zone issues with GCP. It may be worth allowing the zone to change automatically when there are errors, so that we don't have to keep manually changing the zone in the pipeline.

Note: This is not yet tested or finished.

@openshift-ci

openshift-ci Bot commented Apr 20, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a mechanism to discover available GCP zones within a region for a specific machine type, intended to support fallback when resources are depleted. It updates the API struct to store a list of available zones and modifies the kola CLI help text to reflect this new behavior. Review feedback notes that while the zones are now discovered and stored, they are not yet utilized in the instance creation logic. Additionally, it is recommended to move the regular expression compilation for zone parsing to the package level to improve performance during zone discovery.
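The package-level regular expression suggestion could look roughly like the sketch below. It is not taken from the PR: `zoneRe` and `zoneFromScope` are hypothetical names, and the pattern assumes zone references end in `zones/<name>` (for example `zones/us-central1-a` or a full resource URL ending that way).

```go
package main

import "regexp"

// zoneRe is compiled once when the package is initialised, instead of on
// every call during zone discovery. The pattern is an assumption: it expects
// the input to end in "zones/<name>".
var zoneRe = regexp.MustCompile(`zones/([a-z0-9-]+)$`)

// zoneFromScope (hypothetical helper) extracts the bare zone name from a
// scope key or resource URL. The boolean is false when nothing matches.
func zoneFromScope(scope string) (string, bool) {
	m := zoneRe.FindStringSubmatch(scope)
	if m == nil {
		return "", false
	}
	return m[1], true
}
```

Calling `zoneFromScope` in a loop then reuses the same compiled expression, which is safe because `*regexp.Regexp` can be used concurrently by multiple goroutines.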

Comment thread mantle/platform/api/gcloud/api.go
Comment thread mantle/platform/api/gcloud/api.go
@angelcerveraroldan
Member Author

/test all

@dustymabe dustymabe requested a review from marmijo April 21, 2026 17:03
Contributor

@prestist prestist left a comment


This is looking really good :) I know it's WIP, but here are some quick review comments!

Comment thread mantle/platform/api/gcloud/api.go Outdated
}

zones := []string{}
for _, scopedList := range list.Items {
Contributor


Fun fact: Go maps do not iterate in a defined order. In a way that is kind of a cool outcome, since it spreads load across the list of zones. If we want a stable order, though, we could sort the zones after collecting them.
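If a stable order is wanted, a minimal sketch of the sort-after-collection idea follows. The function name `sortedZones` and the simplified `map[string][]string` input (zone name to machine types found there) are assumptions for illustration; the real `list.Items` value type in the PR is different.

```go
package main

import "sort"

// sortedZones (hypothetical helper) collects the zones that offer at least
// one machine type and returns them in lexical order, so the result no
// longer depends on Go's unspecified map iteration order.
func sortedZones(machineTypesByZone map[string][]string) []string {
	zones := make([]string, 0, len(machineTypesByZone))
	for zone, machineTypes := range machineTypesByZone {
		if len(machineTypes) == 0 {
			continue // zone does not offer the requested machine type
		}
		zones = append(zones, zone)
	}
	sort.Strings(zones)
	return zones
}
```

Leaving the slice unsorted keeps the incidental load spreading mentioned above; sorting trades that for reproducible fallback behaviour.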

Comment thread mantle/platform/api/gcloud/compute.go Outdated
return nil, err
}
plog.Debugf("Creating instance %q in zone %s", name, zone)
op, err := a.compute.Instances.Insert(a.options.Project, zone, inst).Do()
Contributor


Hmm, this has the potential to leave old resources behind. I don't know how good garbage collection is, but if we fail we should probably terminate the failed instances?

Comment thread mantle/platform/machine/gcloud/cluster.go Outdated
Comment thread mantle/platform/api/gcloud/compute.go
Comment thread mantle/cmd/kola/options.go Outdated
@angelcerveraroldan angelcerveraroldan added the jira for syncing to jira label Apr 27, 2026
@angelcerveraroldan angelcerveraroldan force-pushed the gcp-fallback-zone branch 3 times, most recently from abe7397 to 94b708d Compare April 27, 2026 14:45
When we fail to connect to a machine with an error that indicates that
the zone we are connecting to is not functioning correctly, fall back
to a different zone and try again.
2 participants