Mitigate image-pushing jobs hitting GetRequestsPerMinutePerProject quota for prow build cluster project #20652

Open
@spiffxp

What happened:
Context: kubernetes/k8s.io#1576 (comment)

As the volume of image-pushing jobs running on the prow build cluster in k8s-infra-prow-build-trusted has grown, we're starting to bump into a GCB service quota (GetRequestsPerMinutePerProject) for the project. Unlike other quotas (e.g. max GCP instances per region), this isn't one we can request to have raised.

What you expected to happen:
Have GCB service requests charged to the project running the GCB builds instead of a central shared project. Avoid bumping into API-related quota.

How to reproduce it (as minimally and precisely as possible):
Merge a PR to kubernetes/kubernetes that updates multiple test/images subdirectories, or otherwise induce a high volume of image-pushing jobs on k8s-infra-prow-build-trusted

Ignore whether you bump into the concurrent builds quota (also a GCB service quota)

Can visualize usage (and whether quota is hit) here if a member of [email protected]: https://console.cloud.google.com/apis/api/cloudbuild.googleapis.com/quotas?orgonly=true&project=k8s-infra-prow-build-trusted&supportedpurview=project&pageState=(%22duration%22:(%22groupValue%22:%22P30D%22,%22customValue%22:null))

Please provide links to example occurrences, if any:
I don't have links to jobs that encountered this specifically, but kubernetes/k8s.io#1576 describes the issue, and the metric explorer link above shows roughly when we've bumped into quota.

Anything else we need to know?:
Parent issue: kubernetes/release#1869

My guess is that we need to move away from using a shared service account in the build cluster's project (gcb-builder@k8s-infra-prow-build-trusted), and instead set up a service account per staging project.

It's unclear to me whether these would all need access to something in the build cluster project.

A service-account-per-project would add a bunch of boilerplate to the service accounts loaded into the build cluster, and add another field to job configs that needs to be set manually vs. copy-pasted. We could offset this by verifying configs are correct via presubmit enforcement.
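
To make the presubmit idea concrete, here is a minimal sketch of what such a check might look like. Everything about the job shape is an assumption for illustration: the `ImagePushJob` fields, the `gcb-builder@<staging project>` naming convention, and the `k8s-staging-foo` / `k8s-staging-bar` project names are hypothetical and not taken from the actual job configs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImagePushJob:
    """Hypothetical, simplified view of an image-pushing job config."""
    name: str
    staging_project: str   # project whose GCB builds the job triggers
    service_account: str   # account the job runs as


def expected_account(staging_project: str) -> str:
    # Assumed convention: same account name as today's shared account,
    # but scoped to the staging project instead of the build cluster project.
    return f"gcb-builder@{staging_project}"


def check_jobs(jobs: List[ImagePushJob]) -> List[str]:
    """Return one error per job whose service account does not match its staging project."""
    errors = []
    for job in jobs:
        want = expected_account(job.staging_project)
        if job.service_account != want:
            errors.append(
                f"{job.name}: service account {job.service_account!r}, want {want!r}"
            )
    return errors


if __name__ == "__main__":
    jobs = [
        ImagePushJob("push-foo", "k8s-staging-foo", "gcb-builder@k8s-staging-foo"),
        # Still on the shared account, so this one should be flagged.
        ImagePushJob("push-bar", "k8s-staging-bar", "gcb-builder@k8s-infra-prow-build-trusted"),
    ]
    for err in check_jobs(jobs):
        print(err)
```

A check along these lines would let the extra field stay copy-paste friendly: a job that was copied without updating the account fails presubmit instead of silently charging the shared project.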

I'm open to other suggestions to automate the boilerplate away, or a solution that involves image-builder consuming less API quota.
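
As a rough illustration of the "consume less API quota" option, the sketch below assumes (this is not confirmed in this issue) that most Get requests come from each running build being polled for status at a fixed interval. Under that assumption, the per-project request rate scales with the number of concurrent builds, and a capped exponential backoff on the poll interval lowers it. The function names and the numbers used are hypothetical.

```python
import itertools
from typing import Iterator


def get_requests_per_minute(concurrent_builds: int, poll_interval_s: float) -> float:
    """Steady-state Get requests per minute if every build is polled at a fixed interval."""
    return concurrent_builds * 60.0 / poll_interval_s


def backoff_delays(initial_s: float = 1.0, factor: float = 2.0, cap_s: float = 30.0) -> Iterator[float]:
    """Yield poll delays that grow geometrically until they reach cap_s."""
    delay = initial_s
    while True:
        yield min(delay, cap_s)
        delay = min(delay * factor, cap_s)


if __name__ == "__main__":
    # Illustrative numbers only: 20 concurrent builds.
    print(get_requests_per_minute(20, 1.0))    # polling every second
    print(get_requests_per_minute(20, 30.0))   # polling at the backoff cap
    print(list(itertools.islice(backoff_delays(), 6)))
```

The trade-off is latency: a longer poll interval means a job notices a finished build later, so the cap would need to be chosen against how long image builds typically run.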

/milestone v1.21
/priority important-soon
/wg k8s-infra
/sig testing
/area images
/sig release
/area release-eng
/assign @cpanato @justaugustus
as owners of parent issue

Labels: area/images, area/release-eng, kind/bug, lifecycle/frozen, needs-triage, priority/backlog, sig/k8s-infra, sig/release, sig/testing