Document CI cluster selection, CPU : RAM ratio / machine types, and general recommendations specific to prow.k8s.io #34139

Open
@BenTheElder

Description

What would you like to be added:

We don't have a single place to point to that explains which cluster: you should use and why, how much CPU and memory to request, and how to avoid pointlessly requesting minuscule amounts of memory per CPU core (which ultimately costs us more when workloads prefer CPU time over memory and the memory sits unused).
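To illustrate the CPU : RAM concern, here is a minimal sketch of the arithmetic. The node shape and job requests below are hypothetical numbers chosen for illustration, not the actual prow.k8s.io machine types, and the calculation ignores system-reserved resources and daemonsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """CPU cores and memory (GiB) of a node, or of one job's requests."""
    cpu: float
    memory_gib: float


def gib_per_cpu(shape):
    """Memory-to-CPU ratio of a node or a request."""
    return shape.memory_gib / shape.cpu


def pack(node, request):
    """Return (jobs that fit, leftover CPU, leftover memory in GiB).

    Only requests are considered, which is a simplification of what
    the scheduler actually accounts for.
    """
    jobs_by_cpu = int(node.cpu // request.cpu)
    jobs_by_memory = int(node.memory_gib // request.memory_gib)
    jobs = min(jobs_by_cpu, jobs_by_memory)
    return (
        jobs,
        node.cpu - jobs * request.cpu,
        node.memory_gib - jobs * request.memory_gib,
    )


# Hypothetical node: 8 CPU, 64 GiB (8 GiB per CPU).
node = Shape(cpu=8, memory_gib=64)

# Hypothetical job requesting 1 GiB per CPU: CPU fills up first
# and most of the node's memory can no longer be scheduled.
cpu_heavy = Shape(cpu=4, memory_gib=4)

# Hypothetical job whose ratio matches the node: nothing is stranded.
balanced = Shape(cpu=4, memory_gib=32)

for name, request in (("cpu_heavy", cpu_heavy), ("balanced", balanced)):
    jobs, cpu_left, mem_left = pack(node, request)
    print(f"{name}: {jobs} jobs, {cpu_left} CPU and {mem_left} GiB left over")
```

With these made-up numbers, two CPU-heavy jobs use all 8 cores but leave 56 GiB of paid-for memory idle, while requests matching the node's ratio leave nothing stranded.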

We should do this per-cluster and create a doc somewhere discoverable, perhaps under config/jobs.

We should also consider adding details like:

  • kubekins / CI image recommendations
    • docker in docker
  • Additional pointers to the hacks we have employed in the clusters (like pre-allocating loop devices, tuning sysctls ...).

Why is this needed:

So contributors can understand the Kubernetes specific CI environment and how to effectively schedule to it / write prow.k8s.io specific jobs.

/sig testing k8s-infra
@kubernetes/sig-k8s-infra-leads @kubernetes/sig-testing-leads


These are really not discoverable:

https://github.com/kubernetes/k8s.io/blob/86089ae44dd87d86fa1a2a651bb0d6f4ceb06270/infra/aws/terraform/prow-build-cluster/terraform.prod.tfvars#L39C32-L39C44

https://github.com/kubernetes/k8s.io/blob/86089ae44dd87d86fa1a2a651bb0d6f4ceb06270/infra/gcp/terraform/k8s-infra-prow-build/main.tf#L101

Along with "what is the trusted cluster" etc.

We should also deprecate the eks-job-migration doc and the associated job report results, and we should consider how to balance scheduling between EKS and GKE more generally, now that the budgets are similar and all the workloads are running in community accounts. (And also how to approach Azure, with its much smaller budget ...)

Activity

Labels sig/testing and sig/k8s-infra added on Jan 13, 2025

BenTheElder (Member, Author) commented on Jan 15, 2025

We should also link out to https://monitoring-eks.prow.k8s.io/?orgId=1 and https://monitoring-gke.prow.k8s.io/?orgId=1 for checking actual resource usage.

BenTheElder (Member, Author) commented on Feb 26, 2025

@kubernetes/sig-k8s-infra-leads how far are we from being able to tell CI users about the CI cluster machine shapes?

BenTheElder (Member, Author) commented on Feb 26, 2025

I know we wanted to reconsider the EKS machine types, and also on GCP it's not clear if highmem actually makes sense with our current workloads, though any changes would have to be done carefully to avoid breaking jobs that already implicitly depend on the machine sizes.

ameukam (Member) commented on Feb 26, 2025

> @kubernetes/sig-k8s-infra-leads how far are we from being able to tell CI users about the CI cluster machine shapes?

I think we can once we are done with instance type selection. We could also communicate what we already have now, and update the docs later once the instances are switched.

BenTheElder (Member, Author) commented on Mar 26, 2025

@xmudrii @upodroid ?

This is coming up all the time; we should decide and either commit to documenting the EKS cluster as-is, or move on changing it.

It's really difficult to answer these questions about resources available, which cluster to use, etc. currently.

xmudrii (Member) commented on Mar 26, 2025

> or move on changing it.

I'll push on that right after KubeCon, too busy at the moment. 🙃

Metadata

Repository: kubernetes/test-infra
Labels: kind/feature, sig/k8s-infra, sig/testing
Participants: @BenTheElder, @ameukam, @xmudrii, @k8s-ci-robot