Turn down janitor jobs / boskos deployments ahead of default cluster turndown #33129

Closed
BenTheElder opened this issue Jul 26, 2024 · 16 comments · Fixed by #33260
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. sig/testing Categorizes an issue or PR as relevant to SIG Testing.

Comments

@BenTheElder
Member

We should plan to turn these down a bit before actually deactivating the default cluster:

  1. stop jobs from accessing boskos resources
  2. allow cleanup to run
  3. turn down entirely
  4. turn down default cluster

For (1), I think we can do this around August 1st, but it will take a little planning, as what we really want is to stop renting out clean projects and let everything get cleaned up ...

cc @dims @upodroid @ameukam

Need to think about how we can best accomplish that. Ideally even as we cut off the legacy resources we let the janitor jobs / boskos do their jobs one last time.

We could manually run cleanup, but ... we haven't done that in a long time, and I'd rather not have to figure that out with auth for vSphere, Azure, AWS, GCP ...

@BenTheElder BenTheElder added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Jul 26, 2024
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 26, 2024
@BenTheElder
Member Author

/sig testing k8s-infra

@k8s-ci-robot k8s-ci-robot added sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 26, 2024
@ameukam
Member

ameukam commented Jul 26, 2024

For (1), could be:

Alternatively, we could keep allowing projects to be rented but disable GCP services in each project.

@BenTheElder
Member Author

We have at least AWS and vSphere janitors in addition to the GCP ones.

Ramping down the pool doesn't let the cleanup run?

Otherwise, if we depend only on the janitor jobs and not the reaper, we could just disable the boskos service endpoint.

@ameukam
Member

ameukam commented Jul 29, 2024

The AWS janitor currently runs against a CNCF account used by kOps. It should be fine to copy those credentials over to the k8s-infra-prow-build-trusted cluster until we fix kubernetes/k8s.io#5127.

@BenTheElder
Copy link
Member Author

After sleeping on this ... I think we should just drop all of the non-janitor jobs and leave boskos/janitor deployed for another 24h.

So the list becomes:

  1. delete all non-janitor default-cluster jobs
  2. wait some period (24h? 12?)
  3. turn down janitor, boskos, default cluster (or revert if there's some good reason)
  4. ready for prow migration

@BenTheElder
Member Author

Though until kubernetes/k8s.io#5127 is fixed, we probably do need to migrate that specific janitor ...

@ameukam
Member

ameukam commented Jul 29, 2024

+1 on moving the AWS janitor and dropping (maybe "disabling" is a better word?) the non-janitor jobs.

@BenTheElder BenTheElder changed the title Turn down janito jobs / boskos deployments ahead of default cluster turndown Turn down janitor jobs / boskos deployments ahead of default cluster turndown Jul 29, 2024
@BenTheElder
Member Author

#33229 removes some of the last GCE jobs. I will remove the GCP janitors a bit after we're done removing GCP jobs, following #33226 / https://groups.google.com/a/kubernetes.io/g/dev/c/qzNYpcN5la4

@BenTheElder
Member Author

https://monitoring.prow.k8s.io/d/wSrfvNxWz/boskos-resource-usage?orgId=1

We have a wedged ingress project, but otherwise we're close to being able to turn down the GCP janitors and the boskos instance in `cluster: default`.

@BenTheElder
Member Author

#33234 removes the legacy GCP janitors.

We need to follow up even on our current projects re: why some of them are in the "other" state in boskos ... https://kubernetes.slack.com/archives/CCK68P2Q2/p1722900397667809

@BenTheElder
Member Author

#33241 will unblock dropping the vSphere janitors, or drop them outright.

That leaves the AWS janitors to consider, @dims. I think the issue is that we still have some old CNCF AWS usage, not just the k8s-infra resources covered by the migrated janitors? Not confident on this.

`maintenance-ci-aws-janitor` is still running in `cluster: default`.

@BenTheElder
Member Author

AFAICT, the legacy boskos only ever had GCP, so I can start looking at turning that down.

https://github.com/kubernetes/test-infra/blob/d72c3fcc5617fc61d02555658bc954dd24d6e6a6/config/prow/cluster/build/boskos-resources/boskos-resources.yaml

@BenTheElder
Member Author

ameukam added a commit to ameukam/test-infra that referenced this issue Aug 7, 2024
Ref:
  - kubernetes#33129

`maintenance-ci-aws-janitor` is running against a historical AWS account
currently used by the kOps project for CI purposes. It's unclear if
other projects used this account but we should move it outside of
Google-owned build clusters to unblock the Prow migration.
AFAIK `kops-infra-prow-kops-build` already has the credentials in place to run
it.
@BenTheElder
Member Author

#33246 removed the vsphere janitor.
#33254 migrated the AWS janitor, so now it's on to turning down boskos.

@BenTheElder
Member Author

#33260 turns down boskos.

@BenTheElder
Member Author

This is complete.

3 participants