[CI] Attach pod disruption budgets to runner pods #523

Merged
boomanaiden154 merged 4 commits into llvm:main from boomanaiden154:gke-pdbs
Jul 24, 2025

@boomanaiden154
Contributor

This patch adds pod disruption budgets to the runner pods, setting the minimum number of available pods to the maximum. This ensures that the number of pods that k8s calculates can be disrupted is zero. As a result, when GKE is updating the node pool, it must wait an hour before forcibly evicting a pod, giving the pod time to finish. Before this, when GKE wanted to upgrade a node, it would forcibly evict the pod very quickly (theoretically after the grace period, which has a default of 30s), not realizing that it is stateful.
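
For illustration, a budget of the kind described above might look roughly like the following. This is a minimal sketch, not the manifest from this patch: the resource name, namespace, label selector, and the pod count of 16 are all hypothetical placeholders. The idea is that when `minAvailable` equals the maximum number of runner pods, the number of healthy pods can never exceed the required minimum, so the allowed disruption count that Kubernetes reports in `status.disruptionsAllowed` stays at zero.

```yaml
# Hypothetical sketch of a PodDisruptionBudget for runner pods.
# All names, labels, and numbers below are assumptions for illustration.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-runner-pdb          # hypothetical name
  namespace: example-runners        # hypothetical namespace
spec:
  # Set to the maximum number of runner pods that can exist, so that
  # (healthy pods - minAvailable) is never positive and no voluntary
  # eviction is permitted by the budget.
  minAvailable: 16                  # hypothetical maximum pod count
  selector:
    matchLabels:
      app: example-runner           # hypothetical label on the runner pods
```

Assuming such a budget were applied, `kubectl get poddisruptionbudgets -n example-runners` should show zero allowed disruptions for it.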

@@ -0,0 +1,10 @@
apiVersion: policy/v1
Contributor

I would prefer a more descriptive file name than "pdb.yaml".

Contributor Author

Updated to pod-disruption-budget.yaml.

@boomanaiden154 requested a review from cmtice July 24, 2025 22:38
tatus Outdated
@@ -0,0 +1,59 @@
diff --git a/premerge/pdb.yaml b/premerge/pdb.yaml
Contributor

@cmtice Jul 24, 2025

Is there a typo in this file name ("tatus")?

Contributor Author

The file shouldn't exist. Looks like I was typing in the wrong spot when editing the commit message in vim and some version of the commit ended up in a file...

@boomanaiden154 requested a review from cmtice July 24, 2025 22:54
@boomanaiden154 merged commit be28b89 into llvm:main Jul 24, 2025
3 checks passed
@boomanaiden154 deleted the gke-pdbs branch July 24, 2025 23:00
vvereschaka pushed a commit to vvereschaka/llvm-zorg that referenced this pull request Sep 25, 2025