Commit 56658c8
authored
feat(prover): Autoscaler detect which pod out of resources and reschedule (#4008)
## What ❔
Autoscaler detect which pod out of resources and reschedule.
For one run mark a pool as full (`max_pool_size` = number of Running +
Pending pods) if there is out of resources pod in it.
Add `priority` config override.
Remove unneeded need_to_move_duration config parameter.
<!-- What are the changes this PR brings about? -->
<!-- Example: This PR adds a PR template to the repo. -->
<!-- (For bigger PRs adding more context is appreciated) -->
## Why ❔
Before all pending pod could be pushed to different cluster if even one
of them is out of resources.
Add possibility to quickly switch to backup GPU if the main one is out.
<!-- Why are these changes done? What goal do they contribute to? What
are the principles behind them? -->
<!-- The `Why` has to be clear to non-Matter Labs entities running their
own ZK Chain -->
<!-- Example: PR templates ensure PR reviewers, observers, and future
iterators are in context about the evolution of repos. -->
## Is this a breaking change?
- [ ] Yes
- [x] No
## Operational changes
<!-- Any config changes? Any new flags? Any changes to any scripts? -->
<!-- Please add anything that non-Matter Labs entities running their own
ZK Chain may need to know -->
## Checklist
<!-- Check your PR fulfills the following items. -->
<!-- For draft PRs check the boxes as you complete them. -->
- [x] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [x] Tests for the changes have been added / updated.
- [x] Documentation comments have been added / updated.
- [x] Code has been formatted via `zkstack dev fmt` and `zkstack dev
lint`.
ref ZKD-26821 parent 91772a4 commit 56658c8
File tree
6 files changed
+383
-66
lines changed- prover/crates/bin/prover_autoscaler
- src
- global
- k8s
6 files changed
+383
-66
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
170 | 170 | | |
171 | 171 | | |
172 | 172 | | |
| 173 | + | |
| 174 | + | |
173 | 175 | | |
174 | 176 | | |
175 | 177 | | |
| |||
212 | 214 | | |
213 | 215 | | |
214 | 216 | | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
215 | 222 | | |
216 | 223 | | |
217 | 224 | | |
218 | 225 | | |
219 | 226 | | |
220 | 227 | | |
221 | 228 | | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
222 | 232 | | |
223 | 233 | | |
224 | 234 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
24 | | - | |
| 24 | + | |
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
| 29 | + | |
29 | 30 | | |
| 31 | + | |
30 | 32 | | |
31 | 33 | | |
32 | 34 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
79 | 79 | | |
80 | 80 | | |
81 | 81 | | |
82 | | - | |
83 | | - | |
84 | | - | |
85 | | - | |
86 | | - | |
87 | | - | |
88 | 82 | | |
89 | 83 | | |
90 | 84 | | |
| |||
136 | 130 | | |
137 | 131 | | |
138 | 132 | | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
139 | 140 | | |
140 | 141 | | |
141 | 142 | | |
| |||
161 | 162 | | |
162 | 163 | | |
163 | 164 | | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
164 | 170 | | |
165 | 171 | | |
166 | 172 | | |
| |||
204 | 210 | | |
205 | 211 | | |
206 | 212 | | |
207 | | - | |
208 | | - | |
209 | | - | |
210 | | - | |
211 | | - | |
212 | 213 | | |
213 | 214 | | |
214 | 215 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
51 | 51 | | |
52 | 52 | | |
53 | 53 | | |
54 | | - | |
55 | | - | |
56 | | - | |
57 | 54 | | |
58 | 55 | | |
59 | 56 | | |
| |||
69 | 66 | | |
70 | 67 | | |
71 | 68 | | |
| 69 | + | |
72 | 70 | | |
73 | 71 | | |
74 | 72 | | |
| |||
80 | 78 | | |
81 | 79 | | |
82 | 80 | | |
| 81 | + | |
83 | 82 | | |
84 | 83 | | |
85 | 84 | | |
| |||
0 commit comments