Commit 00cc551
feat(ci): add metrics-driven cluster autoscaling validation with Karpenter + KWOK

Add cluster autoscaling validation to both H100 GPU workflows (inference
and training). The test validates the full metrics-driven autoscaling chain:

  DCGM metrics → Prometheus → prometheus-adapter (external metric)
  → HPA scales Deployment → pending pods → Karpenter → KWOK nodes
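
The last hops of the chain above (external metric → HPA → Deployment) could be wired up with a manifest along these lines. This is only an illustrative sketch, not the HPA test workload added in this commit: the resource names, the metric name `gpu_utilization`, the replica bounds, and the target value are all assumptions.

```yaml
# Hypothetical HPA driven by a cluster-wide external metric.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gpu-hpa-test              # assumed name, for illustration only
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gpu-hpa-test            # assumed Deployment under test
  minReplicas: 1                  # assumed bounds
  maxReplicas: 4
  metrics:
    - type: External
      external:
        metric:
          name: gpu_utilization   # assumed name exposed by prometheus-adapter
        target:
          type: AverageValue
          averageValue: "50"      # assumed threshold; scale up above this
```

When the HPA raises the replica count beyond what existing nodes can hold, the extra pods stay Pending, which is the signal Karpenter reacts to by provisioning KWOK nodes.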

New files:
- kwok/scripts/install-karpenter-kwok.sh: builds Karpenter KWOK
  provider via ko and deploys with Helm into kind clusters
- kwok/scripts/validate-cluster-autoscaling.sh: reusable E2E script
  that verifies external metrics, HPA scaling, node provisioning,
  pod scheduling, and scale-down consolidation
- kwok/manifests/karpenter/: NodePool, KWOKNodeClass, HPA test
  workload, and GPU instance type definitions

Changed files:
- recipes/components/prometheus-adapter/values.yaml: add
  workload-attributed custom metrics, external metrics rules for
  cluster-wide GPU metrics (power_usage, memory_used, utilization)
  with namespaced: false, and 30s metrics relist interval
- .github/workflows/gpu-h100-{inference,training}-test.yaml: add
  cluster autoscaling step and trigger paths for karpenter manifests
- .settings.yaml: add karpenter v1.8.0 to testing_tools

1 parent 225d551 commit 00cc551
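
For reference, the prometheus-adapter change described in the commit message (a cluster-wide external rule with `namespaced: false` and a 30s relist interval) might look roughly like the following Helm values fragment. This is a sketch under stated assumptions, not the contents of the actual values.yaml: the DCGM series name, the exposed metric name, and the aggregation query are illustrative choices.

```yaml
# Hypothetical fragment of prometheus-adapter Helm values.
metricsRelistInterval: 30s        # how often the adapter re-discovers series

rules:
  external:
    - seriesQuery: 'DCGM_FI_DEV_GPU_UTIL'       # assumed dcgm-exporter series
      resources:
        namespaced: false                        # expose as a cluster-wide metric
      name:
        matches: "^DCGM_FI_DEV_GPU_UTIL$"
        as: "gpu_utilization"                    # assumed external metric name
      # Assumed aggregation: average utilization across all GPUs.
      metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>})'
```

With `namespaced: false`, the metric is not tied to the namespace of the querying HPA, so a test workload in any namespace can scale on cluster-wide GPU signals.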

File tree
9 files changed (+914, -20 lines changed)
- .github/workflows
- kwok
  - manifests/karpenter
  - scripts
- recipes/components/prometheus-adapter