Commit dbd8a9c
[GCM] Add schedule_exit and bf_exit sdiag counters to slurm monitor (#142)
* [GCM] Add schedule_exit and bf_exit sdiag counters to slurm monitor
Summary:
Extend the slurm monitor sdiag telemetry with the schedule_exit and
bf_exit sub-section counters surfaced by sdiag --json. These counters
record why the main scheduler and the backfill scheduler ended each
cycle (e.g., end of job queue, max time, max job start, max RPC count),
which makes scheduler tuning and saturation easier to diagnose than
with the existing aggregate cycle stats alone.
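As a rough illustration of the kind of extraction described above, here is a minimal sketch, not the GCM implementation. It assumes the counters sit under `statistics.schedule_exit` and `statistics.bf_exit` in the `sdiag --json` payload; the individual key names and the `extract_exit_counters` helper are assumptions made for this example and should be checked against real output for your Slurm version.

```python
import json
from typing import Any, Dict, Mapping, Optional

# Assumed key names for each sub-section; verify against your own
# `sdiag --json` output before relying on them.
SCHEDULE_EXIT_KEYS = (
    "end_job_queue", "default_queue_depth", "max_job_start",
    "max_rpc_cnt", "max_sched_time", "licenses",
)
BF_EXIT_KEYS = (
    "end_job_queue", "bf_max_job_start", "bf_max_job_test",
    "bf_max_time", "bf_node_space_size", "state_changed",
)


def extract_exit_counters(payload: Mapping[str, Any]) -> Dict[str, Optional[int]]:
    """Flatten the exit counters into one dict (hypothetical helper).

    Missing sections or keys yield None instead of raising, so output from
    an older Slurm that lacks these fields still parses.
    """
    stats = payload.get("statistics") or {}
    flat: Dict[str, Optional[int]] = {}
    for section, keys in (("schedule_exit", SCHEDULE_EXIT_KEYS),
                          ("bf_exit", BF_EXIT_KEYS)):
        sub = stats.get(section) or {}
        for key in keys:
            value = sub.get(key)
            # Accept plain integers only; anything else is treated as missing.
            is_int = isinstance(value, int) and not isinstance(value, bool)
            flat[f"{section}_{key}"] = value if is_int else None
    return flat


# Example payload with a partial schedule_exit section and no bf_exit section.
sample = json.loads(
    '{"statistics": {"schedule_exit": {"end_job_queue": 12, "max_rpc_cnt": 3}}}'
)
counters = extract_exit_counters(sample)
```

Returning `None` for absent fields rather than zero keeps "the scheduler never hit this limit" distinguishable from "this Slurm version does not report it".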
Test Plan:
Updated and ran the existing sdiag JSON parsing tests against the
checked-in sample-sdiag-output.json, including the missing-fields case.
Ran 'pytest gcm/tests/test_slurm.py gcm/tests/test_slurm_rest_client.py'
- all 25 tests passed.
---------
Co-authored-by: yongl user <yongl@yongl-login-0.yongl-login.tenant-slurm.svc.cluster.local>
1 parent d40284e commit dbd8a9c
4 files changed
Lines changed: 70 additions & 0 deletions
File 1 of 4 (file name and diff body not captured): 16 lines added, at new-file lines 202-203 and 229-242.
File 2 of 4 (file name and diff body not captured): 14 lines added, at new-file lines 116-117 and 142-153.
File 3 of 4 (file name and diff body not captured): 16 lines added, at new-file lines 38-53.
File 4 of 4 (file name and diff body not captured): 24 lines added, at new-file lines 516-527 and 587-598.