Commit 1a98c29
Add comprehensive metrics instrumentation for scheduler and executor (#10)
* Add shuffle read metrics extraction and QueryStageExecutor::plan() method
- Add public getter methods to PartitionStats (num_rows, num_batches, num_bytes)
- Extend QueryStageExecutor trait with plan() method to access underlying ExecutionPlan
- Add extract_shuffle_read_metrics() to walk plan tree and sum ShuffleReaderExec partition stats
- Record shuffle read metrics (bytes, rows, duration) after successful task execution in executor
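The plan-tree walk described above can be illustrated with a minimal sketch. It is not the code from this commit: `PlanNode`, `ShuffleReadTotals`, and `extract_shuffle_read_totals` are hypothetical stand-ins for the real `ExecutionPlan` tree and `extract_shuffle_read_metrics()`, and the `Option<u64>` fields on `PartitionStats` are an assumption about its shape.

```rust
// Stand-in for PartitionStats; the Option<u64> fields are an assumption.
#[derive(Clone, Copy, Debug, Default)]
struct PartitionStats {
    num_rows: Option<u64>,
    num_batches: Option<u64>,
    num_bytes: Option<u64>,
}

impl PartitionStats {
    // Public getters of the kind the commit message describes.
    fn num_rows(&self) -> Option<u64> { self.num_rows }
    fn num_batches(&self) -> Option<u64> { self.num_batches }
    fn num_bytes(&self) -> Option<u64> { self.num_bytes }
}

// Hypothetical plan tree: either a shuffle reader leaf or any other operator.
enum PlanNode {
    ShuffleReader { partitions: Vec<PartitionStats> },
    Other { children: Vec<PlanNode> },
}

#[derive(Debug, Default, PartialEq)]
struct ShuffleReadTotals {
    rows: u64,
    batches: u64,
    bytes: u64,
}

// Recursively visit every node and add up the stats of shuffle reader leaves.
fn accumulate(node: &PlanNode, totals: &mut ShuffleReadTotals) {
    match node {
        PlanNode::ShuffleReader { partitions } => {
            for p in partitions {
                // Missing statistics are counted as zero.
                totals.rows += p.num_rows().unwrap_or(0);
                totals.batches += p.num_batches().unwrap_or(0);
                totals.bytes += p.num_bytes().unwrap_or(0);
            }
        }
        PlanNode::Other { children } => {
            for child in children {
                accumulate(child, totals);
            }
        }
    }
}

fn extract_shuffle_read_totals(root: &PlanNode) -> ShuffleReadTotals {
    let mut totals = ShuffleReadTotals::default();
    accumulate(root, &mut totals);
    totals
}
```

Treating missing statistics as zero keeps the totals well defined, at the cost of under-reporting when a reader has no stats for a partition.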
* Add shuffle locality metrics to ExecutorMetricsCollector, SchedulerMetricsCollector, and ShuffleReaderExec
- Add record_shuffle_read_local/remote methods to ExecutorMetricsCollector trait
- Add record_task_shuffle_affinity_hit/miss methods to SchedulerMetricsCollector trait
- Add ShuffleReadMetricsCallback trait in ballista-core for tracking local vs remote reads
- Instrument shuffle_reader.rs to call metrics callback during partition fetches
- Add SessionConfigExt methods to pass metrics callback via session config
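A callback of this kind might look roughly like the sketch below. Only the trait name `ShuffleReadMetricsCallback` comes from the commit message; the method names, the `PartitionLocation` struct, `CountingCallback`, and `record_fetches` are hypothetical, and a partition is assumed to count as local when its host matches the reading executor's host.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Assumed trait shape; the real trait in ballista-core may differ.
trait ShuffleReadMetricsCallback: Send + Sync {
    fn record_local(&self, bytes: u64);
    fn record_remote(&self, bytes: u64);
}

// Hypothetical implementation that keeps running byte totals.
#[derive(Default)]
struct CountingCallback {
    local_bytes: AtomicU64,
    remote_bytes: AtomicU64,
}

impl ShuffleReadMetricsCallback for CountingCallback {
    fn record_local(&self, bytes: u64) {
        self.local_bytes.fetch_add(bytes, Ordering::Relaxed);
    }
    fn record_remote(&self, bytes: u64) {
        self.remote_bytes.fetch_add(bytes, Ordering::Relaxed);
    }
}

// Hypothetical description of where a shuffle partition lives.
struct PartitionLocation {
    host: String,
    bytes: u64,
}

// Classify each fetch as local or remote and report it through the callback.
fn record_fetches(
    local_host: &str,
    locations: &[PartitionLocation],
    callback: &dyn ShuffleReadMetricsCallback,
) {
    for loc in locations {
        if loc.host == local_host {
            callback.record_local(loc.bytes);
        } else {
            callback.record_remote(loc.bytes);
        }
    }
}
```

Passing the callback as a trait object keeps the reader independent of any particular metrics backend, which fits the commit's approach of handing it over through the session config.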
* Add metrics collector to SchedulerState and instrument executor and planning metrics
- Add metrics_collector field to SchedulerState struct
- Instrument record_planning_duration in submit_job
- Instrument record_executor_registered/deregistered and set_active_executor_count
- Update all SchedulerState constructors and call sites
* Add stage and task lifecycle metrics instrumentation to update_task_status flow
* Add shuffle affinity metrics to scheduler task binding
* Add actual task scheduling latency tracking
- Add schedulable_time_millis field to TaskDescription to track when a task became schedulable (when its stage transitioned to running state)
- Update all TaskDescription creation sites to pass RunningStage.stage_running_time
- Calculate actual scheduling latency in record_task_scheduled calls by computing the difference between current time and schedulable_time_millis
- This enables accurate scheduler_task_scheduling_latency_ms metrics instead of the previous placeholder value of 0
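The latency computation described above amounts to a timestamp difference. The sketch below is an illustration under assumptions: `now_millis` and `scheduling_latency_ms` are hypothetical helpers, and both timestamps are taken to be milliseconds since the Unix epoch.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Current wall-clock time in milliseconds since the Unix epoch.
fn now_millis() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0)
}

// Latency between a task becoming schedulable and actually being scheduled.
// saturating_sub guards against clock skew producing an underflow.
fn scheduling_latency_ms(now_millis: u64, schedulable_time_millis: u64) -> u64 {
    now_millis.saturating_sub(schedulable_time_millis)
}
```

A caller would pass `now_millis()` together with the task's `schedulable_time_millis` and report the result as the scheduling latency metric.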
* fix lint
1 parent 9fcff62 commit 1a98c29
33 files changed
Lines changed: 1729 additions & 238 deletions
File tree
- ballista
  - core/src
    - execution_plans
    - remote_catalog
    - serde/scheduler
  - executor
    - src
      - metrics
  - scheduler/src
    - cluster
    - metrics
    - scheduler_server
    - state