#5: LB: add work formula and task-cluster summary info #6
Pull request overview
This PR adds work formula computation and task-cluster summary infrastructure to the load balancer, so that load-balancing decisions can account for computation, communication, and memory costs.
Key Changes
- Introduces a configurable work model with coefficients for compute load, inter-node communication, intra-node communication, and shared-memory communication
- Adds cluster summarization functionality that aggregates task and communication information at the cluster level
- Implements memory usage tracking and validation with breakdown by footprint, working memory, and serialized memory
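To make the first bullet concrete, here is a minimal sketch of how a coefficient-based work model could combine the four terms. It assumes the work is a plain weighted sum; the PR summary does not state the actual formula. The names `WorkCoefficients`, `WorkInputs`, and `computeWork` are hypothetical and are not taken from `work_model.h`.

```cpp
// Hypothetical illustration only: these types are NOT the PR's WorkModel API.
struct WorkCoefficients {
  double compute;     // weight on compute load
  double inter_node;  // weight on inter-node communication
  double intra_node;  // weight on intra-node communication
  double shared_mem;  // weight on shared-memory communication
};

struct WorkInputs {
  double compute_load;      // e.g. measured task time
  double inter_node_bytes;  // bytes exchanged across nodes
  double intra_node_bytes;  // bytes exchanged within a node
  double shared_mem_bytes;  // bytes exchanged through shared memory
};

// Assumed form: a linear combination of the four cost terms.
inline double computeWork(WorkCoefficients const& c, WorkInputs const& in) {
  return c.compute * in.compute_load
       + c.inter_node * in.inter_node_bytes
       + c.intra_node * in.intra_node_bytes
       + c.shared_mem * in.shared_mem_bytes;
}
```

With coefficients `{1.0, 0.5, 0.25, 0.125}` and inputs `{10.0, 4.0, 8.0, 16.0}`, this sketch yields a work value of 16. Setting a coefficient to zero drops the corresponding term from the balance objective.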
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/vt-lb/algo/temperedlb/work_model.h | Defines WorkModel, WorkBreakdown, MemoryBreakdown structures and WorkModelCalculator for computing work from task/communication data |
| src/vt-lb/algo/temperedlb/work_model.cc | Implements work calculation logic including incremental updates and memory constraint checking |
| src/vt-lb/algo/temperedlb/configuration.h | Extracts Configuration struct from temperedlb.h with memory info accessors |
| src/vt-lb/algo/temperedlb/cluster_summarizer.h | Defines TaskClusterSummaryInfo and ClusterSummarizer for aggregating cluster-level statistics |
| src/vt-lb/algo/temperedlb/cluster_summarizer.cc | Implements cluster summarization including load, communication, and memory metrics |
| src/vt-lb/algo/temperedlb/temperedlb.h | Refactors TemperedLB to use new work model, adds trial-based execution and statistics computation |
| src/vt-lb/algo/temperedlb/visualize.h | Extends visualization to track and display intra-cluster communication bytes |
| src/vt-lb/algo/temperedlb/clustering.h | Adds utility function to verify all tasks are clustered and wraps code in namespace |
| src/vt-lb/model/PhaseData.h | Adds rank-level memory tracking fields (footprint bytes and max memory available) |
| src/vt-lb/model/Task.h | Reformats indentation for consistency |
| src/vt-lb/algo/baselb/baselb.h | Adds save/restore phase data methods for trial-based execution |
| src/vt-lb/comm/vt/proxy_wrapper.impl.h | Comments out debug printf statements |
| src/vt-lb/algo/temperedlb/symmetrize_comm.h | Adds blank line after header comment block |
| examples/test_example.cc | Removes polling loop from example code |
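
As a rough illustration of the cluster-level aggregation described for the cluster summarizer, the sketch below sums task load per cluster and splits communication bytes into intra-cluster and inter-cluster totals. Everything here is an assumption made for illustration: `TaskInfo`, `CommEdge`, `ClusterSummary`, and `summarizeClusters` are hypothetical names and do not reflect the actual `TaskClusterSummaryInfo` or `ClusterSummarizer` interfaces.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical illustration only: not the PR's ClusterSummarizer API.
struct TaskInfo {
  int id;
  int cluster;  // cluster the task belongs to
  double load;  // compute load of the task
};

struct CommEdge {
  int from;      // sending task id
  int to;        // receiving task id
  double bytes;  // communication volume
};

struct ClusterSummary {
  std::size_t num_tasks;
  double load;         // sum of task loads in the cluster
  double intra_bytes;  // bytes between tasks of the same cluster
  double inter_bytes;  // bytes crossing the cluster boundary
};

inline std::map<int, ClusterSummary> summarizeClusters(
    std::vector<TaskInfo> const& tasks, std::vector<CommEdge> const& edges) {
  std::map<int, int> cluster_of;  // task id -> cluster id
  std::map<int, ClusterSummary> out;
  for (auto const& t : tasks) {
    cluster_of[t.id] = t.cluster;
    ClusterSummary& s = out[t.cluster];  // zero-initialized on first access
    s.num_tasks += 1;
    s.load += t.load;
  }
  for (auto const& e : edges) {
    auto from = cluster_of.find(e.from);
    auto to = cluster_of.find(e.to);
    // Skip edges that touch a task this rank does not know about.
    if (from == cluster_of.end() || to == cluster_of.end()) continue;
    if (from->second == to->second) {
      out[from->second].intra_bytes += e.bytes;
    } else {
      // Count boundary traffic against both endpoint clusters.
      out[from->second].inter_bytes += e.bytes;
      out[to->second].inter_bytes += e.bytes;
    }
  }
  return out;
}
```

Counting a boundary edge against both endpoint clusters is one possible convention; a summarizer could equally track sent and received bytes separately.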
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 7 comments.
Fixes #5