-
Notifications
You must be signed in to change notification settings - Fork 2.9k
[WIP][tool] feature: scheduling analysis based on profiling data #4746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new tool for scheduling analysis based on profiling data, setting up a flexible framework with registries for different data parsers and visualizers. I've identified a few critical and high-severity issues that would prevent the tool from running correctly, including a bug in a class constructor and incorrect function implementations/registrations. Addressing these will make the tool functional as a baseline for further development.
| def __init__(): | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def cluster_parser_mstx(input_path: str, output_path: str, config: DictConfig) -> ClusterProfilerData: | ||
| print("in mstx") | ||
| pass | ||
|
|
||
| @register_cluster_parser("nvtx") | ||
| def cluster_parser_nvtx(input_path: str, output_path: str, config: DictConfig) -> ClusterProfilerData: | ||
| print("in nvtx") | ||
| pass No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The placeholder functions cluster_parser_mstx and cluster_parser_nvtx are type-hinted to return ClusterProfilerData, but they implicitly return None. This will lead to a runtime error in cluster_analysis.py when the None value is passed to the visualizer. These functions should return an instance of ClusterProfilerData.
| def cluster_parser_mstx(input_path: str, output_path: str, config: DictConfig) -> ClusterProfilerData: | |
| print("in mstx") | |
| pass | |
| @register_cluster_parser("nvtx") | |
| def cluster_parser_nvtx(input_path: str, output_path: str, config: DictConfig) -> ClusterProfilerData: | |
| print("in nvtx") | |
| pass | |
| def cluster_parser_mstx(input_path: str, output_path: str, config: DictConfig) -> ClusterProfilerData: | |
| print("in mstx") | |
| return ClusterProfilerData() | |
| @register_cluster_parser("nvtx") | |
| def cluster_parser_nvtx(input_path: str, output_path: str, config: DictConfig) -> ClusterProfilerData: | |
| print("in nvtx") | |
| return ClusterProfilerData() |
| ) | ||
| return CLUSTER_VISUALIZER_REGISTRY[fn_name] | ||
|
|
||
| @register_cluster_visualizer("mstx") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cluster_visualizer_html function is incorrectly registered with the name "mstx". It should be registered as "html" to match its purpose and the default --vis-type argument in cluster_analysis.py. This mismatch will cause a ValueError at runtime.
| @register_cluster_visualizer("mstx") | |
| @register_cluster_visualizer("html") |
What does this PR do?
Sample

Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,megatron,sglang,vllm,rollout,trainer,ci,training_utils,recipe,hardware,deployment,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,cfg,reward,like[megatron, fsdp, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][fsdp, megatron] feat: dynamic batchingTest
API and Usage Example
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)