[perf] fix: modify the NPU profiler default configuration #4475
Code Review
This pull request aims to reduce the data volume from the NPU profiler by changing the default profiling level from level1 to level0. The changes are consistently applied across various configuration files, examples, and the default configuration object. Additionally, the profiler configuration is updated to exclude communication data and switch to a database export format, which should further improve performance. I have one critical suggestion to improve the robustness of a runtime dependency check.
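The defaults described in this review can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: the class `NpuProfilerDefaults`, the helper `build_profiler_options`, and the `collect_communication` flag are hypothetical names, and the level strings are assumed to follow the `level_none` / `level0` / `level1` / `level2` convention. The real torch_npu interface takes enum values rather than plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed set of accepted level names (illustrative, not copied from the PR).
VALID_LEVELS = ("level_none", "level0", "level1", "level2")


@dataclass
class NpuProfilerDefaults:
    """Hypothetical stand-in for the project's NPU profiler defaults."""

    level: str = "level0"                # previously "level1"
    export_type: str = "db"              # database export instead of text
    collect_communication: bool = False  # communication data excluded by default
    contents: List[str] = field(default_factory=list)


def build_profiler_options(cfg: NpuProfilerDefaults) -> Dict[str, object]:
    """Validate the config and flatten it into a plain options dict."""
    if cfg.level not in VALID_LEVELS:
        raise ValueError(f"unknown profiler level: {cfg.level!r}")
    options: Dict[str, object] = {
        "profiler_level": cfg.level,
        "export_type": cfg.export_type,
    }
    # Only mention communication data when the user explicitly opts in.
    if cfg.collect_communication:
        options["collect_communication"] = True
    return options
```

A user who still needs the more detailed data can opt back in explicitly, for example with `NpuProfilerDefaults(level="level1")`, while everyone else gets the smaller level0 output.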
248e80b to a79a55e
8204110 to dafa529
9cfefff to 6a659fe
6996701 to 9b101b8
Co-authored-by: Shangwei-Li <[email protected]>
1. Check torch_npu version instead of sig.parameters for better readability and troubleshooting
2. Delete aic_metrics since it's not necessary for level0
3. Recommend 'module' instead of 'stack'
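The difference between the two dependency checks in item 1 can be sketched as below. This is an illustration under assumptions, not the code in the PR: `parse_version`, `supports_by_version`, and `supports_by_signature` are hypothetical helpers, and any version strings passed to them here are placeholders rather than the real torch_npu threshold.

```python
import inspect


def parse_version(text):
    """Return the leading numeric components, e.g. '2.7.1.post1' -> (2, 7, 1)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'post1'
        parts.append(int(digits))
    return tuple(parts)


def supports_by_version(installed, minimum):
    """Version-based check: easy to read and easy to report in an error message."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '2.7' compares equal to '2.7.0'
    b += (0,) * (width - len(b))
    return a >= b


def supports_by_signature(func, name):
    """Signature-based check: inspects the callable for a named parameter."""
    return name in inspect.signature(func).parameters
```

With the version-based check, a failure can be reported as "installed X, need at least Y", which is usually easier to troubleshoot than a missing keyword argument discovered through introspection.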
9b101b8 to c7b898a
What does this PR do?
Profiling in reinforcement learning generates a large volume of data, which makes the profiler harder to use. Based on the optimization experience of @mengchengTang, the recommended default parameters have been modified.
Refer to https://www.hiascend.com/document/detail/zh/Pytorch/720/apiref/torchnpuCustomsapi/context/torch_npu-profiler-_ExperimentalConfig.md for detailed interface specifications.
The test results of tests/special_npu/run_qwen2_5_05b_grpo.sh are as follows:
Checklist Before Starting
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
{modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
Multiple modules are written like [megatron, fsdp, doc]
{type} is in feat, fix, refactor, chore, test
For a breaking change, add [BREAKING] to the beginning of the title.
Example: [BREAKING][fsdp, megatron] feat: dynamic batching
Test
API and Usage Example
# Add code snippet or script demonstrating how to use this
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
Run the pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Request CI in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)