Adds ROCmProfilerService and fixes NVProfilerService. #49580
base: master
Splits the NVProfilerService into a generic framework activity annotation base and an NVTX-specific customization. A ROCm customization is added. Another one for VTune should be easily derivable. This commit also adds thread safety with spinlocks to avoid thread scheduling effects. Some ranges experience a double start or double end. This is now indicated with a mark, instead of failing on an assertion.
cms-bot internal usage
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49580/47110 ERROR: Build errors found during clang-tidy run.
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49580/47111 ERROR: Build errors found during clang-tidy run.
This PR is dependent on cms-sw/cmsdist#10238
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49580/47203
Pull request #49580 was updated.
Milestone for this pull request has been moved to CMSSW_16_1_X. Please open a backport if it should also go in to CMSSW_16_0_X.
PR description:
Splits the NVProfilerService into a generic framework activity annotation base and an NVTX specific customization. A ROCm customization is added. Another one for VTune should be easily derivable.
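As a rough illustration of the split described above (not the code in this PR), the generic part could own the framework callbacks and delegate the actual annotation calls to virtual hooks that each vendor customization overrides. All class and method names below are hypothetical, and the `RecordingAnnotation` customization only records calls where a real one would forward to NVTX or rocTX.

```cpp
#include <string>
#include <vector>

// Hypothetical generic base: knows about framework activity, not about any
// vendor profiling API.
class ActivityAnnotationBase {
public:
  virtual ~ActivityAnnotationBase() = default;

  // Framework-side callbacks (names are assumptions for this sketch).
  void preModuleEvent(const std::string& moduleLabel) { rangeStart(moduleLabel); }
  void postModuleEvent(const std::string& moduleLabel) { rangeEnd(moduleLabel); }

protected:
  // Hooks that a vendor-specific customization must implement.
  virtual void rangeStart(const std::string& name) = 0;
  virtual void rangeEnd(const std::string& name) = 0;
};

// Stand-in customization that records the calls it receives.
// An NVTX, ROCm, or VTune customization would override the same two hooks.
class RecordingAnnotation : public ActivityAnnotationBase {
public:
  std::vector<std::string> log;

protected:
  void rangeStart(const std::string& name) override { log.push_back("start:" + name); }
  void rangeEnd(const std::string& name) override { log.push_back("end:" + name); }
};
```

With this shape, adding another backend means writing one more small derived class, which is the sense in which a VTune variant would be easy to derive.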
This commit also adds thread safety with spinlocks to avoid thread scheduling effects. Some ranges experience a double start or double end. This is now indicated with a mark, instead of failing on an assertion.
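A minimal sketch of the idea, assuming a spinlock built on `std::atomic<bool>` and a set of currently open range names; the `SpinLock` and `RangeTracker` names and the event strings are invented for illustration and are not taken from this PR. A repeated start or an unmatched end is recorded as a mark instead of triggering an assertion.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <unordered_set>
#include <vector>

// Busy-waiting lock: the waiting thread spins instead of being descheduled.
class SpinLock {
public:
  void lock() {
    while (locked_.exchange(true, std::memory_order_acquire)) {
      // spin until the current holder releases the lock
    }
  }
  void unlock() { locked_.store(false, std::memory_order_release); }

private:
  std::atomic<bool> locked_{false};
};

// Hypothetical tracker for open ranges, guarded by the spinlock.
class RangeTracker {
public:
  void start(const std::string& name) {
    std::lock_guard<SpinLock> guard(lock_);
    if (!open_.insert(name).second) {
      // Range already open: emit a mark rather than asserting.
      events_.push_back("mark:double start " + name);
      return;
    }
    events_.push_back("start:" + name);
  }

  void end(const std::string& name) {
    std::lock_guard<SpinLock> guard(lock_);
    if (open_.erase(name) == 0) {
      // Range not open: emit a mark rather than asserting.
      events_.push_back("mark:double end " + name);
      return;
    }
    events_.push_back("end:" + name);
  }

  // Returns a copy of the recorded events, taken under the lock.
  std::vector<std::string> events() {
    std::lock_guard<SpinLock> guard(lock_);
    return events_;
  }

private:
  SpinLock lock_;
  std::unordered_set<std::string> open_;
  std::vector<std::string> events_;
};
```

A spinlock is a plausible choice here because the critical sections are very short, so spinning avoids the scheduler round trip a blocking mutex could introduce into the profile.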
Support is added for ES module execution, path ranges, and source and event cleanup (which are the most important contributions to the inter-event gap).
This could still be polished considerably, for example by adding configurable content or messages, coloring by event or EDM stream, etc.
The support for EDM events is also not complete and could still be expanded.
PR validation:
The two services have been tested on NVIDIA and AMD GPUs and showed expected results:
Here with the PyTorch test:

This was also tested with HLT configurations.