More generic check for CUDA-aware MPI #1793
Conversation
CUDA-awareness of OpenMPI is now checked in a try-except construction; a warning is issued if PyTorch supports GPUs but no CUDA-aware MPI is found.
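For illustration only, a minimal sketch of what such a try/except check could look like (this is not the PR's actual code; the `ompi_info` probe and the warning text are assumptions):

```python
# Sketch: probe OpenMPI for CUDA support inside a try/except and warn when
# PyTorch sees a GPU but MPI is not CUDA-aware. All names here are illustrative.
import subprocess
import warnings

import torch

GPU_AWARE_MPI = False
try:
    # "ompi_info --parsable" lists OpenMPI's build configuration; the substring
    # "mpi_built_with_cuda_support:value:true" appears when CUDA support was
    # compiled in. Any failure (no OpenMPI, no ompi_info on PATH, ...) simply
    # leaves the flag at False instead of breaking the import.
    info = subprocess.run(
        ["ompi_info", "--parsable"], capture_output=True, text=True, check=True
    ).stdout
    GPU_AWARE_MPI = "mpi_built_with_cuda_support:value:true" in info
except Exception:
    pass

if torch.cuda.is_available() and not GPU_AWARE_MPI:
    warnings.warn(
        "PyTorch reports GPU support, but no CUDA-aware MPI was detected; "
        "communication of GPU data may be slow or fail."
    )
```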
Thank you for the PR!
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files
@@ Coverage Diff @@
## main #1793 +/- ##
==========================================
+ Coverage 92.06% 92.07% +0.01%
==========================================
Files 85 86 +1
Lines 13111 13140 +29
==========================================
+ Hits 12070 12099 +29
Misses 1041 1041
Flags with carried forward coverage won't be shown.
As far as I'm concerned this can be merged, thanks for looking into this!
Benchmarks results - Sponsored by perun
Grafana Dashboard
idea:
Some more things:
added module _config in core which is intended to handle MPI, CUDA, and ROCm versioning
@JuanPedroGHM @ClaudiaComito I have added a module
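For context, a hedged sketch of what a `_config`-style module handling MPI, CUDA, and ROCm versioning could look like; only the module name `_config` and the flag `GPU_AWARE_MPI` come from the commit list, while `MPI_LIBRARY_VERSION`, `CUDA_IS_ACTUALLY_ROCM`, and the detection heuristics are assumptions, not heat's actual implementation:

```python
# _config-style sketch: collect MPI/CUDA/ROCm related facts in one place.
import os

import torch
from mpi4py import MPI

# The library version string identifies the MPI implementation, e.g.
# "Open MPI v4.1.5, ..." or "MPICH Version: 4.1.2 ...".
MPI_LIBRARY_VERSION = MPI.Get_library_version()

# torch.version.hip is a version string under ROCm builds and None otherwise.
CUDA_IS_ACTUALLY_ROCM = torch.version.hip is not None

# GPU_AWARE_MPI (name taken from the commit list): assumed per-implementation
# heuristics, since OpenMPI and MPICH advertise GPU support differently.
GPU_AWARE_MPI = False
if "Open MPI" in MPI_LIBRARY_VERSION:
    GPU_AWARE_MPI = os.environ.get("OMPI_MCA_opal_cuda_support") == "true"
elif "MPICH" in MPI_LIBRARY_VERSION:
    GPU_AWARE_MPI = os.environ.get("MPIR_CVAR_ENABLE_GPU") == "1"
```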
Examples of possible global variables, and where to use them
Things we should do with those variables
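To make the "where to use them" part concrete, a small usage sketch; the `isend` wrapper and the import path are hypothetical, only the flag name `GPU_AWARE_MPI` comes from the PR:

```python
# Hypothetical call site: consult GPU_AWARE_MPI before handing a buffer to MPI.
import torch
from mpi4py import MPI

from heat.core._config import GPU_AWARE_MPI  # assumed location of the flag


def isend(tensor: torch.Tensor, dest: int, comm: MPI.Comm = MPI.COMM_WORLD):
    """Non-blocking send that stages GPU data on the host when MPI is not GPU-aware."""
    if tensor.is_cuda and not GPU_AWARE_MPI:
        # Without CUDA-aware MPI, device buffers must be copied to host memory
        # before they can be passed to the MPI library.
        tensor = tensor.cpu()
    # mpi4py accepts objects exposing the (CUDA) array interface as buffers.
    buf = tensor if tensor.is_cuda else tensor.numpy()
    return comm.Isend(buf, dest=dest)
```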
@JuanPedroGHM @ClaudiaComito idea: could we merge this PR as it is (as a quick bugfix) and open a new issue for the extensive refactoring suggested by @JuanPedroGHM?
Changes have been made in the meantime.
Backport failed for stable. Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin stable
git worktree add -d .worktree/backport-1793-to-stable origin/stable
cd .worktree/backport-1793-to-stable
git switch --create backport-1793-to-stable
git cherry-pick -x 8a3ae51adcb4a997346e8f379e48a9a72b9f9517 c96a4e427c124d94df5bb6d1bd64908b997cc4f4 a423f485678ede58f194b1d3f1692c143342487e 70766976c20b7720c090b06f28b178556e3283f6 e44c43b2b51773c9781d37ee27496da03f33ce72
* cuda-awareness of openmpi is now checked in a try-except-construction; warning is issued if PyTorch supports GPUs but no cuda-aware MPI is found. (cherry picked from commit 8a3ae51)
* Update communication.py (cherry picked from commit c96a4e4)
* added module _config in core which is intended to handle MPI, CUDA, and ROCm versioning (cherry picked from commit a423f48)
* added variable GPU_AWARE_MPI (cherry picked from commit 7076697)
* added MPICH (cherry picked from commit e44c43b)

Co-authored-by: Hoppe <[email protected]>
Co-authored-by: Fabian Hoppe <[email protected]>
fixes #1787
Due Diligence
[ ] benchmarks: created for new functionality
[ ] benchmarks: performance improved or maintained
Does this change modify the behaviour of other functions? If so, which?
no