Dockerfile - Support cuda12.8 for Blackwell arch #682
base: main
Verified this PR locally.
d963ad8 to 6a51193
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #682   +/-   ##
=======================================
  Coverage   85.67%   85.67%
=======================================
  Files          99       99
  Lines        7211     7211
=======================================
  Hits         6178     6178
  Misses       1033     1033
LGTM
$(eval ARCHS := 75 80 86 89 90 100)
if [ -d msccl ]; then rm -rf msccl; fi; \
git clone --single-branch --branch main https://github.com/Azure/msccl.git \
&& git -C msccl checkout 87048bd && git -C msccl submodule update --recursive --init
else ifeq ($(shell echo $(CUDA_VER)">=11.8" | bc -l), 1)
$(eval ARCHS := 70 75 80 86 89 90)
We may only need the 80, 90, and 100 archs for the build; I didn't test on the rest.
This builds for every arch that the CUDA version supports, which will be helpful if someone wants to use this container on a 75 arch.
On the side: condensing the list will definitely help reduce the overall build time.
Actually, I mean that we didn't test msccl on V100 (75 arch), A10 (86 arch), etc., so it may not work there.
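For reference, the arch-list selection discussed in this thread can be sketched as a small shell helper. This is only an illustration, not code from the PR: the function names `version_ge`, `select_archs`, and `gencode_flags` are hypothetical, and the 12.8 threshold for the new list is an assumption taken from the PR title (the diff above only shows the 11.8 branch). It compares major and minor as integers, because a decimal comparison through `bc` would read a version such as 12.10 as 12.1.

```shell
#!/bin/sh
# Hypothetical helpers, not part of this PR.
# Versions are assumed to be in "major.minor" form, e.g. 12.8.

# version_ge A B: succeeds when version A >= version B,
# comparing major and minor parts as integers.
version_ge() {
    a_major=${1%%.*}; a_minor=${1#*.}
    b_major=${2%%.*}; b_minor=${2#*.}
    [ "$a_major" -gt "$b_major" ] ||
        { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]; }
}

# select_archs CUDA_VER: prints the arch list for that CUDA version.
# The 12.8 cutoff is an assumption; the two lists are the ones in the diff.
select_archs() {
    if version_ge "$1" 12.8; then
        echo "75 80 86 89 90 100"
    elif version_ge "$1" 11.8; then
        echo "70 75 80 86 89 90"
    else
        echo "unsupported CUDA version: $1" >&2
        return 1
    fi
}

# gencode_flags "ARCH ...": expands an arch list into nvcc -gencode flags.
gencode_flags() {
    flags=""
    for arch in $1; do
        flags="$flags -gencode=arch=compute_${arch},code=sm_${arch}"
    done
    # Drop the leading space left by the first iteration.
    echo "${flags# }"
}
```

Condensing the list as suggested would then be a one-line change, for example `gencode_flags "80 90 100"` instead of `gencode_flags "$(select_archs 12.8)"`.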
Description
Updated the Dockerfile for CUDA 12.8
Use the latest cutlass release, 3.8, with ARCH 100 (Blackwell) support
Add the latest nccl-test release with ARCH 100 (Blackwell) support
Updated msccl to support building for sm_100
No breaking changes, so this is backward compatible; tested with 12.4 (for now)