Workaround for a potential bug in the driver related to TMA descriptor #6985
    # if _CCCL_CTK_BELOW(13, 2)
      if (::cuda::__driver::__version_below(13, 2))
      {
        const auto __tensorMapPtr = reinterpret_cast<::cuda::std::uint64_t*>(static_cast<void*>(&__tensorMap));
        __tensorMapPtr[1] = ~(::cuda::std::uint64_t{1} << 21);
      }
    # endif // _CCCL_CTK_BELOW(13, 2)
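The mask expression in the snippet above can be checked in isolation. The following is a minimal standalone sketch, not code from the PR: the names `bit21_cleared_mask`, `assign_mask`, and `clear_bit21` are hypothetical and use `std::uint64_t` in place of `::cuda::std::uint64_t`. It shows that the mask has every bit set except bit 21, and contrasts plain assignment (which stores the mask regardless of the word's previous contents) with AND-assignment (which clears only bit 21). Which of the two the workaround intends is not stated in the thread.

```cpp
#include <cstdint>

// Every bit set except bit 21: 0xFFFFFFFFFFDFFFFF.
constexpr std::uint64_t bit21_cleared_mask = ~(std::uint64_t{1} << 21);

// Plain assignment, as written in the snippet under review:
// the previous contents of the word are discarded.
constexpr std::uint64_t assign_mask(std::uint64_t /*word*/)
{
  return bit21_cleared_mask;
}

// AND-assignment (hypothetical alternative): only bit 21 is cleared,
// all other bits of the word are preserved.
constexpr std::uint64_t clear_bit21(std::uint64_t word)
{
  return word & bit21_cleared_mask;
}

static_assert(bit21_cleared_mask == 0xFFFFFFFFFFDFFFFFull,
              "mask has all bits set except bit 21");
```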
Suggested change

    - # if _CCCL_CTK_BELOW(13, 2)
    -   if (::cuda::__driver::__version_below(13, 2))
    -   {
    -     const auto __tensorMapPtr = reinterpret_cast<::cuda::std::uint64_t*>(static_cast<void*>(&__tensorMap));
    -     __tensorMapPtr[1] = ~(::cuda::std::uint64_t{1} << 21);
    -   }
    - # endif // _CCCL_CTK_BELOW(13, 2)
    +   if (::cuda::__driver::__version_below(13, 2))
    +   {
    +     const auto __tensorMapPtr = reinterpret_cast<::cuda::std::uint64_t*>(static_cast<void*>(&__tensorMap));
    +     __tensorMapPtr[1] = ~(::cuda::std::uint64_t{1} << 21);
    +   }
We should just check the version at runtime. Can you also add a comment explaining what this is, along with the nvbug number?
Well, we can skip this for newer CTKs. Why make a driver call when it is not needed?
> Can you also add a comment what this is and the nvbug number?

Sorry, the PR was not finalized yet.
I am with @fbusato here: we should avoid driver calls if we can.
I agree, but a runtime check also covers the case where the user compiles with CUDA 13.2 and then runs the program with driver version 13.1. We already do this in some other APIs. We cache the driver version, so there should be almost no overhead at all.
I think preventing a bug is worth more than the cost of one version comparison.
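To make the caching argument concrete, here is a minimal sketch of a cached runtime version check. It is not the CCCL implementation: `query_driver_version`, `cached_driver_version`, `driver_version_below`, and `g_query_count` are hypothetical names, and the stub returns a fixed value where a real build would ask the driver (for example via `cuDriverGetVersion`, which reports the version as `1000 * major + 10 * minor`).

```cpp
#include <cassert>

// Counts how often the (stubbed) driver is actually queried.
static int g_query_count = 0;

// Hypothetical stand-in for a real driver query such as cuDriverGetVersion.
// Assumption: the installed driver is 13.1, encoded as 1000 * 13 + 10 * 1.
int query_driver_version()
{
  ++g_query_count;
  return 13010;
}

// Queries the driver once; every later call reads the function-local static.
int cached_driver_version()
{
  static const int version = query_driver_version();
  return version;
}

// True when the cached driver version is older than major.minor.
bool driver_version_below(int major, int minor)
{
  assert(major >= 0 && minor >= 0 && minor < 100);
  return cached_driver_version() < major * 1000 + minor * 10;
}
```

After the first call, each check is a read of the cached integer plus one comparison, which is the "almost no overhead" case described above. A compile-time guard alone cannot see the driver that is present when the program runs.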
I double-checked. The driver version is encoded, so there is practically zero overhead when calling this function. I'm fine with this modification.
That said, I don't think it is possible to run code compiled with CUDA 13.2 on an older driver version. However, given our support for non-conventional platforms, edge cases could still occur.
🥳 CI Workflow Results
🟩 Finished in 3h 43m: Pass: 100%/91 | Total: 1d 09h | Max: 3h 08m | Hits: 92%/214122
Backport failed for Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin branch/3.0.x
git worktree add -d .worktree/backport-6985-to-branch/3.0.x origin/branch/3.0.x
cd .worktree/backport-6985-to-branch/3.0.x
git switch --create backport-6985-to-branch/3.0.x
git cherry-pick -x d91fc2068118924215a8906c369a2bbd6cb1790c
Backport failed for Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin branch/3.1.x
git worktree add -d .worktree/backport-6985-to-branch/3.1.x origin/branch/3.1.x
cd .worktree/backport-6985-to-branch/3.1.x
git switch --create backport-6985-to-branch/3.1.x
git cherry-pick -x d91fc2068118924215a8906c369a2bbd6cb1790c
Backport failed for Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin branch/3.2.x
git worktree add -d .worktree/backport-6985-to-branch/3.2.x origin/branch/3.2.x
cd .worktree/backport-6985-to-branch/3.2.x
git switch --create backport-6985-to-branch/3.2.x
git cherry-pick -x d91fc2068118924215a8906c369a2bbd6cb1790c
Description
bug/5736804 workaround