Reduce memory overhead for HIP backends on MI300A GPUs #1734
```c
const CeedScalar *d_x, *d_u;
CeedScalar      *d_v;
CeedBasis_Hip   *data;
Ceed_Hip        *hip_data;
```
another stray
```c
}

CeedCallBackend(CeedBasisGetCeed(basis, &ceed));
CeedCallBackend(CeedGetData(ceed, &hip_data));
```
and here
Force-pushed from 9dd589f to 35926f8.
Force-pushed from 35926f8 to 5ad657d.
```c
CeedVector_Hip *impl;

CeedCallBackend(CeedVectorGetData(vec, &impl));
CeedCallHip(CeedVectorReturnCeed(vec), hipDeviceSynchronize());
```
Ratel seems to work fine without this line, and is faster
Does CeedVectorSyncArray mean that one could immediately start an MPI_Send? If the host doesn't know that the previous kernel (writing to the array) has completed, then it would be racy to call MPI_Send. (Might be rare to trip, but we don't want that kind of bug.)
If our sends are using a kernel for packing (on the same stream), then the host doesn't need to know when the earlier stuff completes, but we still need to sync after the packing kernel.
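The ordering concern above can be sketched as follows. This is an illustrative sketch only, not libCEED code: the function name, buffer, and stream handling are assumptions, and the point is simply that the host must synchronize with the device before handing a device-written buffer to MPI.

```c
// Sketch of the race discussed above (assumed names; not libCEED internals).
#include <hip/hip_runtime.h>
#include <mpi.h>

void send_after_kernel(double *d_buf, int n, int dest, hipStream_t stream) {
  // Suppose a kernel previously enqueued on `stream` is still writing d_buf.
  //
  // Racy: MPI may read d_buf before the kernel finishes.
  //   MPI_Send(d_buf, n, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);

  // Safe: block the host until all work queued on the stream has completed.
  // This is exactly the "hard sync" whose performance cost is debated above.
  hipStreamSynchronize(stream);
  MPI_Send(d_buf, n, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
}
```

If the pack is itself a kernel on the same stream, the same synchronize call after the packing kernel covers both orderings, which matches the second paragraph above.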
That's a fair point. I think we need to be a bit more careful and only sync when the host needs the data; otherwise this acts as a hard sync with the GPU, which seems to have performance impacts.
FYI, generally prefer rebase to merge for dev branches. It doesn't matter for squash-merges (the commit history gets nuked anyway), but for normal merges it helps keep the git history more regular.

Yeah, generally I agree. I need to strip down this branch and rebuild it, probably; it's currently a mess due to changes at the AMD workshop.
I think we want to merge this before the review so we can have libCEED 0.13 and Ratel 0.4 with this? Is the question of the sync call the big blocker right now?
Restriction offset arrays may also want this.

I think so? To be honest, I've been more focused on MPM work in Ratel and haven't had much time to work on this. If you have more time and desire to extract the changes into a clean branch, I'd be happy to review. Otherwise, I probably won't get to it until late this week at the earliest, more likely next week.
No rush. It was just a thought that crossed my mind as we look forward to future activities for libCEED.
I'd like to get this in for the release. If you're still tight on time, I can tidy up the branch. The discussion above about syncing seems to be the real sticking point, though.

If you have time, that would be great. Ultimately, I think we need to sync when a sync is requested, for correctness.
See #1788
Transferred to #1788
Prevents double allocations for `CeedVector` when using HIP vectors with unified addressing and XNACK. Also updates more of the HIP vector operations to use hipBLAS functions rather than custom kernels.
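The second part of that description, swapping a hand-written kernel for a vendor BLAS call, might look roughly like the sketch below. This is not the PR's actual code; the function name and error handling are assumptions, and only `hipblasDscal` and the handle lifecycle calls are real hipBLAS API.

```c
// Illustrative sketch: scaling a device array via hipBLAS instead of a
// custom HIP kernel (assumed wrapper; header path may vary by ROCm version).
#include <hipblas/hipblas.h>

static int scale_device_array(double *d_x, int n, double alpha) {
  hipblasHandle_t handle;

  if (hipblasCreate(&handle) != HIPBLAS_STATUS_SUCCESS) return 1;
  // d_x[i] *= alpha for i in [0, n), computed by the vendor BLAS
  // rather than a hand-rolled kernel launch.
  hipblasStatus_t status = hipblasDscal(handle, n, &alpha, d_x, 1);
  hipblasDestroy(handle);
  return status == HIPBLAS_STATUS_SUCCESS ? 0 : 1;
}
```

Using the vendor BLAS here trades a small amount of call overhead for tuned kernels and less backend code to maintain, which seems to be the motivation stated above.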