Conversation

zatkins-dev (Collaborator) commented Jan 24, 2025

Prevents double allocations for CeedVector when using HIP vectors with unified addressing and XNACK.
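With XNACK enabled, the GPU can fault pageable host memory in on demand, so a mirrored device allocation of the host buffer is unnecessary. A minimal sketch of the detection step, assuming the standard HIP device-attribute query (`device_can_use_host_pointer` is a hypothetical helper, not libCEED code):

```c
#include <hip/hip_runtime.h>

// Sketch: when the device can access pageable host memory (unified
// addressing with XNACK enabled), the host allocation can be used
// directly and the second (device) allocation skipped.
static int device_can_use_host_pointer(int device_id) {
  int pageable_access = 0;
  hipDeviceGetAttribute(&pageable_access,
                        hipDeviceAttributePageableMemoryAccess, device_id);
  return pageable_access; // nonzero: no separate device buffer needed
}
```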

Also, updates more of the HIP vector operations to use hipBLAS functions rather than custom kernels.
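As an illustration of the second change, a vector scale that previously launched a custom kernel can defer to hipBLAS. This is a simplified sketch under assumptions, not the actual libCEED code; `VectorScale_Hip` is a hypothetical helper, and error handling is reduced to a status check:

```c
#include <hipblas/hipblas.h>

// Sketch: scale a device vector in place via hipBLAS instead of a
// hand-written kernel.
static int VectorScale_Hip(hipblasHandle_t handle, double alpha,
                           double *d_array, int length) {
  // hipblasDscal computes x = alpha * x over `length` doubles, stride 1
  hipblasStatus_t status = hipblasDscal(handle, length, &alpha, d_array, 1);
  return status == HIPBLAS_STATUS_SUCCESS ? 0 : 1;
}
```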

const CeedScalar *d_x, *d_u;
CeedScalar *d_v;
CeedBasis_Hip *data;
Ceed_Hip *hip_data;
Member:

another stray

}

CeedCallBackend(CeedBasisGetCeed(basis, &ceed));
CeedCallBackend(CeedGetData(ceed, &hip_data));
Member:

and here

CeedVector_Hip *impl;

CeedCallBackend(CeedVectorGetData(vec, &impl));
CeedCallHip(CeedVectorReturnCeed(vec), hipDeviceSynchronize());
zatkins-dev (Author):

Ratel seems to work fine without this line, and is faster

Member:

Does CeedVectorSyncArray mean that one could immediately start an MPI_Send? If the host doesn't know that the previous kernel (writing to the array) has completed, then it would be racy to call MPI_Send. (Might be rare to trip, but we don't want that kind of bug.)

If our sends are using a kernel for packing (on the same stream), then the host doesn't need to know when the earlier stuff completes, but we still need to sync after the packing kernel.
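Sketched concretely (`fill_kernel` is hypothetical; the HIP and MPI calls follow their standard APIs), the hazard looks like:

```c
// Racy: the kernel writing d_array was launched asynchronously on a
// stream, so the host may hand the buffer to MPI before it is ready.
hipLaunchKernelGGL(fill_kernel, blocks, threads, 0, stream, d_array, n);
MPI_Send(d_array, n, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD); // may read stale data

// Safe: synchronize the stream (or wait on an event) before the send.
hipLaunchKernelGGL(fill_kernel, blocks, threads, 0, stream, d_array, n);
hipStreamSynchronize(stream);
MPI_Send(d_array, n, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
```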

zatkins-dev (Author):

That's a fair point. I think we need to be a bit more careful and only sync when the host needs the data; otherwise this acts as a hard sync with the GPU, which seems to have performance impacts.
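One possible shape for that policy, synchronizing only when the requested memory type implies host access. This is a sketch under assumptions, not the actual libCEED internals: `SyncArray_Hip` is hypothetical, while `CeedCallHip` and `CeedVectorReturnCeed` appear in the diff above.

```c
// Sketch: only issue a device synchronization when the caller asks for
// the data on the host; a device-side consumer can stay on-stream.
static int SyncArray_Hip(CeedVector vec, CeedMemType mem_type) {
  if (mem_type == CEED_MEM_HOST) {
    // Host is about to read: ensure prior kernels have finished.
    CeedCallHip(CeedVectorReturnCeed(vec), hipDeviceSynchronize());
  }
  // CEED_MEM_DEVICE: ordering is already guaranteed by the stream.
  return CEED_ERROR_SUCCESS;
}
```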

jrwrigh (Collaborator) commented Feb 26, 2025

FYI, generally prefer rebase to merge for dev branches. It doesn't matter for squash-merges (the commit history gets nuked anyways), but for normal merges it helps the git history be more regular.

zatkins-dev (Author):

> FYI, generally prefer rebase to merge for dev branches. It doesn't matter for squash-merges (the commit history gets nuked anyways), but for normal merges it helps the git history be more regular.

Yeah, generally I agree. I probably need to strip down this branch and rebuild it; it's currently a mess due to changes at the AMD workshop.

jeremylt (Member):

I think we want to merge this before the review so we can have libCEED 0.13 and Ratel 0.4 with this? Is the question of the sync call the big blocker right now?

jeremylt (Member) commented Mar 4, 2025

Restriction offset arrays may also want this

zatkins-dev (Author):

> I think we want to merge this before the review so we can have libCEED 0.13 and Ratel 0.4 with this? Is the question of the sync call the big blocker right now?

I think so? To be honest, I've been more focused on MPM work in Ratel and haven't had much time to work on this.

If you have more time and desire to extract the changes into a clean branch, I'd be happy to review. Otherwise, I probably won't get to it until late this week at the earliest, more likely next week.

jeremylt (Member) commented Mar 4, 2025

No rush - it was just a thought that crossed my mind as we look forward to future activities for libCEED

jeremylt (Member):

I'd like to get this in for the release. If you're still tight on time, I can tidy up the branch. The discussion above about syncing seems to be the real sticking point though.

jeremylt added this to the v0.13 milestone Mar 19, 2025
zatkins-dev (Author):

> I'd like to get this in for the release. If you're still tight on time, I can tidy up the branch. The discussion above about syncing seems to be the real sticking point though.

If you have time, that would be great. Ultimately, I think we need to sync when a sync is requested, for correctness.

jeremylt (Member):

See #1788

jeremylt mentioned this pull request Mar 20, 2025
jeremylt (Member):

Transferred to #1788

jeremylt closed this Mar 20, 2025