UCT/CUDA: Skip CUDA cleanup for destroyed contexts#11437
Merged
Conversation
Applications can reset the CUDA primary context before MPI_Finalize. Avoid calling cuStreamDestroy or cuEventDestroy on resources from a context that is no longer valid, since CUDA may warn or crash in that path. Signed-off-by: Roie Danino <rdanino@nvidia.com>
iyastreb
reviewed
May 8, 2026
Signed-off-by: Roie Danino <rdanino@nvidia.com>
Signed-off-by: Roie Danino <rdanino@nvidia.com>
Signed-off-by: Roie Danino <rdanino@nvidia.com>
iyastreb
reviewed
May 12, 2026
…ther than complex hook installations affecting the entire gtest binary Signed-off-by: Roie Danino <rdanino@nvidia.com>
Contributor
Author
Signed-off-by: Roie Danino <rdanino@nvidia.com>
Contributor
Author
gleon99
approved these changes
May 26, 2026
Contributor
|
@roiedanino need to |
rakhmets
reviewed
May 27, 2026
| return status; | ||
| } | ||
|
|
||
| if (primary_ctx != ctx_rsc->ctx) { |
Contributor
There was a problem hiding this comment.
It's safer to use context ID to check that contexts are the same: cuCtxGetId. The method was added in CUDA 12 to solve the problem when newly created context could have the same pointer address as the previously deleted one.
Comment on lines
+64
to
+67
| /* Retained CUDA primary context, if @ctx is a primary context */ | ||
| CUcontext primary_ctx; | ||
| /* CUDA device of @primary_ctx */ | ||
| CUdevice cuda_device; |
Contributor
There was a problem hiding this comment.
Maybe use cuda_device field as a flag that ctx field represents a primary device context.
| event_mp); | ||
| int pushed; | ||
|
|
||
| if (uct_cuda_base_ctx_rsc_push(ctx_rsc, &pushed) != UCS_OK) { |
Contributor
There was a problem hiding this comment.
Do we really need to push the context here? Can we just check that retained primary context has the same ID as one that was stored during uct_cuda_ctx_rsc_t creation?
gleon99
pushed a commit
that referenced
this pull request
May 31, 2026
* UCT/CUDA/BASE: CUDA ctx - Fixing #11437 CR comments Signed-off-by: Roie Danino <rdanino@nvidia.com> * UCT/CUDA/BASE: Avoid pushing context during cleanup Signed-off-by: Roie Danino <rdanino@nvidia.com> --------- Signed-off-by: Roie Danino <rdanino@nvidia.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What?
Avoid calling cuStreamDestroy or cuEventDestroy on resources from a context that is no longer valid
Why?
CUDA may warn or crash in that path.