Skip to content

handle errors gracefuly to prevent SEGV #13194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

bfaccini
Copy link
Contributor

oob_allgather_test() do not check isend() call
success, leading to the possibility to use
oob_req->reqs[] un-initialized upon error and
thus to SEGV.

oob_allgather_test() do not check isend() call
success, leading to the possibility to use
oob_req->reqs[] un-initialized upon error and
thus to SEGV.

Signed-off-by: Bruno Faccini <[email protected]>
@bfaccini bfaccini force-pushed the bfaccini/issue_13191 branch from 3db0950 to bc5e821 Compare April 15, 2025 05:58
MCA_PML_BASE_SEND_STANDARD, comm, &oob_req->reqs[0]));
MCA_PML_CALL(irecv(tmprecv, msglen, MPI_BYTE, recvfrom,
if (OMPI_SUCCESS != rc) {
return rc;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if conversion is safe here, return value of this function is ucc_status_t and might be not compatible with ompi error codes. If rc is positive number then it's actually not an error from ucc perspective since all errors in ucc are negative. Maybe just to make it safe the function returns UCC_ERR_NO_MESSAGE in case of error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. The macro ends up calling the .pml_isend field in the PML struct, which in the case of UCX points to mca_pml_ucx_isend, a function that returns OMPI errors.

@bfaccini bfaccini requested a review from jsquyres April 17, 2025 08:20
@jsquyres jsquyres dismissed their stale review April 17, 2025 11:23

Got the changes I asked but, but I'll still defer to those who are experts in this area of the code for providing a formal review

@bfaccini
Copy link
Contributor Author

we will file a new PR for this change that adheres to our company policy

@bfaccini bfaccini closed this Apr 17, 2025
@bfaccini
Copy link
Contributor Author

we will file a new PR for this change that adheres to our company policy

@bfaccini bfaccini deleted the bfaccini/issue_13191 branch April 17, 2025 12:03
@@ -206,6 +212,8 @@ static ucc_status_t oob_allgather(void *sbuf, void *rbuf, size_t msglen,
oob_req->msglen = msglen;
oob_req->oob_coll_ctx = oob_coll_ctx;
oob_req->iter = 0;
oob_req->reqs[0] = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically I think this should be MPI_REQUEST_NULL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants