-
Notifications
You must be signed in to change notification settings - Fork 899
handle errors gracefuly to prevent SEGV #13194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
oob_allgather_test() do not check isend() call success, leading to the possibility to use oob_req->reqs[] un-initialized upon error and thus to SEGV. Signed-off-by: Bruno Faccini <[email protected]>
3db0950
to
bc5e821
Compare
MCA_PML_BASE_SEND_STANDARD, comm, &oob_req->reqs[0])); | ||
MCA_PML_CALL(irecv(tmprecv, msglen, MPI_BYTE, recvfrom, | ||
if (OMPI_SUCCESS != rc) { | ||
return rc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if conversion is safe here, return value of this function is ucc_status_t and might be not compatible with ompi error codes. If rc is positive number then it's actually not an error from ucc perspective since all errors in ucc are negative. Maybe just to make it safe the function returns UCC_ERR_NO_MESSAGE in case of error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct. The macro ends up calling the .pml_isend
field in the PML struct, which in the case of UCX points to mca_pml_ucx_isend, a function that returns OMPI errors.
Got the changes I asked but, but I'll still defer to those who are experts in this area of the code for providing a formal review
we will file a new PR for this change that adheres to our company policy |
we will file a new PR for this change that adheres to our company policy |
@@ -206,6 +212,8 @@ static ucc_status_t oob_allgather(void *sbuf, void *rbuf, size_t msglen, | |||
oob_req->msglen = msglen; | |||
oob_req->oob_coll_ctx = oob_coll_ctx; | |||
oob_req->iter = 0; | |||
oob_req->reqs[0] = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically I think this should be MPI_REQUEST_NULL.
oob_allgather_test() do not check isend() call
success, leading to the possibility to use
oob_req->reqs[] un-initialized upon error and
thus to SEGV.