comm/cid: fix for edge case with intercomm creation #12465


Merged
merged 1 commit into open-mpi:main from fix_for_intercomm_cid_alloc
Apr 16, 2024

Conversation

hppritcha
Member

A barrier was removed from the cid allocation algorithm for intercommunicators, which leads to random hangs when doing MPI_Intercomm_create or its moral equivalent.

See issue #12367

The cid allocation algorithm and comm-from-cid lookup were modified to use a pointer to ompi_mpi_comm_null as the placeholder in the list of pointers to ompi communicators.

A convenience debug function was added to comm_cid.c to dump out this table of communicators.

@hppritcha hppritcha requested a review from wenduwan April 11, 2024 23:25
@wenduwan wenduwan added the mpi4py-all Run the optional mpi4py CI tests label Apr 11, 2024
@wenduwan wenduwan force-pushed the fix_for_intercomm_cid_alloc branch from 9399be3 to c31f714 on April 11, 2024 23:28
Member

@bosilca bosilca left a comment


ompi_mpi_comm_null as a sentinel should work better than using the parent communicator. If you remove the ompi_comm_dump_communicators function this should be ready to go.

@@ -1589,3 +1589,16 @@ static int ompi_comm_ft_allreduce_intra_pmix_nb(int *inbuf, int *outbuf, int cou

#endif /* OPAL_ENABLE_FT_MPI */

void ompi_comm_dump_communicators(void)
Member


We have specialized mechanisms to dump information, especially communicators. This function does not follow OMPI coding standards, such as relying on OPAL_OUTPUT_VERBOSE instead of printf and friends.

Member Author


np, i'll remove it from the PR

@bosilca
Member

bosilca commented Apr 11, 2024

Conceptually this is not correct because ompi_mpi_comm_null can legitimately be added to the ompi_mpi_communicators list. In the current code it will work because the only time we extract it from the list we use opal_pointer_array_get_item instead of ompi_comm_lookup.

good point. i'll see if there's a not-too-painful way to address this.

@wenduwan wenduwan removed the mpi4py-all Run the optional mpi4py CI tests label Apr 11, 2024
@hppritcha
Member Author

> Conceptually this is not correct because ompi_mpi_comm_null can legitimately be added to the ompi_mpi_communicators list. In the current code it will work because the only time we extract it from the list we use opal_pointer_array_get_item instead of ompi_comm_lookup.
>
> good point. i'll see if there's a not-too-painful way to address this.

okay, so if &ompi_mpi_comm_null is added to the list - which we actually do by default to support sessions - how is this relevant to the logic in the ob1 pml? The whole issue revolves around the ob1 pml getting a comm back from doing a lookup on the incoming hdr's ctx and then thinking - oh, I have a valid comm to use! We don't want that.

@bosilca
Member

bosilca commented Apr 12, 2024

I understand the problem: we need a sentinel to put in the ompi_mpi_communicators list. What I'm arguing here is that ompi_mpi_comm_null is a valid communicator and already belongs in that list, but the proposed code cannot identify such cases because it willingly replaces a return of ompi_mpi_comm_null with NULL, even in the valid case.

We could solve this by defining OMPI_COMM_SENTINEL to 0x00001 and using it instead of ompi_mpi_comm_null. This makes the code much clearer and also allows the ompi_comm_lookup function to correctly return ompi_mpi_comm_null.

@hppritcha
Member Author

I see. I think the sentinel idea should work. I'll update the PR with that approach

@hppritcha hppritcha force-pushed the fix_for_intercomm_cid_alloc branch 2 times, most recently from 0725601 to 85993fe on April 12, 2024 17:38
@bosilca
Member

bosilca commented Apr 13, 2024

I was going over this this morning and realized this code had, and still has, a big flaw. The original code would have broken the resilience support, because we go over the complete list of communicators to identify those that contain the failed processes, update their status, and set in motion the PROC_FAILED and the automatic REVOKE. That code is not set up to encounter the same communicator twice, which means any error triggered while a communicator creation was underway would have led to badness.

The current code now has a different flaw. We use the ompi_mpi_communicators list via two accessors: opal_pointer_array_get_item and ompi_comm_lookup. The PR protects the ompi_comm_lookup path by checking for the special value, but there is no protection for the 6 uses of opal_pointer_array_get_item and opal_pointer_array_test_and_set_item.

@hppritcha
Member Author

I will look into a fix for this next week.

@hppritcha hppritcha force-pushed the fix_for_intercomm_cid_alloc branch from 85993fe to 7dcdbd2 on April 15, 2024 21:47
@hppritcha
Member Author

@wenduwan @bosilca some updates.

@wenduwan
Contributor

Run AWS CI again.

@@ -140,6 +140,12 @@ OMPI_DECLSPEC OBJ_CLASS_DECLARATION(ompi_communicator_t);
#define OMPI_COMM_BLOCK_WORLD 16
#define OMPI_COMM_BLOCK_OTHERS 8

/**
* Placeholder to use in array of ompi communicators during CID allocation
*
Contributor


nit extra line

Member Author


fixed

if ( NULL != comm ) {
/* Communicator has not been freed before finalize */
OBJ_RELEASE(comm);
comm=(ompi_communicator_t *)opal_pointer_array_get_item(&ompi_mpi_communicators, i);
comm= ompi_comm_lookup(i);
Contributor


nit space around =

Member Author


fixed

A barrier was removed from the cid allocation algorithm
for intercommunicators that leads to random hangs when
doing MPI_Intercomm_create or moral equivalent.

See issue open-mpi#12367

the cid allocation algorithm and comm from cid lookup
was modified to use OMPI_COMM_SENTINEL as the placeholder
in the list of pointers to ompi communicators.

Signed-off-by: Howard Pritchard <[email protected]>
@hppritcha hppritcha force-pushed the fix_for_intercomm_cid_alloc branch from 7dcdbd2 to 15a3d26 on April 16, 2024 15:31
@hppritcha hppritcha merged commit ae7fe9a into open-mpi:main Apr 16, 2024
14 checks passed
hppritcha added a commit to hppritcha/ompi that referenced this pull request Apr 17, 2024
PR open-mpi#12465 gave Intel OneAPI a headache and threw an error trying
to compile the code.  This patch fixes that.

Signed-off-by: Howard Pritchard <[email protected]>
@dalcinl
Contributor

dalcinl commented Apr 18, 2024

I believe that the fixes from here solved some of the issues related to dynamic process management, in particular accept/connect.

hppritcha added a commit to hppritcha/ompi that referenced this pull request Apr 24, 2024
PR open-mpi#12465 gave Intel OneAPI a headache and threw an error trying
to compile the code.  This patch fixes that.

Signed-off-by: Howard Pritchard <[email protected]>
(cherry picked from commit 574778d)