topics_for_forum_wg_block_03_18_24

Howard Pritchard edited this page Mar 21, 2024 · 8 revisions

Topics for Sessions WG time block at MPI Forum 3/18/24

Open Issues for MPI 4.2

To be read later this week:

  • groups: relax restriction on pset name provenance (second vote and no-no vote)

Sessions and Reinit

We left off at whether to introduce a "global state" ability to specify an MPI reinit error handler.

  • most recent slide deck is here.

Sessions and ULFM

Non-fault-aware MPI Sessions example:

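// general high-level optimistic application
#include <mpi.h>

int do_stuff_with_comm(MPI_Comm comm);   // application-defined work
void panic(void);                        // application-defined abort

int main(void) {

     int ret;
     MPI_Session session;
     MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_ARE_FATAL, &session);
     MPI_Group group;
     MPI_Group_from_session_pset(session, "mpi://world", &group);
     MPI_Comm comm;
     // "example.tag" is a placeholder string tag
     MPI_Comm_create_from_group(group, "example.tag", MPI_INFO_NULL, MPI_ERRORS_ARE_FATAL, &comm);
     MPI_Group_free(&group);

     ret = do_stuff_with_comm(comm);

     if (MPI_SUCCESS == ret) {
           MPI_Comm_disconnect(&comm);
           MPI_Session_finalize(&session);
           return 0;

     } else {
           panic();

     }
}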

ULFM-style fault-aware MPI Sessions example:

// general high-level pragmatic application
#include <mpi.h>

int do_stuff_with_comm(MPI_Comm comm);   // application-defined work

int main(void) {

     int ret;

     // additional code
     while (1) {

          MPI_Session session;
          MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);
          MPI_Group world, failed, group;
          MPI_Group_from_session_pset(session, "mpi://world", &world);

          // additional code
          MPI_Session_get_proc_failed(session, &failed); // new API, seems easy to do
          MPI_Group_difference(world, failed, &group);
          MPI_Group_free(&world);
          MPI_Group_free(&failed);

          MPI_Comm comm;
          // "example.tag" is a placeholder string tag
          MPI_Comm_create_from_group(group, "example.tag", MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm); // <-- the detail-devils live here
          MPI_Group_free(&group);

          ret = do_stuff_with_comm(comm);

          MPI_Comm_disconnect(&comm);
          MPI_Session_finalize(&session);

          if (MPI_SUCCESS == ret) {
                break; // all done!

          } else if (MPI_ERR_PROC_FAILED == ret) {
                continue; // no more panic

          }

          // additional code
     } // end while

     return 0;
}
 

There has been discussion (over email) about the challenges of implementing a version of MPI_Comm_create_from_group that is aware of process fail-stop faults.
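The retry loop in the ULFM-style example above can be looked at separately from the MPI calls. The sketch below is plain C with no MPI dependency: it only illustrates one possible control flow, where a process failure triggers another attempt, any other outcome ends the loop, and the number of attempts is bounded. The names `run_with_retries`, `attempt_fn`, `retry_result`, `fail_n_times`, and the `ATTEMPT_*` codes are all hypothetical stand-ins (the codes are not the MPI values), and the retry bound is an assumption that does not appear in the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder status codes standing in for MPI_SUCCESS, MPI_ERR_PROC_FAILED,
 * and "any other error". The values are arbitrary, NOT the MPI constants. */
enum attempt_status {
    ATTEMPT_OK = 0,
    ATTEMPT_PROC_FAILED = 1,
    ATTEMPT_OTHER_ERROR = 2
};

/* One pass through the loop body (session init, group setup, communicator
 * creation, work, teardown), modeled as a hypothetical callback. */
typedef enum attempt_status (*attempt_fn)(void *state);

/* Outcome of the whole retry loop. */
struct retry_result {
    enum attempt_status final_status;
    int attempts;
};

/* Run `attempt` until it stops reporting a process failure or until
 * `max_attempts` passes have been made, whichever comes first. */
struct retry_result run_with_retries(attempt_fn attempt, void *state,
                                     int max_attempts)
{
    assert(attempt != NULL && max_attempts > 0);

    struct retry_result result = { ATTEMPT_OTHER_ERROR, 0 };
    while (result.attempts < max_attempts) {
        result.attempts++;
        result.final_status = attempt(state);
        if (result.final_status != ATTEMPT_PROC_FAILED) {
            break; /* success, or an error that rebuilding will not fix */
        }
        /* process failure: go around again and rebuild from the survivors */
    }
    return result;
}

/* Demonstration attempt: reports a process failure a fixed number of times
 * (counter passed through `state`), then reports success. */
enum attempt_status fail_n_times(void *state)
{
    int *remaining_failures = state;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return ATTEMPT_PROC_FAILED;
    }
    return ATTEMPT_OK;
}
```

In a real application the callback would wrap the body of the `while` loop above and translate the MPI return code into one of the placeholder statuses. Whether an application should stop on other error classes, or cap the number of attempts at all, is a design choice this sketch assumes rather than something the WG settled.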

Dynamic Process Sets (maybe)

Notes

Some random notes Howard made while trying to drive. Some are captured in the pptx below. Note the meeting was not recorded.

There was some discussion about how (or whether) to support nesting of error handlers if the app is using MPI_ERRORS_INIT as the global error handler. Dan suggested use of MPI_Comm_run_errhandler as one approach.

Joseph pointed out that the example codes check for MPI_ERR_REINIT in a possibly multi-threaded region.
The need for app-specific thread synchronization was discussed.
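As an illustration of what app-specific synchronization might look like, here is a minimal sketch using C11 atomics. It is one possible approach, not something the WG agreed on: the first thread that observes the reinit error claims responsibility for recovery, and other threads can poll a flag to learn that recovery is pending. `APP_ERR_REINIT` is a placeholder for the proposed MPI_ERR_REINIT error class with an arbitrary value, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Placeholder for the proposed MPI_ERR_REINIT error class.
 * The value is arbitrary and chosen only for this sketch. */
#define APP_ERR_REINIT 1001

/* Set while a reinit has been observed and recovery has not finished.
 * Static storage, so it starts out false. */
static atomic_bool reinit_pending;

/* Call from any thread with the return code of a communication call.
 * Returns true only for the single thread that should drive recovery. */
bool note_result_and_claim_recovery(int rc)
{
    if (rc != APP_ERR_REINIT) {
        return false;
    }
    bool expected = false;
    /* Only the first thread to flip the flag from false to true wins. */
    return atomic_compare_exchange_strong(&reinit_pending, &expected, true);
}

/* Other threads poll this to decide whether to stop issuing MPI calls. */
bool reinit_is_pending(void)
{
    return atomic_load(&reinit_pending);
}

/* Called by the recovering thread once the app has been re-initialized. */
void recovery_done(void)
{
    assert(atomic_load(&reinit_pending)); /* recovery must have been claimed */
    atomic_store(&reinit_pending, false);
}
```

A real application would also need to quiesce the other threads before recovery starts (for example with a barrier or condition variable); that part is omitted here.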

There was also discussion of whether invoking MPI_Test_failure should implicitly set the global MPI_ERRORS_REINIT, or whether this should instead be done by resurrecting MPI_Reinit for a new purpose.

link to a few slides from WG discussion

Material from recent MPI forum WG discussions

Ancient material

  • https://github.com/mpiwg-sessions/sessions-issues/wiki/2016-12-12-webex/2016-12-12-webex.pptx
  • https://github.com/mpiwg-sessions/sessions-issues/wiki/SessionsV2-ideas.pptx
