Description
Context
MPI currently provides a rich set of topology functions that operate on virtual topologies. With the exception of the « reorder » parameter, which hints that the implementation may embed the topology in the hardware by reordering ranks, these topologies reflect application characteristics only.
Proposal
In this issue, we propose to relax the purely virtual nature of topologies by introducing hardware-linked topologies on COMM_WORLD and on the topological communicators generated by SPLIT_TYPE. These would be exceptions left to the MPI implementation (or machine integrator); in other words, they would be entirely optional.
These topologies should be connected graphs (no disconnected components) and shall have a symmetric adjacency matrix to allow the use of neighborhood collectives.
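The two requirements above (symmetric and connected) can be checked mechanically. The following is a minimal sketch in plain Python with no MPI involved; the adjacency-list format (`adj[r]` lists the neighbors of rank `r`), the function names, and the sample graphs are illustrative assumptions, not part of the proposal.

```python
from collections import deque


def is_symmetric(adj):
    """Return True if every edge r -> n has a matching edge n -> r."""
    return all(r in adj[n] for r, neighbors in enumerate(adj) for n in neighbors)


def is_connected(adj):
    """Return True if every rank is reachable from rank 0 (breadth-first search).

    For a symmetric graph this is equivalent to having a single component.
    """
    if not adj:
        return True
    seen = {0}
    queue = deque([0])
    while queue:
        r = queue.popleft()
        for n in adj[r]:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return len(seen) == len(adj)


def is_valid_hardware_topology(adj):
    """Check both properties required above: symmetric and connected."""
    return is_symmetric(adj) and is_connected(adj)


# Hypothetical 4-process ring: each rank neighbors the ranks on either side.
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
# Same ring with one direction of the 0-1 link missing (asymmetric).
broken = [[3], [0, 2], [1, 3], [2, 0]]
# Two separate pairs (symmetric but disconnected).
split = [[1], [0], [3], [2]]
```

Symmetry matters here because a neighborhood collective sends to a process's outgoing neighbors and receives from its incoming ones; with a symmetric adjacency matrix the two sets coincide, so every process exchanges data with exactly the same peers in both directions.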
Consequences to Users
Users may access a description of the machine topology in COMM_WORLD, describing how processes are connected. The nature of this graph and of the retained topology is up to the machine integrator. On some systems featuring torus networks and allocation constraints, it could be a Cartesian topology; on others, a weighted distributed graph describing the process layout.
When using MPI_SPLIT_TYPE, users may be able to query which processes are closer to one another at the given level of splitting (currently only TYPE_SHARED, and possibly new keys in the future, including hierarchical ones).
The user would be provided with the ability to call neighborhood collectives on hardware topologies, exchanging messages along the memory and network topology. This could benefit problems with weak requirements in terms of virtual topology (for example agent-based models or space-filling models) by providing them with a « map » of the machine directly queried from MPI.
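To illustrate the Cartesian case, here is a small sketch of what a hardware torus exposed as a Cartesian topology would let a process compute: its coordinates and its neighbors along each dimension. It is plain Python rather than MPI, and it assumes row-major rank numbering and full periodicity in every dimension; the function names and the 3 x 4 grid used in the note below are hypothetical.

```python
def rank_to_coords(rank, dims):
    """Convert a rank to coordinates, row-major (last dimension varies fastest)."""
    coords = []
    for extent in reversed(dims):
        coords.append(rank % extent)
        rank //= extent
    return list(reversed(coords))


def coords_to_rank(coords, dims):
    """Convert coordinates back to a rank, wrapping each one periodically."""
    rank = 0
    for c, extent in zip(coords, dims):
        rank = rank * extent + (c % extent)
    return rank


def torus_neighbors(rank, dims):
    """Return one (previous, next) pair of neighbor ranks per dimension."""
    coords = rank_to_coords(rank, dims)
    pairs = []
    for d in range(len(dims)):
        lower = list(coords)
        upper = list(coords)
        lower[d] -= 1  # wraps around in coords_to_rank
        upper[d] += 1
        pairs.append((coords_to_rank(lower, dims), coords_to_rank(upper, dims)))
    return pairs
```

On a 3 x 4 torus, `torus_neighbors(0, (3, 4))` gives `[(8, 4), (3, 1)]`: rank 0 wraps around to rank 8 in the first dimension and to rank 3 in the second. A weighted distributed graph would replace this closed-form neighbor rule with explicit neighbor and weight lists.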
Consequences to Runtimes
This addition should be optional. If it is not present, codes relying on it will see MPI_UNDEFINED as the topology of the communicator of interest, which still leaves them the possibility of manually creating a communicator with an attached topology and continuing execution.
This proposal requires no new functions in the standard. From an implementation point of view, it depends only on the ability to register a representative hardware topology on COMM_WORLD and on the communicators produced by the respective SPLIT_TYPE primitives, using scalable representations (DIST_GRAPH and CART).
Remaining Problems / Future Controversy
Although production codes already query rank coordinates on some systems whose networks fit the MPI_CART representation, it remains to be proven that a graph representation of the system would be of interest.
Adding a « hardware » topology with a symmetric adjacency matrix on some communicators enables the use of neighborhood collectives. We have to be careful about their meaning and characteristics, which are bound to depend heavily on the exposed topology.