Mpi local run #30
base: MPI_updates
Conversation
Let multiple CMSSW processes on the same or different machines coordinate event processing and transfer data products over MPI.

The implementation is based on four CMSSW modules. Two are responsible for setting up the communication channels and coordinating the event processing:
- the MPIController
- the MPISource

and two are responsible for the transfer of data products:
- the MPISender
- the MPIReceiver

The MPIController is an EDProducer running in a regular CMSSW process. After setting up the communication with an MPISource, it transmits to it all EDM run, lumi and event transitions, and instructs the MPISource to replicate them in the second process.

The MPISource is a Source controlling the execution of a second CMSSW process. After setting up the communication with an MPIController, it listens for EDM run, lumi and event transitions, and replicates them in its own process.

Both MPIController and MPISource produce an MPIToken, a special data product that encapsulates the information about the MPI communication channel.

The MPISender is an EDProducer that can read a collection of a predefined type from the Event, serialise it using its ROOT dictionary, and send it over the MPI communication channel. The MPIReceiver is an EDProducer that can receive a collection of a predefined type over the MPI communication channel, deserialise it using its ROOT dictionary, and put it into the Event. Both MPISender and MPIReceiver are templated on the type to be transmitted and de/serialised; a sketch of this serialisation scheme is given below.

Each MPISender and MPIReceiver is configured with an instance value that is used to match one MPISender in one process to one MPIReceiver in another process. Using different instance values allows the use of multiple MPISenders/MPIReceivers in a process.

Both MPISender and MPIReceiver obtain the MPI communication channel by reading an MPIToken from the event. They also produce a copy of the MPIToken, so other modules can consume it to declare a dependency on the previous modules.

An automated test is available in the test/ directory.
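For illustration, here is a minimal sketch, not taken from this PR, of the serialisation scheme described above: an object with a ROOT dictionary is written into a TBufferFile and the raw bytes are shipped with plain MPI calls. The rank, tag and communicator arguments are placeholders; in the actual modules the transfer goes through the MPI communication channel obtained from the MPIToken.

```cpp
// Hedged sketch of ROOT-dictionary-based serialisation over MPI.
// sendSerialised/receiveSerialised are hypothetical helper names.
#include <memory>
#include <vector>

#include <mpi.h>
#include <TBufferFile.h>
#include <TClass.h>

template <typename T>
void sendSerialised(T const& product, int destinationRank, int tag, MPI_Comm comm) {
  // Serialise the object into a memory buffer using its ROOT dictionary.
  TBufferFile buffer(TBuffer::kWrite);
  buffer.WriteObjectAny(&product, TClass::GetClass<T>());
  // Ship the raw bytes; the receiving side must know (or check) the type.
  MPI_Send(buffer.Buffer(), buffer.Length(), MPI_BYTE, destinationRank, tag, comm);
}

template <typename T>
std::unique_ptr<T> receiveSerialised(int sourceRank, int tag, MPI_Comm comm) {
  // Find out how many bytes are incoming, then receive them.
  MPI_Status status;
  MPI_Probe(sourceRank, tag, comm, &status);
  int size = 0;
  MPI_Get_count(&status, MPI_BYTE, &size);
  std::vector<char> data(size);
  MPI_Recv(data.data(), size, MPI_BYTE, sourceRank, tag, comm, MPI_STATUS_IGNORE);
  // Deserialise the object using its ROOT dictionary.
  TBufferFile buffer(TBuffer::kRead, size, data.data(), false /* do not adopt the buffer */);
  return std::unique_ptr<T>(static_cast<T*>(buffer.ReadObjectAny(TClass::GetClass<T>())));
}
```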
Let MPISender and MPIReceiver consume, send/receive and produce collections of arbitrary types, as long as they have a ROOT dictionary and can be persisted. Note that any transient information is lost during the transfer, and needs to be recreated by the receiving side (see the example below). The documentation and tests are updated accordingly.

Warning: this approach is a work in progress!

TODO:
- improve framework integration
- add checks between send/receive types
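To illustrate the transient-data caveat, the hypothetical class below uses ROOT's trailing `//!` marker, which excludes a data member from the persistent state described by the dictionary. Such members are not serialised, so they arrive default-initialised on the receiving side and have to be recomputed there.

```cpp
// Illustrative class (not part of this PR) with one persistent and one
// transient data member, as seen by a ROOT dictionary.
class ExampleHit {
public:
  float charge() const { return charge_; }
  float calibratedCharge() const { return calibrated_; }

  // Must be re-run on the receiving side, since calibrated_ is not transferred.
  void calibrate(float gain) { calibrated_ = charge_ * gain; }

private:
  float charge_ = 0.f;      // persistent: survives the MPI transfer
  float calibrated_ = 0.f;  //! transient: not serialised, lost in the transfer
};
```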
int numProducts;
token.channel()->receiveProduct(instance_, numProducts);
edm::LogVerbatim("MPIReceiver") << "Received number of products: " << numProducts;
// int numProducts;
Why are these commented out?
I think "local run" may be misleading: both approaches (single Could you rename the option to "useMPINameServer" ? |