part-persist: implement message aggregation #13039
@mdosanjh Any chance we could get an initial review from you on this PR? (Tommy Janjusic, Tue, Jan 28, 2025 at 09:15)
I'll look into it today.
I think someone needs to look at the mpi4py failure. It looks specific to the changes in this PR.
Just an assumption on my side, from looking at the error message without diving into the mpi4py code: it could be related to the extension of fields in the Request handle, which is mapped to a Python object in mpi4py.
I doubt that. It is probably failing only with mpi4py because mpi4py has the most extensive test suite in our CI infrastructure. This PR should not be merged until mpi4py passes.
If it's any help, here's where the segfault is occurring: at line 533. The problem is that req->flags is NULL.
Thank you very much, I will look into it. That error never occurred in my tests.
Signed-off-by: Axel Schneewind <[email protected]>
This aggregation scheme is intended to allow Open MPI to transfer larger messages when the user-reported partitions are too small or too numerous. This is achieved with an internal partitioning in which each internal (transfer) partition corresponds to one or more user-reported partitions. The implementation provides an interface for inserting user partitions that optionally returns a transfer partition that has become ready. To this end, each transfer partition is associated with an atomic counter that tracks the number of corresponding Pready calls. As soon as a counter reaches the number of corresponding user partitions, that transfer partition is returned by the respective insertion call. This implementation is thread-safe.
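A minimal C11 sketch of the insertion interface described in this commit message, assuming that consecutive user partitions are grouped into transfer partitions of `factor` user partitions each (the last one may be smaller). The names `agg_scheme`, `agg_init`, `agg_insert`, `agg_reset` and `agg_free` are hypothetical and do not come from the PR. The point is only that `atomic_fetch_add` hands each caller a distinct previous value, so exactly one Pready caller per transfer partition observes the counter reaching its target.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical aggregation state (illustrative, not the PR's struct). */
typedef struct {
    size_t user_parts;     /* number of user-reported partitions */
    size_t factor;         /* user partitions per transfer partition */
    size_t transfer_parts; /* ceil(user_parts / factor) */
    atomic_size_t *ready;  /* one Pready counter per transfer partition */
} agg_scheme;

/* Returns 0 on success, -1 on invalid arguments or allocation failure. */
static int agg_init(agg_scheme *s, size_t user_parts, size_t factor)
{
    if (user_parts == 0 || factor == 0)
        return -1;
    s->user_parts = user_parts;
    s->factor = factor;
    s->transfer_parts = (user_parts + factor - 1) / factor;
    s->ready = malloc(s->transfer_parts * sizeof *s->ready);
    if (s->ready == NULL)
        return -1;
    for (size_t t = 0; t < s->transfer_parts; t++)
        atomic_init(&s->ready[t], 0);
    return 0;
}

/* Number of user partitions mapped to transfer partition t; only the
 * last transfer partition can hold fewer than factor. */
static size_t agg_expected(const agg_scheme *s, size_t t)
{
    size_t left = s->user_parts - t * s->factor;
    return left < s->factor ? left : s->factor;
}

/* Record one Pready on user partition u (assumed to happen once per
 * partition per round). Returns the index of the transfer partition that
 * just became complete, or -1 if none did. */
static long agg_insert(agg_scheme *s, size_t u)
{
    size_t t = u / s->factor;
    size_t prev = atomic_fetch_add(&s->ready[t], 1);
    return (prev + 1 == agg_expected(s, t)) ? (long)t : -1;
}

/* Clear all counters before the next round of the persistent request. */
static void agg_reset(agg_scheme *s)
{
    for (size_t t = 0; t < s->transfer_parts; t++)
        atomic_store(&s->ready[t], 0);
}

static void agg_free(agg_scheme *s)
{
    free(s->ready);
    s->ready = NULL;
}
```

Because partitioned requests are persistent, the counters would presumably have to be cleared before each new round (here via the hypothetical `agg_reset`).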
Hello! The Git Commit Checker CI bot found a few problems with this PR:
da974db: use MCA_BASE_COMPONENT_INIT
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!
Hello! The Git Commit Checker CI bot found a few problems with this PR:
7e95464: add aggregation scheme header to local_sources
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!
This PR creates a component for partitioned communication that supports message aggregation, based on the existing part-persist component. The goal is to prevent drops in effective bandwidth when an application uses an overly fine-grained partitioning.
This component can enforce hard limits on the size and count of transferred partitions, regardless of the partitioning required by the application. These limits can be specified using the MCA parameters min_message_size and max_message_count. Their default values might require revision.
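One way the two limits could be turned into a grouping factor is sketched below in C. This is an assumption, not the PR's code: the helper `agg_factor` and its arguments are hypothetical, and only the names `min_message_size` and `max_message_count` come from the MCA parameters above. It picks the smallest factor that lifts each transfer to at least `min_message_size` bytes and keeps the number of transfers at or below `max_message_count`; treating a limit of 0 as "no limit" is also an assumption.

```c
#include <stddef.h>

static size_t ceil_div(size_t a, size_t b)
{
    return (a + b - 1) / b;
}

/* Hypothetical: number of consecutive user partitions per transfer
 * partition. part_bytes is the size of one user partition. A limit of 0
 * is treated as "unlimited" (assumption). */
static size_t agg_factor(size_t user_parts, size_t part_bytes,
                         size_t min_message_size, size_t max_message_count)
{
    size_t factor = 1;

    /* Group enough user partitions to reach min_message_size bytes. */
    if (part_bytes > 0 && part_bytes < min_message_size)
        factor = ceil_div(min_message_size, part_bytes);

    /* Group enough that ceil(user_parts / factor) <= max_message_count. */
    if (max_message_count > 0 && user_parts > max_message_count) {
        size_t by_count = ceil_div(user_parts, max_message_count);
        if (by_count > factor)
            factor = by_count;
    }

    /* Never group more partitions than exist: at most one transfer. */
    if (user_parts > 0 && factor > user_parts)
        factor = user_parts;
    return factor;
}
```

For example, 1000 user partitions of 1 MiB each with max_message_count = 16 give a factor of 63, i.e. 16 transfer partitions. The size bound cannot be guaranteed for the last transfer partition, or when the whole buffer is smaller than min_message_size.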
If a user-provided partitioning violates these constraints, a coarser-grained partitioning is selected, in which multiple user partitions are mapped to one internal (transfer) partition. Transfer of an internal partition is started as soon as Pready has been called on all of its corresponding user partitions. Each transfer partition is associated with an atomic counter that tracks how many of its user partitions have been marked ready.
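In MPI partitioned communication the send buffer is split into equally sized partitions of `count` elements each. Assuming a contiguous datatype, user partition u starts at byte u * part_bytes, and assuming that a transfer partition groups consecutive user partitions (the exact grouping used by this PR is not shown here), each transfer partition covers one contiguous byte range and can be sent as a single message. The hypothetical helper below computes that range; `transfer_range` and `transfer_range_of` are illustrative names only.

```c
#include <stddef.h>

/* Hypothetical description of one aggregated transfer. */
typedef struct {
    size_t first;  /* first user partition in the group */
    size_t nparts; /* number of user partitions in the group */
    size_t offset; /* byte offset into the partitioned buffer */
    size_t length; /* number of bytes sent for this transfer partition */
} transfer_range;

/* Requires factor > 0 and t < ceil(user_parts / factor). Transfer
 * partition t is assumed to cover user partitions
 * [t * factor, min((t + 1) * factor, user_parts)). */
static transfer_range transfer_range_of(size_t t, size_t factor,
                                        size_t user_parts, size_t part_bytes)
{
    transfer_range r;
    r.first = t * factor;
    r.nparts = user_parts - r.first < factor ? user_parts - r.first : factor;
    r.offset = r.first * part_bytes;
    r.length = r.nparts * part_bytes;
    return r;
}
```

With 10 user partitions of 100 bytes and a factor of 4, transfer partition 2 covers user partitions 8 and 9, i.e. bytes 800 to 999.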
This implementation is a result of "Benchmarking the State of MPI Partitioned Communication in Open MPI" presented at EuroMPI 2024 (https://events.vsc.ac.at/event/123/page/341-program).