#558: Restructured PR to add node level memory constraint post recent changes #581
Conversation
A first review would be appreciated @lifflander, to make sure we agree on the design. I am confident that the computation of the node-level memory usage is correct: initially there are 4 blocks in the bottom row (node 0) and 1 block in the top row (node 1), at 9B per block; in contrast, in the rebalanced stage there are 6 blocks in the bottom row (node 0) and 3 in the top row (node 1), due to across-rank block replications.
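For concreteness, the node-level tally described above can be sketched as follows (a minimal sketch with hypothetical names, not the actual implementation; the 9B block size is taken from the example):

```python
BLOCK_SIZE = 9.0  # per-block footprint, matching the 9B blocks in the example

def node_memory(rank_blocks, node_of):
    """Node-level memory usage: each distinct shared block counts once per node,
    so a block replicated on several ranks of the same node is not double-counted."""
    per_node = {}
    for rank, blocks in rank_blocks.items():
        per_node.setdefault(node_of[rank], set()).update(blocks)
    return {node: len(blocks) * BLOCK_SIZE for node, blocks in per_node.items()}

# Initial state of the example: 4 blocks on node 0, 1 block on node 1
initial = node_memory(
    {0: {"b0", "b1"}, 1: {"b2", "b3"}, 2: {"b4"}, 3: set()},
    {0: 0, 1: 0, 2: 1, 3: 1},
)
# → {0: 36.0, 1: 9.0}
```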
Force-pushed from d024aa9 to 702bdd5
@lifflander @nlslatt this is an interesting case.

Configuration 1: We begin by enforcing the memory constraint at the node level.

A first noticeable finding is that in the final state not only does each node have a maximum of 4 blocks, but each rank also has at most 2 blocks (memory = 36.0/18.0).

Therefore, it is in theory possible to achieve an equivalent per-rank memory constraint; however, the second important point is that this final state is reached only by traversing 2 intermediate iterations with 3 blocks (i.e. memory = 27.0) on some nodes.

Configuration 2: We now try to enforce the equivalent memory constraint at the rank level.

[... 6 identical iterations omitted ...]

In other words, the iterative approach cannot attain the optimal configuration and remains locked with a non-zero load imbalance. This is because, with this iterative algorithm, one has to traverse a weaker configuration, where the constraint is enforced at the node level but no longer at the equivalently formulated rank level.

Conclusion: The important overall finding is that even when equivalent node-level and rank-level memory constraints appear achievable with optimal load imbalance, allowing the weaker (node-level) constraint to hold during the course of the iterations can be the only way to reach the final optimum. This could have consequences for actual, non-synthetic cases, where the node-level constraint would provide more degrees of freedom for better performance.
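The locking effect can be illustrated with a toy feasibility check (hypothetical helper names, not the actual transfer criterion): a state that a node-level cap of 36.0 accepts is rejected by the equivalent rank-level cap of 18.0, so the rank-constrained iteration can never pass through the intermediate 27.0 configurations.

```python
BLOCK_SIZE = 9.0  # per-block footprint from the example

def rank_ok(rank_blocks, cap):
    """Rank-level check: every rank's block footprint must stay under the cap."""
    return all(len(blocks) * BLOCK_SIZE <= cap for blocks in rank_blocks.values())

def node_ok(rank_blocks, node_of, cap):
    """Node-level check: distinct blocks are tallied per node, then capped."""
    per_node = {}
    for rank, blocks in rank_blocks.items():
        per_node.setdefault(node_of[rank], set()).update(blocks)
    return all(len(blocks) * BLOCK_SIZE <= cap for blocks in per_node.values())

# Two ranks on one node; replicating block "C" onto rank 0 yields 3 blocks (27.0) there
after = {0: {"A", "B", "C"}, 1: set()}
node_of = {0: 0, 1: 0}
node_ok(after, node_of, 36.0)  # True: the node holds 3 distinct blocks, 27.0 <= 36.0
rank_ok(after, 18.0)           # False: rank 0 sits at 27.0, above the 18.0 rank cap
```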
@cwschilly could you please add the first configuration I was describing above to the integration tests? I think it's important to make sure we capture this corner case where relaxing the memory constraint to the node level allows convergence to 0 imbalance.
cwschilly left a comment:
LGTM
lifflander left a comment:
Looks good to me!
… changes (#581)

* #558: base of reconstructed PR post conflict resolution and integration of changes
* #558: whitespace cleanup
* #558: fixed incomplete type
* #558: progress checkpoint where nodes and node level memory values are correct
* #558: whitespace cleanup
* #558: fix CI failures
* #558: add unit test for lbsNode
* #558: small syntax improvement
* #558: add space
* #558: parentheses needed in test when walrus operator is used
* #558: improved and enriched statistics printouts
* #558: added node getter and reporting node max memory usage properly too now
* #558: better initialization
* #558: additional and better testing
* #558: completed implementation fixing remaining shared blocks bug
* #558: removed now unused set_shared_blocks from tests
* #558: fix CI issues
* #558: WS cleanup
* #558: fix unit test failures
* #558: fix formatting and delete unnecessary block
* #558: use copy for rank objects instead of deepcopy; other small fixes
* #558: create new ranks set directly as member variable
* #558: use configuration 1 for load-only test case

Co-authored-by: Caleb Schilly <[email protected]>
Resolves #558