Skip to content

Conversation

@ppebay
Copy link
Contributor

@ppebay ppebay commented Dec 30, 2024

Resolves #558

@ppebay ppebay self-assigned this Dec 30, 2024
@ppebay
Copy link
Contributor Author

ppebay commented Jan 13, 2025

A first review would be appreciated @lifflander to make sure we agree on the design.

I am confident that the computation of the node-level memory usage is correct:
Screenshot 2025-01-13 at 12 57 14

with indeed initially 4 blocks in the bottom row (node 0) and 1 block in top row (node 1), at 9B per block:

synthetic-dataset-blocks0

and in contrast, in rebalanced stage there are 6 blocks in bottom row (node 0) and 3 in top row (node 1) due to across-rank block replications:

synthetic-dataset-blocks4

@ppebay ppebay marked this pull request as ready for review January 14, 2025 22:14
@ppebay ppebay force-pushed the 558-add-node-level-memory-constraint-NEW branch from d024aa9 to 702bdd5 Compare January 15, 2025 12:14
@ppebay
Copy link
Contributor Author

ppebay commented Jan 16, 2025

@lifflander @nlslatt this is an interesting case (synthetic-blocks, load-only + memory constraint):

Configuration 1:

We begin by enforcing the memory constraint at the node-level:

  • 2 ranks per node;
  • memory constraint of 36.0 (i.e. maximum of 4 blocks per node);
    where we readily obtain an imbalance of 0.0, in the following steps:

synthetic-dataset-blocks0
synthetic-dataset-blocks1
synthetic-dataset-blocks2
synthetic-dataset-blocks3

One first noticeable finding is that the final state not only each node has a maximum of 4 blocks, but also each rank has at most 2 blocks (memory = 36.0/18.0):

Screenshot 2025-01-16 at 02 52 13

Therefore, it is in theory possible to achieve an equivalent per-rank memory constraint; however, the second important point is that this is achieved in the iterations above by traversing 2 intermediate ones with 3 blocks (i.e. memory=27.0) on some nodes.

Configuration 2:

We now try to enforce the equivalent memory constraint at the rank-level:

  • 1 rank per node (i.e. no ranks_per_node optional parameter in the configuration file);
  • memory constraint of 18.0 (i.e. maximum of 2 blocks per rank);
    in which case we now indeed cannot obtain an imbalance of 0.0, for instance in the following sequence (consistent with multiple other runs):

synthetic-dataset-blocks0
synthetic-dataset-blocks1

[... 6 identical iterations omitted ... ]

synthetic-dataset-blocks8

In other words, the iterative approach cannot attain the optimal configuration, and remains locked with a non-zero load imbalance:
Screenshot 2025-01-16 at 03 04 27

This is because, with this iterative algorithm, one has to traverse a weaker configuration, where the constraint is enforced at the node level but no longer at the equivalently-formulated rank level.

Conclusion:

The important overall finding is that even when it appears to be the case that equivalent node and rank memory constraints can be achieved with optimal load imbalance, allowing for the weaker (node-level) constraint to be achieved during the course of the iterations can be the only way to reach the final optimum. This could have consequences for actual, non-synthetic cases where the node-level constraint would provide more degrees of freedom for better performance.

@ppebay
Copy link
Contributor Author

ppebay commented Jan 17, 2025

@cwschilly could you please add the first configuration I was describing above to the integration tests? I think it's important to make sure we capture this corner case where relaxing the memory constraint to the node level allows convergence to 0 imbalance.

Copy link
Contributor

@cwschilly cwschilly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@lifflander lifflander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@ppebay ppebay merged commit 2050931 into develop Jan 30, 2025
11 checks passed
@ppebay ppebay deleted the 558-add-node-level-memory-constraint-NEW branch January 30, 2025 07:37
cwschilly added a commit that referenced this pull request May 13, 2025
… changes (#581)

* #558: base of reconstructed PR post conflict resolution and integration of changes

* #558: whitespace cleanup

* #558: fixed incomplete type

* #558: progress checkpoint where nodes and node level memory values are correct

* #558: whitespace cleanup

* #558: fix CI failures

* #558: add unit test for lbsNode

* #558: small syntax improvement

* #558: add space

* #558: parentheses needed in test when walrus operator is used

* #558: improved and enriched statistics printouts

* #558: added node getter and reporting node max memory usage properly too now

* #558: better initialization

* #558: additional and better testing

* #558: completed implementation fixing remaining shared blocks bug

* #558: removed now unused set_shared_blocks from tests

* #558: fix CI issues

* #558: WS cleanup

* #558: fix unit test failures

* #558: fix formatting and delete unnecessary block

* #558: use copy for rank objects instead of deepcopy; other small fixes

* #558: create new ranks set directly as member variable

* #558: use configuration 1 for load-only test case

---------

Co-authored-by: Caleb Schilly <[email protected]>
lifflander pushed a commit that referenced this pull request Jun 2, 2025
… changes (#581)

* #558: base of reconstructed PR post conflict resolution and integration of changes

* #558: whitespace cleanup

* #558: fixed incomplete type

* #558: progress checkpoint where nodes and node level memory values are correct

* #558: whitespace cleanup

* #558: fix CI failures

* #558: add unit test for lbsNode

* #558: small syntax improvement

* #558: add space

* #558: parentheses needed in test when walrus operator is used

* #558: improved and enriched statistics printouts

* #558: added node getter and reporting node max memory usage properly too now

* #558: better initialization

* #558: additional and better testing

* #558: completed implementation fixing remaining shared blocks bug

* #558: removed now unused set_shared_blocks from tests

* #558: fix CI issues

* #558: WS cleanup

* #558: fix unit test failures

* #558: fix formatting and delete unnecessary block

* #558: use copy for rank objects instead of deepcopy; other small fixes

* #558: create new ranks set directly as member variable

* #558: use configuration 1 for load-only test case

---------

Co-authored-by: Caleb Schilly <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add node-level memory constraint

4 participants