[WIP] Chunk distribution strategies #824

Open
wants to merge 12 commits into
base: dev

Conversation

franzpoeschel
Contributor

@franzpoeschel franzpoeschel commented Nov 23, 2020

Based on topic-available-chunks, see #802.
Based on #1043

Adds strategies that guide applications toward efficient parallel chunk loading patterns.
Idea: PR #802 lets reading applications query which chunks are available for loading in the backend. This PR adds a function chunk_assignment::assignChunks, which takes such a ChunkTable together with a strategy as input and produces a further ChunkTable that may be used as a load pattern.

Implemented strategies:

  • Round Robin: Assign chunks to reading processes in turn.
  • by hostname: Assign chunks to reading processes on the same host. Parameterized by an interior strategy that distributes the chunks among the processes within each host.
  • by cuboid slice: Assign chunks according to a slicing of the data space into n_ranks hyperslabs.
  • by binpacking: Assign chunks in an approximately balanced manner via an approximation algorithm for the NP-complete bin packing problem.
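
To make the difference between the size-agnostic and the balancing strategies concrete, here is a minimal sketch in plain Python. It does not use the openPMD-api: the `Chunk` class and both functions are hypothetical stand-ins, and the balancing function uses a simple greedy heuristic (largest chunk first, to the least-loaded rank), which is an assumption and not necessarily the approximation algorithm implemented in this PR.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for one row of a chunk table."""
    offset: tuple
    extent: tuple

    @property
    def size(self):
        # Number of elements covered by this chunk.
        return prod(self.extent)


def round_robin(chunks, n_ranks):
    """Hand chunks to reading ranks in turn, ignoring their sizes."""
    assignment = {rank: [] for rank in range(n_ranks)}
    for i, chunk in enumerate(chunks):
        assignment[i % n_ranks].append(chunk)
    return assignment


def greedy_balance(chunks, n_ranks):
    """Largest chunk first, always to the currently least-loaded rank."""
    assignment = {rank: [] for rank in range(n_ranks)}
    load = [0] * n_ranks
    for chunk in sorted(chunks, key=lambda c: c.size, reverse=True):
        rank = load.index(min(load))
        assignment[rank].append(chunk)
        load[rank] += chunk.size
    return assignment


# Four 1D chunks of very different sizes, distributed over two readers.
chunks = [
    Chunk((0,), (8,)), Chunk((8,), (1,)), Chunk((9,), (7,)), Chunk((16,), (2,)),
]
for name, strategy in [("round robin", round_robin), ("greedy", greedy_balance)]:
    loads = [sum(c.size for c in cs) for cs in strategy(chunks, 2).values()]
    print(name, loads)
```

With this input, round robin leaves the two readers with 15 and 3 elements, while the greedy heuristic ends up with 9 and 9. Unlike a real bin packing strategy, this sketch never splits a chunk.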

Also adds an attribute to the global Series object (currently called rankMetaInfo, I'm sure we'll change this) that stores one string of meta information per writing process, for example its hostname. The hostname-based strategy uses this attribute.
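
A rough sketch of how per-rank hostnames could drive the hostname-based strategy, again in plain Python without the openPMD-api. All names here (`assign_by_hostname`, the `(writer_rank, chunk_id)` pairs, the `leftover` list) are hypothetical, and returning unmatched chunks separately is an assumption about how chunks without a reader on their host might be handled.

```python
from collections import defaultdict


def assign_by_hostname(chunks, writer_hosts, reader_hosts, inner):
    """chunks: list of (writer_rank, chunk_id) pairs.
    writer_hosts / reader_hosts: one hostname per rank, indexed by rank.
    inner: strategy(chunks, reader_ranks) -> {reader_rank: [chunk, ...]}.
    """
    # Group the reading ranks by the host they run on.
    readers_on = defaultdict(list)
    for rank, host in enumerate(reader_hosts):
        readers_on[host].append(rank)

    # Group chunks by the host of the rank that wrote them.
    chunks_on = defaultdict(list)
    leftover = []  # chunks whose writer host has no reader (assumed handling)
    for writer_rank, chunk_id in chunks:
        host = writer_hosts[writer_rank]
        if host in readers_on:
            chunks_on[host].append(chunk_id)
        else:
            leftover.append(chunk_id)

    # Within each host, delegate to the interior strategy.
    assignment = {rank: [] for rank in range(len(reader_hosts))}
    for host, local_chunks in chunks_on.items():
        for rank, assigned in inner(local_chunks, readers_on[host]).items():
            assignment[rank].extend(assigned)
    return assignment, leftover


def round_robin(chunks, ranks):
    """Interior strategy: hand chunks to the given ranks in turn."""
    result = {rank: [] for rank in ranks}
    for i, chunk in enumerate(chunks):
        result[ranks[i % len(ranks)]].append(chunk)
    return result


writer_hosts = ["node01", "node01", "node02", "node03"]
reader_hosts = ["node02", "node01", "node01"]
chunks = [(0, "a"), (1, "b"), (0, "c"), (2, "d"), (3, "e")]
assignment, leftover = assign_by_hostname(
    chunks, writer_hosts, reader_hosts, round_robin
)
print(assignment)  # {0: ['d'], 1: ['a', 'c'], 2: ['b']}
print(leftover)    # ['e']
```

The leftover chunks would still need a fallback, for example a second pass with a host-agnostic strategy.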

TODO:

  • Python bindings
  • Testing, mostly done manually so far
  • Cleanup
  • Documentation
  • This PR adds many MPI-based methods to the Series class. It would probably be good to first separate MPI and non-MPI headers, and then to add this functionality to the MPI headers.

@franzpoeschel franzpoeschel changed the title Chunk distribution strategies [WIP] Chunk distribution strategies Nov 23, 2020
Contributor

lgtm-com bot commented Nov 23, 2020

This pull request introduces 1 alert when merging 96841f9 into b2664b7 - view on LGTM.com

new alerts:

  • 1 for Catching by value

@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 96841f9 to 6da1db3 Compare November 24, 2020 09:39
Contributor

lgtm-com bot commented Nov 24, 2020

This pull request introduces 1 alert when merging 6da1db3 into a6287fc - view on LGTM.com

new alerts:

  • 1 for Catching by value

@franzpoeschel franzpoeschel added api: new additions to the API discussion labels Jan 5, 2021
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 6da1db3 to 43b1e31 Compare February 23, 2021 09:59
Contributor

lgtm-com bot commented Feb 23, 2021

This pull request introduces 1 alert when merging 43b1e31 into f6b0054 - view on LGTM.com

new alerts:

  • 1 for Catching by value

@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 43b1e31 to 5f2b347 Compare February 23, 2021 10:37
Contributor

lgtm-com bot commented Feb 23, 2021

This pull request introduces 1 alert when merging 5f2b347 into f6b0054 - view on LGTM.com

new alerts:

  • 1 for Catching by value

@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 5f2b347 to bb7fa45 Compare March 8, 2021 15:45
Contributor

lgtm-com bot commented Mar 8, 2021

This pull request introduces 1 alert when merging bb7fa45 into 9b349eb - view on LGTM.com

new alerts:

  • 1 for Catching by value

@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch 3 times, most recently from 8c7fc78 to 9fc6d6b Compare March 9, 2021 17:09
@franzpoeschel
Contributor Author

@ax3l ping

@ax3l ax3l self-requested a review April 1, 2021 05:47
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 9fc6d6b to a1bb1e0 Compare April 26, 2021 08:36
Contributor

lgtm-com bot commented Apr 26, 2021

This pull request introduces 1 alert when merging ebea6cf into 86a15ba - view on LGTM.com

new alerts:

  • 1 for Unused local variable

@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch 2 times, most recently from 993cfb6 to 0fca8b7 Compare April 27, 2021 08:06
@franzpoeschel
Contributor Author

I've added Python bindings now, including usage of one distribution strategy in openpmd-pipe as an integrated usage example.

@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 0fca8b7 to ca6f0da Compare April 27, 2021 09:38
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from f9ba2a9 to 47653ac Compare June 18, 2021 09:23
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 47653ac to 89fea78 Compare August 9, 2021 13:55
@franzpoeschel
Contributor Author

franzpoeschel commented Aug 18, 2021

One issue with this approach is that the attribute rankMetaInfo is a vector of as many strings as there are parallel writers, which I suspect is part of the reason why we saw scaling issues in some benchmarks.
Instead, a parallel dataset should preferably be written.

EDIT: I've transformed this to a parallel dataset now. Needs some more checking for char portability. Also, we should probably buffer these values on reading, since the table can only be read in steps in which it was written. In short: make this less like an attribute that can be set and queried, and more of an automatic step during Series construction.
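
For illustration, one way such a table could be laid out is as a 2D dataset of bytes with one fixed-width, NUL-padded row per writing rank, so that each rank writes only its own row. The sketch below shows just the encoding and decoding in plain Python. The helper names and the layout are assumptions, not the format used in this PR, and in a parallel run the row width would first have to be agreed on across ranks (for example via a max-reduction).

```python
def encode_rows(strings, width=None):
    """Pack strings into equally long, NUL-padded byte rows."""
    raw = [s.encode("utf-8") for s in strings]
    if width is None:
        # Leave room for at least one NUL terminator per row.
        width = max(len(r) for r in raw) + 1
    if any(len(r) >= width for r in raw):
        raise ValueError("string does not fit into the row width")
    return [r.ljust(width, b"\0") for r in raw]


def decode_row(row):
    """Recover the string from one padded row."""
    return bytes(row).split(b"\0", 1)[0].decode("utf-8")


hosts = ["node01", "node02.cluster", "n3"]
table = encode_rows(hosts)            # one row per writing rank
print([len(row) for row in table])    # every row has the same length
print([decode_row(row) for row in table])
```

Working on explicit bytes rather than on the platform's `char` type sidesteps the signed/unsigned question on the encoding side; whether that is sufficient for the portability concern above is not something this sketch can confirm.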

@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch 3 times, most recently from 62758f5 to 86d5a8c Compare August 19, 2021 08:51
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 196d58e to e63c9e1 Compare March 26, 2024 11:06
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from e63c9e1 to 436360e Compare May 21, 2024 14:51
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 3c6550b to 2f9ba73 Compare June 7, 2024 12:09
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 2f9ba73 to 49b16f8 Compare June 26, 2024 11:52
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 49b16f8 to 5e17686 Compare July 16, 2024 09:36
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 5e17686 to 3745f58 Compare July 23, 2024 14:57
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 3745f58 to 3c93be4 Compare August 5, 2024 10:15
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch 2 times, most recently from 60b78ad to bd227b7 Compare August 16, 2024 10:58
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 936882c to e509ed6 Compare November 15, 2024 14:45
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch 2 times, most recently from ef9f968 to 5330e85 Compare December 17, 2024 11:09
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 5330e85 to 21edbbc Compare February 21, 2025 12:11
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 21edbbc to f355b8b Compare March 26, 2025 14:33
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch 2 times, most recently from f4ee1dc to bbbf60f Compare April 4, 2025 08:32
@franzpoeschel franzpoeschel force-pushed the topic-chunk-distribution branch from 18f0fe8 to a8df60d Compare April 22, 2025 09:10
Labels
api: new additions to the API discussion
1 participant