Allow parallel iteration over data containers #5218
Conversation
Force-pushed ad921de to 1c2649d.
… about type-checking errors: #5195
Force-pushed bea773f to 6c307f3.
Pull Request Overview
This PR adds parallel iteration capabilities to data containers in yt, allowing users to iterate over data chunks in parallel with optional reduction operations. The implementation introduces a new piter method on data containers that supports various reduction strategies (concatenation, sum, min, max) either on all processors or just the root processor.
- Added a `piter` method to `YTSelectionContainer` for parallel iteration over data chunks
- Enhanced the `parallel_objects` function with reduction operations and improved type annotations
- Implemented new reduction methods (`reduce`, `all_concat`, `concat`) in the communication system
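A minimal usage sketch based on the description above; the exact call signature of `piter` is not quoted in this thread, so the `(storage, chunk)` loop shape is assumed to mirror yt's existing `parallel_objects` convention, and the dataset path is a placeholder:

```python
# Hypothetical sketch; `piter`'s signature and the (storage, chunk) loop
# shape are assumptions based on the PR description, not quoted code.
import yt

yt.enable_parallelism()

ds = yt.load("output_00080/info_00080.txt")  # placeholder dataset path
ad = ds.all_data()

# Each MPI rank visits only the chunks assigned to it; with reduction="sum"
# the per-chunk values stored by each rank are summed across processors.
for sto, chunk in ad.piter(reduction="sum"):
    sto.result = float(chunk[("gas", "cell_mass")].sum())
```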
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| yt/utilities/parallel_tools/parallel_analysis_interface.py | Enhanced parallel processing infrastructure with reduction operations and modernized ResultsStorage class |
| yt/data_objects/selection_objects/data_selection_objects.py | Added piter method to data containers for parallel iteration with comprehensive documentation |
    - concat: the storage object will contain a flattened list of
      each results.
    - cat_on_root: same as concat, but only the root processor will
Copilot AI commented on Aug 5, 2025:
The documentation mentions 'concat' as a reduction option, but the actual parameter value is 'cat'. This inconsistency could confuse users.
Suggested change:

    - - concat: the storage object will contain a flattened list of
    -   each results.
    - - cat_on_root: same as concat, but only the root processor will
    + - cat: the storage object will contain a flattened list of
    +   each result.
    + - cat_on_root: same as cat, but only the root processor will
    case "cat_on_root":
        new_storage = my_communicator.concat(to_share, root=0)
    case "sum" | "max" | "min":
        new_storage = my_communicator.reduce(to_share, op=reduction, root=0)
Copilot AI commented on Aug 5, 2025:
The reduce method is called with root=0, but when the operation is not distributed, the result should be available on all processors. This inconsistency with the documented behavior could cause issues.
Suggested change:

    - new_storage = my_communicator.reduce(to_share, op=reduction, root=0)
    + if getattr(my_communicator, "distributed", False):
    +     new_storage = my_communicator.reduce(to_share, op=reduction, root=0)
    + else:
    +     new_storage = my_communicator.reduce(to_share, op=reduction)
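As background for this point, here is a generic mpi4py example of the distinction (plain MPI semantics, not yt's own Communicator API): a reduction with a root leaves the result on that rank only, whereas an allreduce makes it available on every rank.

```python
# Generic mpi4py illustration, not yt code.
from mpi4py import MPI

comm = MPI.COMM_WORLD
local_value = comm.rank + 1

on_root_only = comm.reduce(local_value, op=MPI.SUM, root=0)  # None on ranks != 0
on_all_ranks = comm.allreduce(local_value, op=MPI.SUM)       # same total everywhere
```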
    - concat: the storage object will contain a flattened list of
      each results.
    - cat_on_root: same as concat, but only the root processor will
Copilot AI commented on Aug 5, 2025:
Same documentation inconsistency as in the other file - 'concat' is documented but 'cat' is the actual parameter value.
Suggested change:

    - - concat: the storage object will contain a flattened list of
    -   each results.
    - - cat_on_root: same as concat, but only the root processor will
    + - cat: the storage object will contain a flattened list of
    +   each results.
    + - cat_on_root: same as cat, but only the root processor will
brittonsmith left a comment:
This looks really great. Very concise, happy to see this merged. We can use this to refactor the loop over all halos in the HaloCatalog.
Force-pushed e4447c3 to 53c52f2.
I need to fix the typing issues (which I'm not convinced are real) and write the doc. Please hold before merging :)
Yes, mypy infers reachability in a slightly broken way that may give false negatives; maybe this lint should actually be turned off entirely.
Force-pushed 53c52f2 to d1b60b3.
In the meantime, I've added …
matthewturk left a comment:
lgtm!
Great addition to the codebase! Will this apply to all data objects, not just …
I realised I never replied to your message @chummels. I haven't tried with particle data, but there's no reason for it not to work if it supports chunking. It also works with any selection, not only …
aaaaand I just found a bug - let's not merge this just yet. The issue is with the RAMSES reader, where we use progressive indexing (a spatial region isn't fully indexed until it's actually read). The problem is that different MPI tasks end up reading different regions, so which chunks may overlap a given region depends on what has been read previously. If you use this PR's parallel reader more than once, you may end up in a situation where, e.g., MPI task 1 thinks it needs to read CPUs #1, #2, and #3, but MPI task 2 has already read and indexed #3, so it knows it won't overlap with the region, leaving only #1 and #2 to be read. Since the MPI tasks aren't communicating this information with each other, this will lead to undefined behavior.
For RAMSES datasets, each MPI rank may have already fully indexed some of the domains, leading to different chunks across MPI ranks. This means that we can do an MPI reduction over which domains should be read, both to get the most specific subset of domains and to ensure that all ranks have the same list of domains when chunking.
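An illustrative sketch of that reduction (not the actual yt/RAMSES code, which may differ in detail): each rank shares its candidate domain list, and the intersection is both the most specific set and identical on every rank.

```python
# Illustrative only; domain numbers and the intersection strategy are
# assumptions used to convey the idea, not the actual yt implementation.
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Hypothetical per-rank candidates: rank 1 has already fully indexed
# domain 3 and knows it does not overlap the selected region.
local_candidates = {1, 2, 3} if comm.rank == 0 else {1, 2}

# Exchange every rank's candidates and keep only those common to all ranks,
# so chunking proceeds over the same, minimal domain list everywhere.
gathered = comm.allgather(local_candidates)
domains_to_read = sorted(set.intersection(*gathered))
```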
Force-pushed 101c22e to 8779485.
Thanks to @AnatoleStorck for the fix!
Force-pushed 947320e to 39777f6.
I fixed the bug thanks to @AnatoleStorck (see 39777f6)! The test failure is unrelated, however.
PR Summary
This allows iterating in parallel over data containers.
The syntax is sketched below.
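A reconstruction of the intended usage based on the overview above; the dataset, field names, and keyword values are assumptions rather than quotes from the PR:

```python
# Presumed syntax, reconstructed from the PR description.
import yt

yt.enable_parallelism()

ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")  # placeholder dataset
sp = ds.sphere("max", (10, "kpc"))

for sto, chunk in sp.piter(reduction="cat"):
    sto.result = float(chunk[("gas", "density")].max())
# With reduction="cat", the storage ends up holding a flattened list of the
# per-chunk results on every rank; "cat_on_root" keeps it on rank 0 only.
```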
This makes #4730 obsolete.
PR Checklist