Expand distributed indexing, match numpy indexing scheme#938
Open
ClaudiaComito wants to merge 283 commits into
This was referenced Aug 30, 2022
for more information, see https://pre-commit.ci
# 1D boolean mask resolution
first = key[0] if isinstance(key, tuple) and len(key) >= 1 else key
if isinstance(first, (DNDarray, torch.Tensor, np.ndarray)) and arr.ndim >= 1:
Collaborator
I think it would be nice to cast numpy arrays and torch tensors to DNDarray in the beginning of this function. Then we always know we have a DNDarray and don't have to worry about stuff like numel or size.
I think it would be nice if we do:
- Early out for some special things that we need to be fast
- Cast array keys to DNDarray such that we have a key that is a tuple of ellipses, slices, integers, or DNDarrays
- Any further processing of keys
What do you think, @ClaudiaComito? Would that make sense?
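The three steps proposed above could look roughly like the following. This is only a sketch to make the suggestion concrete, not the code in this PR: `KeyArray` is a hypothetical stand-in for `DNDarray`, `normalize_key` and `_to_key_array` are made-up names, and the conversion relies on the assumption that array-like keys expose `.tolist()` (which `np.ndarray` and `torch.Tensor` both do).

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class KeyArray:
    """Hypothetical stand-in for a DNDarray key; not part of Heat."""

    data: tuple


def _to_key_array(obj: Any) -> KeyArray:
    # np.ndarray and torch.Tensor both expose .tolist(); lists are used as-is.
    values = obj.tolist() if hasattr(obj, "tolist") else list(obj)
    return KeyArray(tuple(values))


def normalize_key(key: Any) -> Tuple[Any, ...]:
    # Step 1: early out for cheap, common keys. A bare bool is excluded here
    # because NumPy treats it as a mask rather than as an integer index.
    if isinstance(key, slice) or (isinstance(key, int) and not isinstance(key, bool)):
        return (key,)

    # Step 2: wrap the key in a tuple and cast every array-like entry, so that
    # later code only sees Ellipsis, None, slices, integers, or KeyArray.
    if not isinstance(key, tuple):
        key = (key,)
    normalized = []
    for k in key:
        if k is Ellipsis or k is None or isinstance(k, (slice, int, KeyArray)):
            normalized.append(k)
        else:
            normalized.append(_to_key_array(k))

    # Step 3: any further processing would start here, on a uniform key.
    return tuple(normalized)
```

With this shape, questions like "does the key have `numel` or `size`?" only need to be answered for one type after step 2.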
Co-authored-by: Thomas Saupe <39156931+brownbaerchen@users.noreply.github.com>
* First small cleanup * Another small simplification
Description
This pull request introduces a significant overhaul of distributed indexing within dndarray.py, specifically targeting the __getitem__ and __setitem__ methods. The primary objective is to achieve full NumPy indexing compliance in a distributed environment while minimizing MPI overhead and memory footprint.
The logic has been refactored to identify zero-communication paths ("early out") and to route heavy unordered advanced indexing through optimized communication.
The following table shows the distribution semantics of DNDarray indexing operations.
UPDATED 26.5.2026
array[key]
array[key] | split axis and balanced status directly from the distributed key.
array[key] | Yes for slices/masks. Unordered local advanced indices are automatically distributed across the split axis under the hood.
array[key] | distr_mask fast-path or triggers __getitem_unordered for cross-node MPI collective fetching.
array[key] = val
array[key] = val
array[key] = val | value's split axis doesn't match the target's split axis, a RuntimeError is raised. If they do match, value is dynamically load-balanced (redistribute_) to match the target's chunk sizes before assignment.
array[key] = val
array[key] = val
array[key] = val | Alltoallv shuffle to assign elements to their global unordered indices.

Routing logic
UPDATED 26.5.2026
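As a reading aid for the routing chart in this section, the final "Evaluate Key State" step can be sketched in plain Python. This is a hedged illustration, not the code in dndarray.py: the function name `select_op_type` is made up, and it assumes the branches are tested in the order the chart lists them. The flag names, `op_type` strings, and handler names are copied from the chart.

```python
from typing import Optional

# __getitem__ handler per op_type, as named in the chart
# (the __setitem__ handlers follow the same pattern).
GETITEM_HANDLERS = {
    "scalar": "__getitem_scalar",
    "distributed": "__getitem_advanced_distributed",
    "descending_slice": "__getitem_descending_slice_distributed",
    "distr_mask": "__getitem_mask",
    "local_mask": "__getitem_advanced_local",
    "advanced": "__getitem_advanced_local",
}


def select_op_type(
    root: Optional[int],
    split_key_is_ordered: int,
    key_is_mask_like: bool,
    distr_mask_fast_path: bool,
) -> str:
    """Hypothetical helper mirroring the 'Evaluate Key State' node."""
    if root is not None:
        return "scalar"  # a single rank owns the requested element
    if split_key_is_ordered == 0:
        return "distributed"  # unordered key along split: MPI communication
    if split_key_is_ordered == -1:
        return "descending_slice"
    if key_is_mask_like:
        return "distr_mask" if distr_mask_fast_path else "local_mask"
    return "advanced"  # default / ordered key: local fast path
```

For example, `GETITEM_HANDLERS[select_op_type(None, 0, False, False)]` gives `"__getitem_advanced_distributed"`, the only branch the chart labels as requiring unordered MPI communication.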
graph TD
    Start((Receive Key)) --> CheckScalar{Is key a pure scalar<br/>and not boolean?}
    CheckScalar -- Yes --> EvalRoot{Compute root}
    EvalRoot --> OpScalar[op_type = 'scalar']
    CheckScalar -- No --> CheckFastPath{Matches distr_mask<br/>fast path?}
    CheckFastPath -- Yes & not tuple --> OpDistrMask1[op_type = 'distr_mask']
    CheckFastPath -- No / Tuple --> Normalize[Normalize keys, extract bounds,<br/>check dimensionality & broadcast]
    Normalize --> FinalRouting{Evaluate Key State}
    FinalRouting -->|root is not None| OpScalar2[op_type = 'scalar']
    FinalRouting -->|split_key_is_ordered == 0| OpDist[op_type = 'distributed'<br/>Unordered MPI Communication]
    FinalRouting -->|split_key_is_ordered == -1| OpDesc[op_type = 'descending_slice']
    FinalRouting -->|key_is_mask_like == True| MaskTypeCheck{distr_mask_fast_path?}
    MaskTypeCheck -- Yes --> OpDistrMask2[op_type = 'distr_mask']
    MaskTypeCheck -- No --> OpLocalMask[op_type = 'local_mask']
    FinalRouting -->|Default / Ordered| OpAdv[op_type = 'advanced'<br/>Local Fast Path]
    %% Map to actual handlers
    subgraph Handlers [Target Routing Methods]
        OpScalar & OpScalar2 --> H_Scalar[__getitem_scalar<br/>__setitem_scalar]
        OpDist --> H_Dist[__getitem_advanced_distributed<br/>__setitem_advanced_distributed]
        OpDesc --> H_Desc[__getitem_descending_slice_distributed<br/>__setitem_descending_slice_distributed]
        OpDistrMask1 & OpDistrMask2 --> H_DistMask[__getitem_mask<br/>__setitem_mask]
        OpLocalMask --> H_LocalMask[__getitem_advanced_local<br/>__setitem_advanced_local]
        OpAdv --> H_Adv[__getitem_advanced_local<br/>__setitem_advanced_local]
    end
    %% Styling
    classDef target fill:#d4edda,stroke:#28a745,stroke-width:2px;
    class H_Scalar,H_Dist,H_Desc,H_DistMask,H_LocalMask,H_Adv target;

Main changes
To Be Continued...
Memory footprint
Scaling behaviour
Issue/s resolved: #703 #914 #918 #1012 #1019 #2135 #1816 #824
Changes proposed:
Type of change
Memory requirements
Performance
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
yes / no
skip ci