Expand distributed indexing, match numpy indexing scheme by ClaudiaComito · Pull Request #938 · helmholtz-analytics/heat

ClaudiaComito · 2022-03-24T05:23:18Z

Description

This pull request introduces a significant overhaul of distributed indexing within dndarray.py, specifically targeting the __getitem__ and __setitem__ methods. The primary objective is to achieve full NumPy indexing compliance in a distributed environment while minimizing MPI overhead and memory footprint.

The logic has been refactored to identify zero-communication paths ("early out") and to route heavy, unordered advanced indexing through optimized communication.

The following table shows the distribution semantics of DNDarray indexing operations.

UPDATED 26.5.2026

| Array is distributed | Operation | Key is distributed | Value is distributed | Result is distributed | Notes |
| --- | --- | --- | --- | --- | --- |
| No | `array[key]` | No | -- | No | Standard local indexing. |
| No | `array[key]` | Yes | -- | Yes | The resulting array inherits the `split` axis and balanced status directly from the distributed key. |
| Yes | `array[key]` | No | -- | Yes / No | No if the key is a pure scalar along the split axis (the split dimension is lost and the result is broadcasted). Yes for slices/masks. Unordered local advanced indices are automatically distributed across the split axis under the hood. |
| Yes | `array[key]` | Yes | -- | Yes | Split axis is retained or shifted. Evaluated as a `distr_mask` fast-path or triggers `__getitem_unordered` for cross-node MPI collective fetching. |
| No | `array[key] = val` | No | No | No (In-place) | Standard local assignment. |
| Yes | `array[key] = val` | No | No | Yes (In-place) | The local value is automatically converted into a distributed array and broadcasted to align with the array's distribution constraints. |
| Yes | `array[key] = val` | No | Yes | Yes (In-place) | Split axis match required: If the `value`'s split axis doesn't match the target's split axis, a `RuntimeError` is raised. If they do match, `value` is dynamically load-balanced (`redistribute_`) to match the target's chunk sizes before assignment. |
| Yes | `array[key] = val` | Yes | No, scalar | Yes (In-place) | A pure scalar value is correctly assigned to all masked/indexed elements across all MPI ranks natively. |
| Yes | `array[key] = val` | Yes | No, array | ERROR | Exception raised. You cannot assign a local/non-distributed array using a distributed index. |
| Yes | `array[key] = val` | Yes | Yes | Yes (In-place) | Communication-heavy: The value's split axis is dynamically redistributed to match the key's distribution layout, followed by an `Alltoallv` shuffle to assign elements to their global unordered indices. |
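
As a reading aid, the rows above can be restated as a small lookup. This is only a sketch of the table, not code from the PR: the function names, the `scalar_on_split_axis` and `splits_match` flags, and the returned labels are all hypothetical, and combinations the table does not list are reported as "not covered".

```python
def getitem_result_is_distributed(array_dist, key_dist, scalar_on_split_axis=False):
    """Row lookup for `array[key]`, following the table (hypothetical helper)."""
    if not array_dist:
        # Rows 1-2: the result inherits its distribution from the key.
        return key_dist
    if not key_dist and scalar_on_split_axis:
        # Row 3, "No" branch: the split dimension is lost.
        return False
    # Row 3 (slices/masks) and row 4.
    return True


def setitem_path(array_dist, key_dist, value_dist,
                 value_is_scalar=False, splits_match=True):
    """Row lookup for `array[key] = val`; returns a label, never raises."""
    if not array_dist:
        if not key_dist and not value_dist:
            return "local"
        return "not covered"  # the table lists no such row
    if not key_dist:
        if not value_dist:
            return "distribute value"
        # Distributed value: split axes must match (RuntimeError otherwise).
        return "redistribute value" if splits_match else "error"
    if value_dist:
        return "alltoallv shuffle"
    # Distributed key with a local value: only scalars are accepted.
    return "scalar assign" if value_is_scalar else "error"


print(getitem_result_is_distributed(True, False, scalar_on_split_axis=True))
print(setitem_path(True, True, False))
```

The labels stand in for the behaviour described in the Notes column; the exception type for the "ERROR" row is left open because the table does not name it.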

Routing logic

UPDATED 26.5.2026

graph TD
    Start((Receive Key)) --> CheckScalar{Is key a pure scalar<br/>and not boolean?}
    
    CheckScalar -- Yes --> EvalRoot{Compute root}
    EvalRoot --> OpScalar[op_type = 'scalar']
    
    CheckScalar -- No --> CheckFastPath{Matches distr_mask<br/>fast path?}
    
    CheckFastPath -- Yes & not tuple --> OpDistrMask1[op_type = 'distr_mask']
    
    CheckFastPath -- No / Tuple --> Normalize[Normalize keys, extract bounds,<br/>check dimensionality & broadcast]
    
    Normalize --> FinalRouting{Evaluate Key State}
    
    FinalRouting -->|root is not None| OpScalar2[op_type = 'scalar']
    FinalRouting -->|split_key_is_ordered == 0| OpDist[op_type = 'distributed'<br/>Unordered MPI Communication]
    FinalRouting -->|split_key_is_ordered == -1| OpDesc[op_type = 'descending_slice']
    
    FinalRouting -->|key_is_mask_like == True| MaskTypeCheck{distr_mask_fast_path?}
    MaskTypeCheck -- Yes --> OpDistrMask2[op_type = 'distr_mask']
    MaskTypeCheck -- No --> OpLocalMask[op_type = 'local_mask']
    
    FinalRouting -->|Default / Ordered| OpAdv[op_type = 'advanced'<br/>Local Fast Path]

    %% Map to actual handlers
    subgraph Handlers [Target Routing Methods]
        OpScalar & OpScalar2 --> H_Scalar[__getitem_scalar<br/>__setitem_scalar]
        OpDist --> H_Dist[__getitem_advanced_distributed<br/>__setitem_advanced_distributed]
        OpDesc --> H_Desc[__getitem_descending_slice_distributed<br/>__setitem_descending_slice_distributed]
        OpDistrMask1 & OpDistrMask2 --> H_DistMask[__getitem_mask<br/>__setitem_mask]
        OpLocalMask --> H_LocalMask[__getitem_advanced_local<br/>__setitem_advanced_local]
        OpAdv --> H_Adv[__getitem_advanced_local<br/>__setitem_advanced_local]
    end
    
    %% Styling
    classDef target fill:#d4edda,stroke:#28a745,stroke-width:2px;
    class H_Scalar,H_Dist,H_Desc,H_DistMask,H_LocalMask,H_Adv target;
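
The graph above can also be read as a plain decision function. The sketch below is an illustration under assumptions, not the PR's implementation: the `KeyState` container and `route` function are hypothetical, the value `1` for an ordered key is assumed (the graph only names `0` and `-1`), and the branches after normalization are checked in the order they are drawn. The `op_type` strings and handler names are taken from the graph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyState:
    """Hypothetical summary of what key processing has learned about a key."""
    is_pure_scalar: bool = False        # pure scalar and not boolean
    is_tuple: bool = False
    distr_mask_fast_path: bool = False
    root: Optional[int] = None          # set when a single rank owns the element
    split_key_is_ordered: int = 1       # assumed: 1 ordered, 0 unordered, -1 descending
    key_is_mask_like: bool = False


# op_type -> (getitem handler, setitem handler), names as shown in the graph
HANDLERS = {
    "scalar": ("__getitem_scalar", "__setitem_scalar"),
    "distributed": ("__getitem_advanced_distributed", "__setitem_advanced_distributed"),
    "descending_slice": ("__getitem_descending_slice_distributed",
                         "__setitem_descending_slice_distributed"),
    "distr_mask": ("__getitem_mask", "__setitem_mask"),
    "local_mask": ("__getitem_advanced_local", "__setitem_advanced_local"),
    "advanced": ("__getitem_advanced_local", "__setitem_advanced_local"),
}


def route(state: KeyState) -> str:
    """Return the op_type for a key, mirroring the graph top to bottom."""
    if state.is_pure_scalar:
        return "scalar"
    if state.distr_mask_fast_path and not state.is_tuple:
        return "distr_mask"
    # From here on the key is assumed to be normalized and broadcast.
    if state.root is not None:
        return "scalar"
    if state.split_key_is_ordered == 0:
        return "distributed"
    if state.split_key_is_ordered == -1:
        return "descending_slice"
    if state.key_is_mask_like:
        return "distr_mask" if state.distr_mask_fast_path else "local_mask"
    return "advanced"


print(route(KeyState(split_key_is_ordered=0)), HANDLERS["distributed"][0])
```

Only the `distributed` and `descending_slice` branches lead to handlers with cross-rank communication in their names; `local_mask` and `advanced` share the same local handlers.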

Main changes

Abstracts key parsing and alignment into a centralized private method that handles dimension expansion and shape broadcasting, and classifies the state of the indexing operation to determine network routing.
Enforces standard last-assignment-wins semantics for duplicate indices in advanced indexing on CUDA tensors by generating linear indices and mapping local occurrence priorities (thanks @Hakdag97).
Intercepts multidimensional and single-dimensional boolean masks early in the pipeline, converting them to explicit integer indices locally to prevent unnecessary cross-rank broadcasting.
Maps and isolates zero-communication assignments during slice operations, executing purely local PyTorch tensor modifications when the requested indices and data already reside on the active rank.
Structures unordered read requests by compiling global communication matrices, enabling the dispatch of non-blocking Isend and Recv calls strictly between the ranks that own the requested indices and the ranks requesting them.
Forces distribution alignment during set operations if the right-hand side assignment value is also distributed, using an Alltoallv operation to shuffle payload data and target indices concurrently.
Introduces a value broadcasting helper function that squeezes or expands the dimensions of scalar or tensor payloads to match the dimensional footprint of the target slice before assignment.
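
To illustrate the last-assignment-wins item above, here is a minimal pure-Python sketch of the idea: flatten each multidimensional index to a row-major linear index, then keep only the last occurrence of every linear index. It uses plain lists instead of CUDA tensors, and the names `linear_index` and `dedup_last_wins` are hypothetical, so it should not be read as the code in this PR.

```python
def linear_index(idx, shape):
    """Row-major (C-order) linear index of a multidimensional index."""
    lin = 0
    for i, n in zip(idx, shape):
        lin = lin * n + i
    return lin


def dedup_last_wins(indices, values, shape):
    """Drop earlier duplicates so each target element is written once.

    Keeps, for every linear index, the position of its LAST occurrence,
    which is what a sequential NumPy-style assignment would leave behind.
    """
    last_position = {}
    for pos, idx in enumerate(indices):
        last_position[linear_index(idx, shape)] = pos  # later entries overwrite
    keep = sorted(last_position.values())
    return [indices[p] for p in keep], [values[p] for p in keep]


# (0, 1) appears twice; only its second value (30) should survive.
idx, val = dedup_last_wins([(0, 1), (1, 0), (0, 1)], [10, 20, 30], shape=(2, 2))
print(idx, val)  # [(1, 0), (0, 1)] [20, 30]
```

After deduplication the write order no longer matters, which is the property a parallel scatter needs in order to reproduce sequential assignment.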

To Be Continued...

Memory footprint

Scaling behaviour

Issue/s resolved: #703 #914 #918 #1012 #1019 #2135 #1816 #824

Changes proposed:

feature extension in __process_key, getitem, and setitem methods
edge case handling
extensive comparison to numpy API in unittests

Type of change

Memory requirements

Performance

Due Diligence

All split configurations tested
Multiple dtypes tested in relevant functions
Documentation updated (if needed)
Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

yes / no

skip ci

brownbaerchen · 2026-06-03T14:36:18Z

+
+        # 1D boolean mask resolution
+        first = key[0] if isinstance(key, tuple) and len(key) >= 1 else key
+        if isinstance(first, (DNDarray, torch.Tensor, np.ndarray)) and arr.ndim >= 1:


I think it would be nice to cast numpy arrays and torch tensors to DNDarray in the beginning of this function. Then we always know we have a DNDarray and don't have to worry about stuff like numel or size.

I think it would be nice if we do:

Early out for some special things that we need to be fast

Cast array keys to DNDarray such that we have a key that is a tuple of ellipses, slices, integers, or DNDarrays

Any further processing of keys

What do you think, @ClaudiaComito? Would that make sense?
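
A minimal sketch of the three-step ordering suggested above, assuming plain Python stand-ins: `ArrayKey` is a hypothetical placeholder for DNDarray, and Python lists and ranges play the role of NumPy arrays and torch tensors. None of these names come from the PR.

```python
class ArrayKey:
    """Hypothetical stand-in for a DNDarray key; wraps any array-like."""

    def __init__(self, data):
        self.data = list(data)


def normalize_key(key):
    # Step 1: early out for cases that must stay fast (here: a plain integer).
    if isinstance(key, int) and not isinstance(key, bool):
        return (key,)

    # Step 2: cast every array-like entry to one type, so later code only
    # sees a tuple of ellipses, slices, integers, or ArrayKey objects.
    items = key if isinstance(key, tuple) else (key,)
    items = tuple(ArrayKey(k) if isinstance(k, (list, range)) else k for k in items)

    # Step 3: any further processing would start from this uniform tuple.
    return items


out = normalize_key(([0, 2], slice(None), Ellipsis))
print([type(k).__name__ for k in out])  # ['ArrayKey', 'slice', 'ellipsis']
```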

ClaudiaComito mentioned this pull request Mar 25, 2022

fix #925: ht.nonzero() returns tuple of 1-D arrays instead of n-D arrays #937

Merged

4 tasks

This was referenced Aug 30, 2022

[Bug]: Indexing with 0-dimensional key #1019

Open

[Bug]: Slice error when array contains an axis of length 0 #1012

Open

ClaudiaComito and others added 27 commits December 12, 2022 12:17

Update ubuntu

27ea911

[pre-commit.ci] auto fixes from pre-commit.com hooks

d0fb6c8

for more information, see https://pre-commit.ci

switch back to ubuntu 20.04

0e704d4

pull

f5d7850

Upgrade CI to ubuntu 22.04 and cuda 11.7.1

acfe9bd

avoid unnecessary gathering of test DNDarrays

0fd3d87

early out for resplit of non-distributed DNDarrays

3c4c07c

match split of comparison array to expected output

989e0f4

avoid MPI calls in non-distributed cases

6d66fad

avoid MPI calls in non-distributed resplit

a37b4d3

set default to None

8eebe10

remove print statement

22c5c68

upgrade torch version

c692bff

copy to cpu before comparing

df6a4e5

use ht.allclose instead of np.allclose

af0e721

cast different dtype operands to promoted dtype within torch call

bac6d4e

compare local tensors to corresponding slice of expected_array only

c0c6362

expand tests

587bc05

remove redundant code

24239a1

Implement slicing with negative step

cd65b37

test slicing with negative step

86e8801

merge branch bugs/#1057-Allgatherv-contiguity-mismatch

6779010

Fix single-element indexing within mixed-type key

3b1f46d

Non-ordered indexing, split != 0

1a4bf97

generalize negative step slicing to all splits, loss of dims

9e42156

loop over active ranks only when key in descending order

1a310a9

replace list-on-list mapping with argsort mapping for non-ordered key

c2ba0d9

Merge branch 'main' into 914_adv-indexing-outshape-outsplit

86ecb5c

brownbaerchen mentioned this pull request Jun 2, 2026

nonzero, where fixes from #938 #2332

Open

7 tasks

brownbaerchen reviewed Jun 3, 2026

ClaudiaComito and others added 16 commits June 8, 2026 09:07

Merge branch 'main' into 914_adv-indexing-outshape-outsplit

9feaf97

Reintegrate input sanitation in nonzero

2d5502e

[pre-commit.ci] auto fixes from pre-commit.com hooks

6af95c2

for more information, see https://pre-commit.ci

Apply suggestions from code review

937bc18

Co-authored-by: Thomas Saupe <39156931+brownbaerchen@users.noreply.github.com>

Remove edits

1ec9543

bring back to original state

6bfb650

[pre-commit.ci] auto fixes from pre-commit.com hooks

199518a

for more information, see https://pre-commit.ci

Refactor distr_mask_fast_path

eaa34eb

* First small cleanup * Another small simplification

Merge branch 'main' into 914_adv-indexing-outshape-outsplit

350aaf8

remove legacy indexing leftovers

23ab286

remove orphaned functions after refactoring

b3bf485

better name and docstring for tafkaprocessed_key

c803588

move resolve_indexing_state out of Class

442d83e

rename dedup and move out of class

49406ab

introduce Indexer type alias

61b3799

introduce Indexer type alias

518d023

ClaudiaComito removed the benchmark PR label Jun 12, 2026

ClaudiaComito added 10 commits June 12, 2026 11:26

edits for readability

ce60526

remove dead code

3fc2666

update getitem docstring

012b502

update getitem docstring

cffe445

update setitem docstring

9623a20

add docstrings for helper functions

66b1a69

add docstrings for helper functions

02a4dc1

fix split axis bookkeeping

8622766

revert split bookkeeping fix

fb7c5ba

add indexing documentation

eda3789

ClaudiaComito wants to merge 283 commits into main from 914_adv-indexing-outshape-outsplit

5 participants
