[ME library] CPU mode bug #91

@Temigo

Description

The full-chain ME code is currently unreliable in CPU mode. Many operations, e.g. ghost masking or semantic segmentation masking, rely on the coordinate ordering. The crucial assumption is that this ordering is conserved throughout the full-chain operations. That assumption holds on GPU but breaks down on CPU.

Specifically, it starts breaking down in PPN (in the class AttentionMask of models/layers/common/ppnplus.py), and GraphSpice is strongly suspected to be affected as well.

How to reproduce the "bug"

This is the shortest example that I could come up with.

import numpy as np
import MinkowskiEngine as ME
import torch

# Parameters
N = 10
device = 'cuda:0' # change this to 'cpu' to see the difference

# Create x
feats = torch.rand(N, 1).to(device)
coords = torch.cat([torch.zeros((N, 1)), torch.rand(N, 3) * 100], dim=1).to(device)
x = ME.SparseTensor(features=feats, coordinates=coords)

# Create mask
mask = (torch.rand(N, 6) > 0.5).float().to(device)
mask = ME.SparseTensor(
    coordinates=x.C,
    features=mask,
    coordinate_manager=x.coordinate_manager,
    tensor_stride=x.tensor_stride,
)

# Create x0
x0 = ME.SparseTensor(
    coordinates=x.C,
    features=torch.zeros(x.F.shape[0], mask.F.shape[1]).to(device),
    coordinate_manager=x.coordinate_manager,
    tensor_stride=x.tensor_stride
)

Now compare the coordinate tensors obtained through the .C attribute. On CPU, the order changes after the addition x0 + mask:

print(x.C, mask.C, x0.C)  # These are all identical
# For no apparent reason, the next set of coordinates is ordered differently on CPU, but is identical to the previous ones on GPU
print((mask + x0).C)
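
Rather than comparing the printed tensors by eye, the check can be made explicit. The following is only a sketch in plain NumPy: `compare_coordinate_order` is a hypothetical helper name (it is not part of MinkowskiEngine), and it assumes the coordinate tensors have already been converted to arrays, e.g. with `.cpu().numpy()`.

```python
import numpy as np

def compare_coordinate_order(a, b):
    """Classify two (N, D) coordinate arrays as 'identical', 'permuted' or 'different'."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        return "different"
    if np.array_equal(a, b):
        return "identical"
    # Sort rows lexicographically (column 0 is the primary key) so that
    # the original row order no longer matters in the comparison.
    a_sorted = a[np.lexsort(a.T[::-1])]
    b_sorted = b[np.lexsort(b.T[::-1])]
    return "permuted" if np.array_equal(a_sorted, b_sorted) else "different"

# Toy coordinates: batch index followed by three spatial coordinates
coords = np.array([[0, 5, 1, 2], [0, 3, 3, 3], [0, 9, 0, 1]])
print(compare_coordinate_order(coords, coords))             # identical
print(compare_coordinate_order(coords, coords[[2, 0, 1]]))  # permuted
```

With the example above, this would be called as compare_coordinate_order(x.C.cpu().numpy(), (mask + x0).C.cpu().numpy()). A result of 'permuted' on CPU would match the behaviour described here.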

What does MinkowskiEngine say?

Well, they do not guarantee the coordinate ordering. See
https://github.com/NVIDIA/MinkowskiEngine/blob/master/MinkowskiEngine/MinkowskiTensor.py#L291

The order of coordinates is non-deterministic within each batch.
Use :attr:decomposed_coordinates_and_features to retrieve
both coordinates features with the same order. To retrieve the
order the decomposed coordinates is generated, use :attr:decomposition_permutations.

(I have to say, it is not 100% clear to me what decomposition_permutations is for, but it definitely does not allow retrieving the original coordinate ordering. Even if it did, it would be cumbersome to have to correct the ordering every now and then in the code.)
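
One possible workaround, sketched here under assumptions and not taken from the MinkowskiEngine API, is to recover the permutation by matching coordinate rows directly. The helper name `alignment_permutation` is hypothetical. The sketch assumes both arrays contain exactly the same set of coordinate rows, each row unique, and works on NumPy arrays (e.g. obtained via `.cpu().numpy()`).

```python
import numpy as np

def alignment_permutation(reference, shuffled):
    """Return perm such that shuffled[perm] equals reference row by row.

    Assumes both (N, D) arrays hold the same unique rows in different orders.
    """
    reference = np.asarray(reference)
    shuffled = np.asarray(shuffled)
    if reference.shape != shuffled.shape:
        raise ValueError("coordinate arrays must have the same shape")
    # Lexicographic sort order of each array (column 0 is the primary key)
    ref_order = np.lexsort(reference.T[::-1])
    shuf_order = np.lexsort(shuffled.T[::-1])
    # The k-th sorted row of both arrays is the same coordinate, so
    # reference row ref_order[k] corresponds to shuffled row shuf_order[k].
    perm = np.empty(len(reference), dtype=np.int64)
    perm[ref_order] = shuf_order
    if not np.array_equal(shuffled[perm], reference):
        raise ValueError("arrays do not contain the same coordinate rows")
    return perm

# Toy example: features reordered together with their coordinates
reference = np.array([[0, 5, 1, 2], [0, 3, 3, 3], [0, 9, 0, 1], [0, 1, 1, 1]])
order = np.array([2, 0, 3, 1])
shuffled_coords = reference[order]
shuffled_feats = np.array([10.0, 20.0, 30.0, 40.0])[order]

perm = alignment_permutation(reference, shuffled_coords)
restored_feats = shuffled_feats[perm]  # back in the reference ordering
```

This costs two sorts per realignment, so it remains the kind of correction that would be cumbersome to sprinkle throughout the full chain. It only illustrates that the original ordering is recoverable from the coordinates themselves.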
