Description
When building a KNN graph, we can set include_self=False to exclude the edge between a point and itself (whose distance is always 0).
The current filtering method drops the first column of the output, which implicitly assumes that each point is returned as its own first neighbor.
See cuml/python/cuml/cuml/neighbors/nearest_neighbors.pyx, lines 689 to 700 at commit b4e3205.
However, this fails when distances are tied, i.e. when duplicate points produce multiple zero-distance edges:
import numpy as np
from sklearn.neighbors import kneighbors_graph
from cuml.neighbors import kneighbors_graph as cuKNN

# Points 0 and 1 are identical, so they are at distance 0 from each other
X = np.array([
    [1, 5],
    [1, 5],
    [7, 3],
    [9, 6],
    [10, 1]
])

# scikit-learn reference
knn_graph = kneighbors_graph(X, 2, mode='connectivity', include_self=False)
print(knn_graph.toarray())
print("")

# cuML
knn_graph = cuKNN(X, 2, mode='connectivity', include_self=False)
print(knn_graph.toarray())
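# sklearn output: the diagonal is all zeros, so there are no self-edges
[[0. 1. 1. 0. 0.]
 [1. 0. 1. 0. 0.]
 [0. 0. 0. 1. 1.]
 [0. 0. 1. 0. 1.]
 [0. 0. 1. 1. 0.]]
# cuml output is different with a 1 on the top left:
# point 0 keeps its self-edge and loses its edge to the duplicate point 1
[[1. 0. 1. 0. 0.]
 [1. 0. 1. 0. 0.]
 [0. 0. 0. 1. 1.]
 [0. 0. 1. 0. 1.]
 [0. 0. 1. 1. 0.]]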
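
One possible way to make the filter robust to ties is to remove the column whose index equals the row index, rather than always removing column 0. The NumPy sketch below illustrates the idea only; it is not the cuML implementation. `drop_self_neighbors` is a hypothetical helper, and the `indices` array is a hand-written example of a k+1 neighbor table in which the duplicate point 1 is returned ahead of point 0 in row 0. It is not actual cuML output.

```python
import numpy as np


def drop_self_neighbors(indices, distances):
    """Remove each row's own index from an (n, k+1) neighbor table.

    Hypothetical helper, not part of cuML. If a row does not contain
    its own index at all (ties may push it out of the top k+1), the
    last column (the farthest neighbor) is dropped instead.
    Assumes each index appears at most once per row.
    """
    n, k_plus_1 = indices.shape
    row_ids = np.arange(n)[:, None]
    is_self = indices == row_ids              # True where a row points at itself
    missing = ~is_self.any(axis=1)            # rows that never list themselves
    is_self[missing, -1] = True               # fall back to dropping the last column
    keep = ~is_self                           # exactly k entries kept per row
    return (indices[keep].reshape(n, k_plus_1 - 1),
            distances[keep].reshape(n, k_plus_1 - 1))


X = np.array([[1, 5], [1, 5], [7, 3], [9, 6], [10, 1]], dtype=float)

# Illustrative k+1 = 3 neighbor indices; row 0 lists its duplicate (1) first.
indices = np.array([
    [1, 0, 2],
    [1, 0, 2],
    [2, 3, 4],
    [3, 2, 4],
    [4, 2, 3],
])
# Euclidean distance from each point to each of its listed neighbors
distances = np.linalg.norm(X[indices] - X[:, None, :], axis=2)

idx, dist = drop_self_neighbors(indices, distances)
print(idx)
```

With this input, dropping the first column would leave `[0, 2]` for row 0 (a self-edge), whereas the index-based mask leaves `[1, 2]`, which matches the sklearn connectivity matrix above.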