Open
Labels
bug (Something isn't working)
Description
I'm not entirely certain that this actually qualifies as a bug (as opposed to merely a limitation), and even less certain that it qualifies as a rapids_singlecell bug instead of a cuML one, but it is an undesirable behaviour.
For some metrics (e.g. cosine or correlation), cuML (v25.08) can return non-zero self-distances:
import cupy as cp
import cuml
rng = cp.random.default_rng(1234)
arr = rng.standard_normal(size=(5000, 50))
nn_obj = cuml.neighbors.NearestNeighbors(n_neighbors=30, algorithm='brute', metric='cosine')
nn_obj.fit(arr)
distances, neighbors = nn_obj.kneighbors(arr, n_neighbors=30, return_distance=True)
distances[:, 0].max() # != 0; ~5e-07 in my tests on an H100
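One plausible source of the residual, offered as an assumption rather than something verified against the cuML source: if the brute-force path evaluates cosine distance as 1 - x·y / (‖x‖‖y‖) in float32, the rounded dot product and the rounded product of norms need not cancel exactly, even when x and y are the same row. A CPU-only NumPy sketch of that effect (NumPy stands in for the GPU code here; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1234)
arr = rng.standard_normal(size=(5000, 50)).astype(np.float32)

# Self-similarity computed the "expanded" way: dot(x, x) / (||x|| * ||x||).
# The dot product and the norms are rounded independently in float32.
dots = np.einsum("ij,ij->i", arr, arr)
norms = np.linalg.norm(arr, axis=1)
self_dist = np.float32(1.0) - dots / (norms * norms)

# Mathematically every entry is 0; in float32 the residuals are typically
# on the order of machine epsilon (~1e-7) and can have either sign.
max_residual = float(np.abs(self_dist).max())
print(max_residual)
```

A residual of this size would be consistent with the ~5e-07 reported above, but the exact value depends on the hardware and on the reduction order used.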
Obviously, this will affect downstream calculations, such as UMAP connectivities. Would it make sense to actively zero out self-distances after they're calculated here? I can't see a downside to inserting knn_dist[:, 0] = 0.0, but maybe I'm missing something...
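One possible caveat with zeroing column 0 unconditionally: when the data contain exact duplicates (or when rounding reorders near-ties), the first neighbour returned for a row is not guaranteed to be the row itself. A slightly more defensive variant zeroes only the entries whose neighbour index equals the row index. The sketch below uses NumPy so that it runs without a GPU; `zero_self_distances`, `knn_idx` and the toy arrays are hypothetical names and values for illustration, not part of rapids_singlecell or cuML.

```python
import numpy as np

def zero_self_distances(knn_dist, knn_idx):
    """Return a copy of knn_dist with 0 wherever a row's neighbour is the row itself."""
    rows = np.arange(knn_idx.shape[0])[:, None]  # column vector of row indices
    out = knn_dist.copy()
    out[knn_idx == rows] = 0.0  # the mask broadcasts over the neighbour columns
    return out

# Toy kNN result for 3 points with k=3. Point 1 is a duplicate of point 2,
# so in row 1 the self-match sits in column 1 rather than column 0.
knn_idx = np.array([[0, 2, 1],
                    [2, 1, 0],
                    [2, 0, 1]])
knn_dist = np.array([[5e-7, 0.3, 0.4],
                     [0.0, 2e-7, 0.5],
                     [3e-7, 0.3, 0.6]], dtype=np.float32)

cleaned = zero_self_distances(knn_dist, knn_idx)
print(cleaned)
```

Since this relies only on broadcasting and boolean-mask assignment, the same expression should also work on CuPy arrays, though I have not checked that against the actual code path.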