Description
Because the graph for Leiden/Louvain clustering is created from the edge list (using cugraph.Graph.from_cudf_edgelist), any isolated vertices in the adjacency matrix are left out of the graph. This leads to an error, since the resulting array of labels has the wrong shape. Here is a toy example:
import numpy as np
import scipy as sp
import cudf
import cugraph
N = 1000
# Generate random CSR matrix, symmetrise, set diagonal to 0
conn = sp.sparse.random(N, N, density=0.1, format='csr', dtype=np.float64, rng=1234)
conn = (conn + conn.T) / 2
conn.setdiag(0.0)
# Disconnect one arbitrary vertex, eliminate zeros
conn = conn.tolil()
conn[123, :] = 0.0
conn[:, 123] = 0.0
conn = conn.tocsr()
conn.eliminate_zeros()
# Create graph as in rapids_singlecell.tools._clustering._create_graph
sources, targets = conn.nonzero()
weights = conn[sources, targets].A1
df = cudf.DataFrame({"source": sources, "destination": targets, "weights": weights})
g = cugraph.Graph()
g.from_cudf_edgelist(
    df, source="source", destination="destination", weight="weights",
)
# Leiden clustering
leiden_parts, q = cugraph.leiden(
    g, resolution=1.0, random_state=1234, max_iter=100,
)
leiden_parts.shape[0] # returns 999
In general, leiden_parts.shape[0] will be N - #(isolated vertices), as one would expect.
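To check how many rows to expect before clustering, the isolated vertices can be counted directly from the edge list. The following is a minimal sketch in plain Python, so it runs without a GPU; the helper name isolated_vertices is hypothetical and not part of cugraph or rapids_singlecell.

```python
def isolated_vertices(sources, targets, n_vertices):
    """Return the vertex IDs in range(n_vertices) that appear in no edge."""
    # Any vertex that is an endpoint of at least one edge is not isolated.
    touched = set(sources) | set(targets)
    return [v for v in range(n_vertices) if v not in touched]


# Tiny example: 5 vertices, edges 0-1 and 2-4 (stored in both directions).
sources = [0, 1, 2, 4]
targets = [1, 0, 4, 2]
isolated = isolated_vertices(sources, targets, 5)
print(isolated)           # [3]
print(5 - len(isolated))  # 4 vertices would end up in the graph
```

Applied to the toy example above, isolated_vertices(sources, targets, N) should report vertex 123, assuming the random matrix happens to leave no other vertex without edges.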
[Edit Oct 12, 2025: I was missing the obvious solution, which is simply to pass the vertex list to cugraph via the vertices keyword in from_cudf_edgelist. Deleting other suggestions.]
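A rough sketch of what passing the vertex list might look like for the toy example. It assumes a cugraph version whose from_cudf_edgelist accepts the vertices keyword mentioned in the edit, that vertex IDs run from 0 to N - 1, and that the vertex list should share the dtype of the edge endpoints. The function name build_graph_with_all_vertices is hypothetical.

```python
def build_graph_with_all_vertices(sources, targets, weights, n_vertices):
    """Build a cugraph.Graph that also contains vertices without edges."""
    # Imported lazily so this module can be loaded on a machine without a GPU.
    import cudf
    import cugraph

    df = cudf.DataFrame(
        {"source": sources, "destination": targets, "weights": weights}
    )
    # All vertex IDs 0..n_vertices-1, cast to the dtype of the edge endpoints
    # (assumption: cugraph expects the two dtypes to match).
    vertices = cudf.Series(list(range(n_vertices)), dtype=df["source"].dtype)

    g = cugraph.Graph()
    g.from_cudf_edgelist(
        df,
        source="source",
        destination="destination",
        weight="weights",
        vertices=vertices,  # lets cugraph know about isolated vertices
    )
    return g
```

With the toy data, g = build_graph_with_all_vertices(sources, targets, weights, N) followed by the same cugraph.leiden call would be expected to give leiden_parts.shape[0] == N. This has not been verified across cugraph versions.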
If you'd like me to do some more thorough testing and/or draft a PR, let me know (I know this is somewhat esoteric since most people will just use the UMAP connectivities, which shouldn't have any isolated cells, but alternative weightings, such as Jaccard coefficients, can sometimes produce them).