Spectral Embedding #6581

Draft: aamijar wants to merge 5 commits into base branch-25.06

Member

@aamijar aamijar commented Apr 24, 2025

Resolves #6486

Usage

from sklearn import datasets
from cuml.manifold import SpectralEmbedding
import cupy as cp

# S-curve: input shape (1500, 3) -> embedding shape (1500, 2)
n_samples = 1500
S_points, S_color = datasets.make_s_curve(n_samples, random_state=0)

spectral = SpectralEmbedding(n_components=2, n_neighbors=None, random_state=42)
embedding = spectral.fit_transform(cp.asarray(S_points, order='C', dtype=cp.float32))


from sklearn.datasets import fetch_openml

# MNIST: input shape (70000, 784) -> embedding shape (70000, 2)
mnist = fetch_openml('mnist_784', version=1)
X, y = mnist.data, mnist.target.astype(int)

spectral = SpectralEmbedding(n_components=2, n_neighbors=None, random_state=42)
embedding = spectral.fit_transform(cp.asarray(X, order='C', dtype=cp.float32))

image

image

copy-pr-bot bot commented Apr 24, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the Cython / Python, CMake, and CUDA/C++ labels Apr 24, 2025
@aamijar aamijar added the non-breaking and feature request labels Apr 24, 2025
@aamijar aamijar self-assigned this Apr 24, 2025
// // if (idx < size) { vec[idx] = (fabs(vec[idx]) < threshold) ? 0 : vec[idx]; }
// }

void scale_eigenvectors_by_diagonal(
Member

@cjnolet cjnolet Apr 25, 2025

Please pull these primitives into RAFT. We centralize them specifically to 1) reduce code duplication and 2) highlight areas that could be further optimized in the future; optimizations of these types of computations propagate to all algorithms that use them. Burying them in each algorithm reduces maintainability and forfeits the benefits of continued optimization.

}
};

auto spectral_embedding(raft::resources const& handle,
Member

This should be a preprocessing method in the cuVS C++ API (since it's nearest neighbors + Laplacian + eigendecomposition), and it should be exposed in cuML. We should also allow nearest-neighbor options other than just brute force.
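
For readers following along, the three stages named in this comment (nearest neighbors, Laplacian, eigendecomposition) can be sketched on the CPU with NumPy and SciPy. This is only an illustrative reference under stated assumptions, not the CUDA code in this PR: the function name `spectral_embedding_sketch`, the 0/1 connectivity weights, the dense eigensolver, and the default `n_neighbors=10` are all hypothetical choices made for the sketch.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian


def spectral_embedding_sketch(X, n_components=2, n_neighbors=10):
    """Hypothetical CPU sketch: kNN graph -> normalized Laplacian -> eigenvectors."""
    n = X.shape[0]

    # 1) Brute-force nearest neighbors from pairwise squared distances.
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    nbrs = np.argpartition(d2, n_neighbors, axis=1)[:, :n_neighbors]

    # 2) Symmetrized 0/1 connectivity graph (an assumption of this sketch),
    #    then the symmetric normalized graph Laplacian.
    rows = np.repeat(np.arange(n), n_neighbors)
    vals = np.ones(n * n_neighbors)
    A = csr_matrix((vals, (rows, nbrs.ravel())), shape=(n, n))
    A = 0.5 * (A + A.T)
    L, dd = laplacian(A, normed=True, return_diag=True)

    # 3) Dense eigendecomposition (fine for small n only); eigh returns
    #    eigenvalues in ascending order, so skip the first (trivial) vector.
    _, vecs = np.linalg.eigh(L.toarray())
    emb = vecs[:, 1:n_components + 1] / dd[:, None]
    return emb


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
emb_demo = spectral_embedding_sketch(X_demo, n_components=2, n_neighbors=8)
print(emb_demo.shape)
```

Step 1 is the part the comment suggests making pluggable: any approximate nearest-neighbor routine that returns the same `nbrs` index array could replace the brute-force distance matrix without touching steps 2 and 3.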

// raft::print_device_vector("sym_coo1.cols", sym_coo1.cols(), sym_coo1.nnz, std::cout);
// raft::print_device_vector("sym_coo1.vals", sym_coo1.vals(), sym_coo1.nnz, std::cout);

raft::sparse::COO<float> sym_coo(stream); // Don't pre-allocate dimensions
Member

@cjnolet cjnolet Apr 25, 2025

Just a note for future use: we definitely want to make sure we use the new RAFT sparse types (raft::sparse::device_coo_matrix) in public APIs (this is not a public API). We are trying to get rid of the legacy types (raft::sparse::COO).

Member

@cjnolet cjnolet left a comment

See comments in review

Successfully merging this pull request may close these issues.

[FEA] Spectral Embedding API