Description
Currently, `UMAP.fit_transform` allocates an embedding array here and passes it to the C++ layer as an output parameter. The embedding is of size `n_rows * n_components`, which can take up a non-negligible amount of device memory. It isn't filled in until the final stage of the algorithm, yet it occupies device memory for the entire run.
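For illustration, here is a simplified sketch of that flow (the names are hypothetical stand-ins, not cuML's actual internals): the output buffer is allocated on the Python side before the fit starts and is only written at the very end.

```python
import cupy as cp

# Hypothetical sketch of the current pattern: the embedding buffer is
# allocated in Python up front and handed to C++ as an output parameter,
# so it occupies device memory for the whole run even though it is only
# written in the final stage.
n_rows, n_components = 1_000_000, 2
embedding = cp.zeros((n_rows, n_components), dtype=cp.float32)

# _umap_fit_transform_cpp(X, out_ptr=embedding.data.ptr, ...)  # hypothetical C++ entry point
```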
A different way of handling this would be to have the C++ layer allocate the device array and return it. The Python layer could then wrap this as `UnownedMemory`.
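As a rough sketch of how the Python side could adopt such a pointer (assuming the C++ layer hands back a raw device address plus some object that keeps the allocation alive; the helper below and its arguments are hypothetical), CuPy's `UnownedMemory` / `MemoryPointer` can wrap it without copying:

```python
import cupy as cp

def wrap_embedding(ptr, n_rows, n_components, owner, dtype=cp.float32):
    """Wrap a device pointer returned by the C++ layer as a CuPy array.

    `ptr` is the raw device address and `owner` is whatever Python object
    keeps the underlying allocation alive; both are hypothetical here.
    """
    nbytes = n_rows * n_components * cp.dtype(dtype).itemsize
    mem = cp.cuda.UnownedMemory(ptr, nbytes, owner)
    memptr = cp.cuda.MemoryPointer(mem, 0)
    # No copy: the returned array is a view over the externally owned buffer.
    return cp.ndarray((n_rows, n_components), dtype=dtype, memptr=memptr)
```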
This wouldn't reduce peak device memory usage in UMAP (the peak currently occurs in the embedding step, where the embedding array is needed anyway), but it would reduce memory usage during earlier stages such as the k-NN graph construction step, which would let users use a lower `nnd_n_clusters` value for large datasets.