Description
Currently, `UMAP.fit_transform` allocates an embedding array here and passes it to the C++ layer as an output parameter. The embedding is of size `n_rows * n_components`, which can take up a non-negligible amount of device memory. It isn't filled in until the final stage of the algorithm, yet it occupies device memory for the entire run.
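For illustration, here is a simplified sketch of that flow (the names are hypothetical stand-ins, not cuML's actual internals): the output buffer is allocated on the Python side before the fit starts and is only written at the very end.

```python
import cupy as cp

# Hypothetical sketch of the current pattern: the embedding buffer is
# allocated in Python up front and handed to C++ as an output parameter,
# so it occupies device memory for the whole run even though it is only
# written in the final stage.
n_rows, n_components = 1_000_000, 2
embedding = cp.zeros((n_rows, n_components), dtype=cp.float32)

# _umap_fit_transform_cpp(X, out_ptr=embedding.data.ptr, ...)  # hypothetical C++ entry point
```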
A different way of handling this would be to have the C++ layer allocate the device array and return it. The Python layer could then wrap this as `UnownedMemory`.
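As a rough sketch of how the Python side could adopt such a pointer (assuming the C++ layer hands back a raw device address plus some object that keeps the allocation alive; the helper below and its arguments are hypothetical), CuPy's `UnownedMemory` / `MemoryPointer` can wrap it without copying:

```python
import cupy as cp

def wrap_embedding(ptr, n_rows, n_components, owner, dtype=cp.float32):
    """Wrap a device pointer returned by the C++ layer as a CuPy array.

    `ptr` is the raw device address and `owner` is whatever Python object
    keeps the underlying allocation alive; both are hypothetical here.
    """
    nbytes = n_rows * n_components * cp.dtype(dtype).itemsize
    mem = cp.cuda.UnownedMemory(ptr, nbytes, owner)
    memptr = cp.cuda.MemoryPointer(mem, 0)
    # No copy: the returned array is a view over the externally owned buffer.
    return cp.ndarray((n_rows, n_components), dtype=dtype, memptr=memptr)
```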
This wouldn't reduce peak device memory usage in UMAP (the peak currently occurs in the embedding step, where the embedding array is needed anyway), but it would reduce memory usage during earlier stages such as the k-NN graph construction step, which would let users use a lower `nnd_n_clusters` value for large datasets.