Skip to content

UMAP fixes #6316

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: branch-25.04
Choose a base branch
from
Open

UMAP fixes #6316

wants to merge 1 commit into from

Conversation

viclafargue
Copy link
Contributor

No description provided.

@viclafargue viclafargue requested review from a team as code owners February 13, 2025 15:34
@github-actions github-actions bot added Cython / Python Cython or Python issue CUDA/C++ labels Feb 13, 2025
@viclafargue viclafargue added bug Something isn't working non-breaking Non-breaking change labels Feb 13, 2025
@viclafargue
Copy link
Contributor Author

viclafargue commented Feb 13, 2025

I also reviewed the smooth_knn_dist_kernel kernel and noticed that one of the loops doesn't account for the first element. Is this because the first distance is expected to be zero (self) in a pairwise search? If so, is this safe for transformations? Despite this, the membership strength values seem to be reasonably in accordance with the KNN dists. However, it might be worth addressing if this turns out to be a genuine issue.

@@ -315,8 +313,6 @@ CUML_KERNEL void optimize_batch_kernel(T const* head_embedding,
auto grad_d = T(0.0);
if (repulsive_grad_coeff > T(0.0))
grad_d = clip<T>(repulsive_grad_coeff * (current[d] - negative_sample[d]), T(-4.0), T(4.0));
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still curious about this change- I want to make sure we're not sacrificing the numerical stability by changing this based on limited evidence. What is the reason for changing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a clear mistake here.
Here is the reference implementation : https://github.com/lmcinnes/umap/blob/a012b9d8751d98b94935ca21f278a54b3c3e1b7f/umap/layouts.py#L181

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viclafargue I would not assume this is a mistake because it differs from the reference impl. The algorithms differ slightly in several ways. When we make these types of changes we should be validating and verifying them with tests. This is how more bugs can be introduced inadvertently.

@cjnolet
Copy link
Member

cjnolet commented Feb 13, 2025

Is this because the first distance is expected to be zero (self) in a pairwise search? If so, is this safe for transformations?

Yes, that's exactly why. I've verified this many times in the past and it should be equivalent w/ this logic. We don't remove the self-loop until later to save memory.

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (branch-25.04@9c0166a). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-25.04    #6316   +/-   ##
===============================================
  Coverage                ?   67.07%           
===============================================
  Files                   ?      202           
  Lines                   ?    13076           
  Branches                ?        0           
===============================================
  Hits                    ?     8771           
  Misses                  ?     4305           
  Partials                ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jcrist jcrist mentioned this pull request Feb 28, 2025
9 tasks
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding this until we verify

@jcrist
Copy link
Member

jcrist commented Mar 10, 2025

What's the status of this here? It seems like the parameter forwarding changes in the .pyx files are obvious bugfixes and should definitely be merged, while the cuda change is a maybe per @cjnolet. If possible, could we split the former out just to ensure it gets in for 25.04? @cjnolet, do you think you (or someone) would have time to verify the latter in the next few weeks?

@cjnolet
Copy link
Member

cjnolet commented Mar 11, 2025

Agreed on splitting. The whole cuVS Team Is consumed with GTC, so won’t happen in the next two weeks. Any changes to the loss functions need to be heavily scrutinized and well tested for their impacts to the viz and recall (more than just pointing to the cpu version because there’s several places we do things differently and that’s intentional. Please also feel free to reach out to me if the question of correctness ever arises in contexts like this. Let’s separate the two changes for now.

@jcrist
Copy link
Member

jcrist commented Mar 11, 2025

Thanks for the quick reply. Makes sense to me, I've taken the liberty of splitting the non-controversial changes out in #6417.

rapids-bot bot pushed a commit that referenced this pull request Mar 11, 2025
Previously these parameters weren't fully forwarded properly to `libcuml`.

Split out from #6316 (thanks Victor!)

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Simon Adorf (https://github.com/csadorf)

URL: #6417
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CUDA/C++ Cython / Python Cython or Python issue non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants