UMAP fixes #6316
base: branch-25.04
I also reviewed the |
@@ -315,8 +313,6 @@ CUML_KERNEL void optimize_batch_kernel(T const* head_embedding,
auto grad_d = T(0.0);
if (repulsive_grad_coeff > T(0.0))
grad_d = clip<T>(repulsive_grad_coeff * (current[d] - negative_sample[d]), T(-4.0), T(4.0));
else
I'm still curious about this change. I want to make sure we're not sacrificing numerical stability by changing this based on limited evidence. What is the reason for changing this?
This is clearly a mistake.
Here is the reference implementation: https://github.com/lmcinnes/umap/blob/a012b9d8751d98b94935ca21f278a54b3c3e1b7f/umap/layouts.py#L181
@viclafargue I would not assume this is a mistake just because it differs from the reference implementation. The algorithms differ slightly in several ways. When we make these types of changes, we should be validating and verifying them with tests. Unvalidated changes like this are how bugs get introduced inadvertently.
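For readers following the thread, the clipped repulsive update in the hunk above can be sketched on the host as below. This is a minimal illustration, not the cuML kernel: `clip` and `repulsive_grad` are stand-ins written for this example, and the sketch assumes the behavior under discussion is that `grad_d` keeps its zero initial value whenever the coefficient is not positive.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the kernel's clip<T> helper: clamp x to [lo, hi].
template <typename T>
T clip(T x, T lo, T hi)
{
  return std::min(std::max(x, lo), hi);
}

// Hypothetical helper (not part of cuML): one coordinate of the repulsive
// gradient, mirroring the three lines shown in the hunk.
template <typename T>
T repulsive_grad(T repulsive_grad_coeff, T current_d, T negative_sample_d)
{
  auto grad_d = T(0.0);  // default when the coefficient is not positive (assumption)
  if (repulsive_grad_coeff > T(0.0))
    grad_d = clip<T>(repulsive_grad_coeff * (current_d - negative_sample_d), T(-4.0), T(4.0));
  return grad_d;
}
```

A check of this kind only pins down the per-coordinate arithmetic. It says nothing about embedding quality or recall, which is the validation being asked for in this thread.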
Yes, that's exactly why. I've verified this many times in the past and it should be equivalent with this logic. We don't remove the self-loop until later to save memory.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@             Coverage Diff              @@
##           branch-25.04    #6316   +/-  ##
===============================================
  Coverage              ?   67.07%
===============================================
  Files                 ?      202
  Lines                 ?    13076
  Branches              ?        0
===============================================
  Hits                  ?     8771
  Misses                ?     4305
  Partials              ?        0
Holding this until we verify
What's the status of this here? It seems like the parameter forwarding changes in the |
Agreed on splitting. The whole cuVS team is consumed with GTC, so it won't happen in the next two weeks. Any changes to the loss functions need to be heavily scrutinized and well tested for their impact on the viz and recall (more than just pointing to the CPU version, because there are several places where we do things differently, and that's intentional). Please also feel free to reach out to me if the question of correctness ever arises in contexts like this. Let's separate the two changes for now.
Thanks for the quick reply. Makes sense to me, I've taken the liberty of splitting the non-controversial changes out in #6417.
Previously these parameters weren't fully forwarded properly to `libcuml`. Split out from #6316 (thanks Victor!)

Authors:
- Jim Crist-Harif (https://github.com/jcrist)

Approvers:
- Simon Adorf (https://github.com/csadorf)

URL: #6417