Dispatch to use fp32 distance computation in NN Descent depending on data dimensions #1415

jinsolp · 2025-10-08T17:59:46Z

Closes #1370
Closes #195

Based on heuristics, dim=16 was chosen as the threshold for dispatching to an fp32 distance kernel.

The fp32 kernel no longer uses wmma. Originally, wmma was done on matrices of shape [64 x 32] x [32 x 64] per block (with multiple iterations if the data dimension is larger than 32).
The fp32 kernel now computes distances manually, but since it only targets small dimensions, fp32 dispatching ends up slightly faster end to end, with much better recall for small dimensions.
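The dispatch described above can be sketched roughly as follows. This is a plain-Python illustration, not the cuVS code: the function names, the returned labels, the inclusive comparison at dim=16, and the choice of squared L2 as the example metric are all assumptions made for the sketch.

```python
# Hypothetical names throughout; this is not the cuVS implementation.
FP32_DIM_THRESHOLD = 16  # threshold quoted in the PR description


def choose_distance_kernel(dim: int) -> str:
    """Return which distance path to use for a given data dimension.

    Assumption: the comparison is inclusive (dim <= 16 goes to fp32).
    """
    return "fp32_manual" if dim <= FP32_DIM_THRESHOLD else "wmma"


def squared_l2_manual(a, b):
    """Element-by-element squared L2 distance, with no matrix-multiply units.

    Squared L2 is only an example metric chosen for this sketch.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    acc = 0.0
    for x, y in zip(a, b):
        diff = x - y
        acc += diff * diff
    return acc
```

For small dimensions the inner loop is short, which is consistent with the observation that the manual path does not cost end-to-end time.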

All numbers below were measured on a machine with an L40 GPU and a 128-core AMD EPYC CPU. Performance and recall are averaged over 5 runs, and all times are in seconds. The baseline kNN graph is computed using sklearn.neighbors.NearestNeighbors with the brute-force method.
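For readers unfamiliar with how recall against a brute-force baseline is typically measured, here is a minimal pure-Python sketch. The PR does not spell out its exact recall definition, so this assumes the common one (fraction of exact neighbors that also appear in the approximate graph), assumes squared L2 distance, and assumes each point is excluded from its own neighbor list. `brute_force_knn` and `recall` are hypothetical helper names, not part of sklearn or cuVS.

```python
def brute_force_knn(points, k):
    """Exact k nearest neighbors per point (self excluded), by squared L2."""
    neighbors = []
    for i, p in enumerate(points):
        dists = []
        for j, q in enumerate(points):
            if i == j:
                continue  # assumption: a point is not its own neighbor
            d = sum((x - y) ** 2 for x, y in zip(p, q))
            dists.append((d, j))
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def recall(approx, exact):
    """Fraction of exact neighbors that also appear in the approximate graph."""
    hits = 0
    total = 0
    for approx_row, exact_row in zip(approx, exact):
        hits += len(set(approx_row) & set(exact_row))
        total += len(exact_row)
    return hits / total


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
exact = brute_force_knn(points, k=1)
approx = [[1], [0], [3], [0]]  # last row is deliberately wrong
score = recall(approx, exact)
```

In this toy example three of the four approximate rows match the exact graph, so `score` is 0.75.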

Max iters=20

For larger dimensions, there is an inherent issue with the NN Descent algorithm itself that keeps recall low. This can be improved slightly with more iterations.
Also note that the end-to-end time is similar or slightly lower when using fp32.

Max iters=100

Note that for the part highlighted in blue, recall does not improve compared to the table above even with more iterations (which is why the fp32 approach is needed for this part).

jinsolp added 4 commits October 7, 2025 19:24

merge commit and changes

a6b1bc7

rm header

1c7976a

Merge branch 'rapidsai:branch-25.12' into fix-nnd-recall-fp32

e63b316

Merge branch 'rapidsai:branch-25.12' into fix-nnd-recall-fp32

f9a7811

jinsolp self-assigned this Oct 8, 2025

jinsolp requested a review from a team as a code owner October 8, 2025 17:59

jinsolp added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Oct 8, 2025

github-project-automation bot added this to Vector Search, ML, & Data Mining Release Board Oct 8, 2025

github-project-automation bot moved this to Todo in Vector Search, ML, & Data Mining Release Board Oct 8, 2025

Merge branch 'main' into fix-nnd-recall-fp32

868518a
