Conversation

@nicolaloi
Contributor

@nicolaloi nicolaloi commented Dec 8, 2025

Type

Motivation and Context

The KNN search on GPU fails silently when k is larger than the macro GPU_MAX_SELECTION_K, producing garbage output (all 0s, indices larger than the total number of points, or even negative indices). GPU_MAX_SELECTION_K is 2048 if CUDA_VERSION > 9000, otherwise it is 1024. The CPU KNN search has no such limit. To support larger k on GPU without altering GPU_MAX_SELECTION_K, a multipass algorithm should be implemented that splits the KNN search into batches, each no larger than GPU_MAX_SELECTION_K.

Checklist:

  • I have run python util/check_style.py --apply to apply Open3D code style
    to my code.
  • This PR changes Open3D behavior or adds new functionality.
    • Both C++ (Doxygen) and Python (Sphinx / Google style) documentation is
      updated accordingly.
    • I have added or updated C++ and / or Python unit tests OR included test
      results (e.g. screenshots or numbers) here.
  • I will follow up and update the code if CI fails.
  • For fork PRs, I have selected Allow edits from maintainers.

Description

I have implemented a multipass algorithm for KNN search with large k on CUDA, splitting the search into multiple batches, each no larger than GPU_MAX_SELECTION_K. The main challenge is masking the indices already found in earlier passes, while handling tiling and memory contiguity correctly.
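The masking idea can be illustrated with a minimal CPU-only sketch for a single query. This is not the code in this PR: `SelectSmallest` and `MultiPassSelect` are hypothetical names, the per-call limit `max_k` stands in for GPU_MAX_SELECTION_K, and tiling is ignored entirely. It also assumes all real distances are finite, so that infinity can serve as the mask value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a selection primitive limited to a small k.
// Returns the indices of the k smallest unmasked (finite) entries of `dist`,
// nearest first, with ties broken by the lower index.
std::vector<int> SelectSmallest(const std::vector<float>& dist, std::size_t k) {
    std::vector<int> idx;
    for (std::size_t i = 0; i < dist.size(); ++i) {
        if (dist[i] != std::numeric_limits<float>::infinity()) {
            idx.push_back(static_cast<int>(i));
        }
    }
    k = std::min(k, idx.size());
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k),
                      idx.end(), [&dist](int a, int b) {
                          return dist[a] < dist[b] ||
                                 (dist[a] == dist[b] && a < b);
                      });
    idx.resize(k);
    return idx;
}

// Collects the knn nearest indices in passes of at most max_k results each.
// `dist` is taken by value because found entries are overwritten (masked).
std::vector<int> MultiPassSelect(std::vector<float> dist, std::size_t knn,
                                 std::size_t max_k) {
    assert(max_k > 0);
    std::vector<int> result;
    knn = std::min(knn, dist.size());
    while (result.size() < knn) {
        const std::size_t batch = std::min(max_k, knn - result.size());
        const std::vector<int> found = SelectSmallest(dist, batch);
        if (found.empty()) break;  // nothing left to select
        for (int i : found) {
            result.push_back(i);
            // Mask the entry so later passes cannot return it again.
            dist[i] = std::numeric_limits<float>::infinity();
        }
    }
    return result;
}
```

Because each pass returns the nearest remaining points, concatenating the passes yields the same result as a single selection with the full k.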

To improve readability, I have split the search into two separate functions, selected by whether the multipass algorithm is needed:

if (knn <= GPU_MAX_SELECTION_K) {
    KnnSearchCUDASinglePass<T, TIndex>(points, queries, knn, tile_rows,
                                       tile_cols, output_allocator,
                                       point_norms, query_norms);
} else {
    KnnSearchCUDAMultiPass<T, TIndex>(points, queries, knn, tile_rows,
                                      tile_cols, output_allocator,
                                      point_norms, query_norms);
}

I have created a script with 120 test cases covering a range of scenarios (small and large clouds of up to 2 million points, multiple queries, and knn values from small up to 50000). This PR passes all the tests, while the current master branch does not: cuda_knn_test.py
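The failure symptoms listed under Motivation (all 0s, out-of-range indices, negative indices) can be caught per query with a small validity check. The sketch below is not taken from cuda_knn_test.py: `IsValidKnnResult` is a hypothetical helper that assumes the distances from the query to every point are available as a plain vector, and it deliberately does not assume any particular ordering of the returned neighbors.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Returns true if `indices` is a valid KNN result for one query, given the
// distance from that query to every point in `dist`.
bool IsValidKnnResult(const std::vector<std::int64_t>& indices,
                      const std::vector<float>& dist, std::size_t knn) {
    const std::size_t n = dist.size();
    // The result must contain exactly min(knn, n) neighbors.
    if (indices.size() != std::min(knn, n)) return false;

    std::vector<bool> selected(n, false);
    float worst_selected = -std::numeric_limits<float>::infinity();
    for (std::int64_t i : indices) {
        // Reject negative and out-of-range indices.
        if (i < 0 || static_cast<std::size_t>(i) >= n) return false;
        const std::size_t u = static_cast<std::size_t>(i);
        // Reject duplicates (this also catches an all-zero output).
        if (selected[u]) return false;
        selected[u] = true;
        worst_selected = std::max(worst_selected, dist[u]);
    }
    // No unselected point may be strictly closer than the farthest selected.
    for (std::size_t j = 0; j < n; ++j) {
        if (!selected[j] && dist[j] < worst_selected) return false;
    }
    return true;
}
```

The final loop compares against the farthest selected neighbor rather than requiring a sorted result, so the check stays valid whether or not the search returns neighbors in distance order.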

@update-docs

update-docs bot commented Dec 8, 2025

Thanks for submitting this pull request! The maintainers of this repository would appreciate if you could update the CHANGELOG.md based on your changes.

@nicolaloi nicolaloi changed the title CUDA multipass for knn >= GPU_MAX_SELECTION_K CUDA multipass for knn > GPU_MAX_SELECTION_K Dec 8, 2025
@nicolaloi nicolaloi changed the title CUDA multipass for knn > GPU_MAX_SELECTION_K Implement CUDA multipass for knn > GPU_MAX_SELECTION_K Dec 8, 2025

Development

Successfully merging this pull request may close these issues.

knn_search abnormal behavior when knn > 2048 using GPU, return all 0 or very large random integer array
