Implement CUDA multipass for knn > GPU_MAX_SELECTION_K #7381

nicolaloi · 2025-12-08T12:03:44Z

Type

Bug fix (non-breaking change which fixes an issue): Fixes knn_search abnormal behavior when knn > 2048 using GPU, return all 0 or very large random integer array #7301
New feature (non-breaking change which adds functionality). Resolves #
Breaking change (fix or feature that would cause existing functionality to not work as expected) Resolves #

Motivation and Context

The KNN search on GPU fails silently when k is larger than the macro GPU_MAX_SELECTION_K, producing garbage output (all 0s, indices larger than the total number of points, or even negative indices). The macro GPU_MAX_SELECTION_K is 2048 if CUDA_VERSION > 9000, otherwise it is 1024. The CPU KNN search has no such limit. To support larger k on GPU without altering GPU_MAX_SELECTION_K, a multipass algorithm should be implemented, splitting the KNN search into batches of at most GPU_MAX_SELECTION_K neighbors each.
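As a rough illustration of the batching arithmetic, the sketch below splits a requested knn into per-pass selection sizes that never exceed the limit. The function name `PassSizes` and its parameters are hypothetical and are not taken from the Open3D sources; the actual implementation may size its passes differently.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Open3D): split `knn` into per-pass
// selection sizes, each at most `max_selection_k`.
std::vector<int> PassSizes(int knn, int max_selection_k) {
    assert(knn >= 0 && max_selection_k > 0);
    std::vector<int> sizes;
    // Take as many neighbors per pass as the limit allows until none remain.
    for (int remaining = knn; remaining > 0; remaining -= sizes.back()) {
        sizes.push_back(std::min(remaining, max_selection_k));
    }
    return sizes;
}
```

For knn = 50000 and a limit of 2048, this gives 25 passes, the last one selecting 848 neighbors.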

Checklist:

I have run python util/check_style.py --apply to apply Open3D code style to my code.
This PR changes Open3D behavior or adds new functionality.
- Both C++ (Doxygen) and Python (Sphinx / Google style) documentation is
  updated accordingly.
- I have added or updated C++ and / or Python unit tests OR included test
  results (e.g. screenshots or numbers) here.
I will follow up and update the code if CI fails.
For fork PRs, I have selected Allow edits from maintainers.

Description

I have implemented a multipass algorithm for large knn values on CUDA, splitting the search into multiple batches no larger than GPU_MAX_SELECTION_K. The main challenge is masking indices that have already been found in earlier passes, while handling tiling and contiguity correctly.
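The masking idea can be sketched on the CPU for a single query. This is a simplified illustration, not the CUDA code in this PR: it ignores tiling, works on a precomputed distance row, and uses `std::partial_sort` in place of the GPU selection kernel. The names `SelectPass` and `MultiPassSelect` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one bounded selection pass. Returns the indices of the
// up-to-`pass_k` smallest entries of `dist` that are not yet masked, ordered
// by ascending distance (ties broken by index).
std::vector<int> SelectPass(const std::vector<float>& dist,
                            const std::vector<bool>& taken,
                            std::size_t pass_k) {
    std::vector<int> candidates;
    for (std::size_t i = 0; i < dist.size(); ++i) {
        if (!taken[i]) candidates.push_back(static_cast<int>(i));
    }
    pass_k = std::min(pass_k, candidates.size());
    auto by_distance = [&dist](int a, int b) {
        return dist[a] < dist[b] || (dist[a] == dist[b] && a < b);
    };
    std::partial_sort(candidates.begin(), candidates.begin() + pass_k,
                      candidates.end(), by_distance);
    candidates.resize(pass_k);
    return candidates;
}

// Hypothetical multipass driver: repeat bounded passes, masking every index
// found so far, until `knn` neighbors (or all points) have been collected.
std::vector<int> MultiPassSelect(const std::vector<float>& dist,
                                 std::size_t knn, std::size_t max_k) {
    assert(max_k > 0);
    knn = std::min(knn, dist.size());
    std::vector<bool> taken(dist.size(), false);
    std::vector<int> result;
    result.reserve(knn);
    while (result.size() < knn) {
        const std::size_t pass_k = std::min(max_k, knn - result.size());
        for (int idx : SelectPass(dist, taken, pass_k)) {
            taken[idx] = true;  // mask so later passes skip this point
            result.push_back(idx);
        }
    }
    return result;
}
```

Because each pass only sees unmasked points, concatenating the passes gives the same ordering as a single unbounded selection, and masking by index rather than by a distance threshold keeps tied distances from being dropped or duplicated.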

To improve readability, I have separated the function into two distinct functions, depending on whether or not the multipass algorithm should be used:

Open3D/cpp/open3d/core/nns/KnnSearchOps.cu, lines 535 to 543 in c0a4fcb:

    
    if (knn <= GPU_MAX_SELECTION_K) {
        KnnSearchCUDASinglePass<T, TIndex>(points, queries, knn, tile_rows,
                                           tile_cols, output_allocator,
                                           point_norms, query_norms);
    } else {
        KnnSearchCUDAMultiPass<T, TIndex>(points, queries, knn, tile_rows,
                                          tile_cols, output_allocator,
                                          point_norms, query_norms);
    }

I have written a script with 120 test cases covering different scenarios (small and large clouds up to 2 million points, multiple queries, small and very large knn up to 50000). This PR passes all the tests, while the original master branch code does not: cuda_knn_test.py
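The failure modes described in the motivation (all 0s, out-of-range indices, negative indices) can be detected with a simple sanity check on each row of returned indices. The sketch below is a hypothetical helper and is not taken from cuda_knn_test.py; it only checks plausibility, not that the neighbors are actually the nearest ones.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check (not from the PR's test script): a row of neighbor
// indices is plausible if every index lies in [0, num_points) and no index
// appears more than once.
bool IsPlausibleKnnRow(const std::vector<std::int64_t>& indices,
                       std::int64_t num_points) {
    if (num_points < 0) return false;
    std::vector<bool> seen(static_cast<std::size_t>(num_points), false);
    for (std::int64_t idx : indices) {
        if (idx < 0 || idx >= num_points) return false;  // negative or too large
        const std::size_t pos = static_cast<std::size_t>(idx);
        if (seen[pos]) return false;  // duplicate, e.g. an all-0 row
        seen[pos] = true;
    }
    return true;
}
```

A stricter test would also compare the returned indices against a brute-force CPU search over the same points.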

update-docs · 2025-12-08T12:03:48Z

Thanks for submitting this pull request! The maintainers of this repository would appreciate if you could update the CHANGELOG.md based on your changes.

cuda multipass for knn >= GPU_MAX_SELECTION_K

c0a4fcb

nicolaloi changed the title ~~CUDA multipass for knn >= GPU_MAX_SELECTION_K~~ CUDA multipass for knn > GPU_MAX_SELECTION_K Dec 8, 2025

nicolaloi changed the title ~~CUDA multipass for knn > GPU_MAX_SELECTION_K~~ Implement CUDA multipass for knn > GPU_MAX_SELECTION_K Dec 8, 2025

update CHANGELOG.md

0836ead

nicolaloi mentioned this pull request Dec 8, 2025

knn_search abnormal behavior when knn > 2048 using GPU, return all 0 or very large random integer array #7301

Open

3 tasks

OuYaozhong mentioned this pull request Dec 24, 2025

Non-deterministic for-loop during building from source code #7390

Open

3 tasks
