Implement CUDA multipass for knn > GPU_MAX_SELECTION_K #7381
Type
knn_search abnormal behavior when knn > 2048 using GPU, return all 0 or very large random integer array #7301
Motivation and Context
The KNN search on GPU breaks silently when k is larger than the macro GPU_MAX_SELECTION_K, producing garbage output (all 0s, indices larger than the total number of points, or even negative indices). GPU_MAX_SELECTION_K is 2048 if CUDA_VERSION > 9000, otherwise it is 1024. The CPU KNN search has no such limit. To support large k on GPU without altering GPU_MAX_SELECTION_K, a multipass algorithm should be implemented that splits the KNN search into batches, each no larger than GPU_MAX_SELECTION_K.
Checklist:
python util/check_style.py --apply to apply Open3D code style to my code.
updated accordingly.
results (e.g. screenshots or numbers) here.
Description
I have implemented a multipass algorithm for large-k KNN search on CUDA that splits the search into multiple batches, each no larger than GPU_MAX_SELECTION_K. The main challenge is masking indices that were already found in earlier passes/iterations, while handling tiling and contiguity correctly.
To improve readability, I have split the function into two distinct functions, depending on whether or not the multipass algorithm should be used:
Open3D/cpp/open3d/core/nns/KnnSearchOps.cu
Lines 535 to 543 in c0a4fcb
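The multipass idea described above can be illustrated with a minimal CPU sketch in plain Python. This is not the code in KnnSearchOps.cu: MAX_SELECTION_K, select_k_smallest and multipass_knn are hypothetical names chosen for illustration, a single query is assumed, and tiling is ignored. Each pass selects at most MAX_SELECTION_K neighbors, then masks them with an infinite distance so later passes cannot return them again.

```python
import heapq
import math

# Hypothetical stand-in for GPU_MAX_SELECTION_K (2048 or 1024 in the PR);
# kept tiny here so that several passes are needed.
MAX_SELECTION_K = 4


def select_k_smallest(dists, k):
    """Bounded selection: indices of the k smallest unmasked distances."""
    if k > MAX_SELECTION_K:
        raise ValueError("k exceeds the per-pass selection limit")
    # Masked entries (infinite distance) are never candidates.
    candidates = [(d, i) for i, d in enumerate(dists) if d != math.inf]
    # Ties are broken by index because tuples compare element by element.
    return [i for _, i in heapq.nsmallest(k, candidates)]


def multipass_knn(points, query, knn):
    """Return indices of the knn nearest points, nearest first."""
    # Squared Euclidean distance from the query to every point.
    dists = [sum((p - q) ** 2 for p, q in zip(pt, query)) for pt in points]
    knn = min(knn, len(points))  # assumption: clamp k to the cloud size
    result = []
    while len(result) < knn:
        batch = min(MAX_SELECTION_K, knn - len(result))
        found = select_k_smallest(dists, batch)
        result.extend(found)
        for i in found:
            dists[i] = math.inf  # mask indices found in this pass
    return result


if __name__ == "__main__":
    pts = [(float(x), 0.0) for x in (5, 1, 9, 3, 7, 2, 8, 0, 6, 4, 10, 11)]
    print(multipass_knn(pts, (0.0, 0.0), 10))
```

Because every pass returns the nearest remaining points in sorted order, concatenating the passes gives the same ordering as a single unrestricted selection, which is what a comparison against the CPU result would expect.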
I have created a script with 120 test cases to test the change with different cases (small/large clouds up to 2 million points, multiple queries, small/very large knn up to 50000). This PR passes all the tests, while the original master branch code does not: cuda_knn_test.py
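The failure modes listed in the motivation (all 0s, indices beyond the point count, negative indices) can be detected without a reference result. The following is a minimal sketch of such a sanity check in plain Python; it is not taken from cuda_knn_test.py, check_knn_indices is a hypothetical helper, and it assumes that a correct result for one query contains min(knn, num_points) distinct indices.

```python
def check_knn_indices(indices, num_points, knn):
    """Return a list of problems found in one query's KNN index list."""
    problems = []
    expected = min(knn, num_points)  # assumption: k is clamped to cloud size
    if len(indices) != expected:
        problems.append("wrong number of neighbors")
    if any(i < 0 for i in indices):
        problems.append("negative index")
    if any(i >= num_points for i in indices):
        problems.append("index out of range")
    # An all-zero result for knn > 1 shows up here as duplicates.
    if len(set(indices)) != len(indices):
        problems.append("duplicate index")
    return problems


if __name__ == "__main__":
    print(check_knn_indices([0, 0, 0, 0], num_points=100, knn=4))
    print(check_knn_indices([3, 1, 2, 0], num_points=100, knn=4))
```

A check like this only catches structurally invalid output; confirming that the returned neighbors are actually the nearest ones still requires comparison against a trusted result such as the CPU search.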