Skip to content

Add OpenMP threading to search #6284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jmackay2
Copy link
Contributor

Updated benchmark:

-------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations
-------------------------------------------------------------------------------------
OrganizedNeighborSearch                          26.0 us         26.0 us        35446
KdTree                                           52.1 us         52.2 us        10000
KdTreeAll/iterations:1/manual_time                282 us     58906299 us            1 items_per_second=743.246M/s
KdTreeAllThreaded/iterations:1/manual_time       74.4 us      2626656 us            1 items_per_second=2.81404G/s
KdTreeNanoflann                                  42.6 us         42.5 us        19114

@roncapat
Copy link
Contributor

Hi, just a quick question, setting thread number = 1 will not spawn OpenMP worker threads? Like, exactly the same behaviour as before, with OpenMP optinal?

@mvieth
Copy link
Member

mvieth commented May 30, 2025

Hi, just a quick question, setting thread number = 1 will not spawn OpenMP worker threads? Like, exactly the same behaviour as before, with OpenMP optinal?

I am not sure if there is a simple yes/no answer. Keeping OpenMP optional in PCL is definitely the goal, you can even opt-out completely while building PCL: https://github.com/PointCloudLibrary/pcl/blob/master/CMakeLists.txt#L309
I would assume that OpenMP implementations only spawn new threads if they are needed, however I do not know what the OpenMP standard dictates and what is implementation defined.
If I remember correctly, I have seen that some compilers introduce some kind of "wrapper function" around code parts that are parallellized (even with a single thread), so on the binary level, using OpenMP with a single thread does not seem to be 100% the same as not using OpenMP at all.
In the end, it depends on why you are asking?

@roncapat
Copy link
Contributor

In the end, it depends on why you are asking?

Just to understand whether by upgrading PCL I would incur in having OpenMP forcibly baked in or not. I recall for NormalEstimation fo example there exist an OMP-suffixed version, and user can choose. In this PR, this does not seem to be the case so I was just wandering the reason for this different approach.

@roncapat
Copy link
Contributor

But as you say, building PCL without OpenMP support will just "drop" the OpenMP pragmas, right? So there is a solution to avoid OpenMP, at build stage if I'm not mistaken.

@mvieth
Copy link
Member

mvieth commented May 30, 2025

Just to understand whether by upgrading PCL I would incur in having OpenMP forcibly baked in or not. I recall for NormalEstimation fo example there exist an OMP-suffixed version, and user can choose. In this PR, this does not seem to be the case so I was just wandering the reason for this different approach.

Yes, in the recent past we switched to adding OpenMP to existing classes, instead of having two classes doing the same thing, one using OpenMP and one not (too much code duplication and size bloating in this approach). There have even been some considerations about merging e.g. NormalEstimation and NormalEstimationOMP into one class (which then makes use of OpenMP).

But as you say, building PCL without OpenMP support will just "drop" the OpenMP pragmas, right? So there is a solution to avoid OpenMP, at build stage if I'm not mistaken.

Exactly, by passing -DWITH_OPENMP=OFF to CMake, it will not search for or link to OpenMP, thus the compiler will not "know" the OpenMP pragmas and ignore them.

@roncapat
Copy link
Contributor

Thank you very much for clarifications :)

@jmackay2 jmackay2 marked this pull request as ready for review June 2, 2025 11:17
@larshg larshg added this to the pcl-1.16.0 milestone Jun 3, 2025
@mvieth mvieth added changelog: enhancement Meta-information for changelog generation module: search module: benchmarks labels Jun 3, 2025
@mvieth
Copy link
Member

mvieth commented Jun 3, 2025

@jmackay2 Out of interest, in the benchmark above, how many threads did you use (what does omp_get_num_procs() return on your computer)? In your results, KdTreeAllThreaded is about 3.8 times faster than KdTreeAll.

@jmackay2
Copy link
Contributor Author

jmackay2 commented Jun 4, 2025

@jmackay2 Out of interest, in the benchmark above, how many threads did you use (what does omp_get_num_procs() return on your computer)? In your results, KdTreeAllThreaded is about 3.8 times faster than KdTreeAll.

Yeah, good question. This benchmark is on 12 threads on my machine.

@mvieth
Copy link
Member

mvieth commented Jun 4, 2025

Yeah, good question. This benchmark is on 12 threads on my machine.

In the past, when parallelizing loops that call nearestKSearch or radiusSearch, we have often observed that the work is not distributed evenly between the threads (when using the default static schedule) because the loop iterations need different amounts of time. See for example #6155 . If you like, you could try something similar here, but if not, that's fine as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: enhancement Meta-information for changelog generation module: benchmarks module: search
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants