
Conversation

@lucabart97
Contributor

While comparing pointer-based and index-based approaches, I tested the PooledAllocator. It appears to segment memory, which can negatively affect cache locality. To address this, I experimented with storing the entire tree in a single contiguous buffer using std::vector, making the structure more cache-friendly.

I performed some tests:

[benchmark plots comparing the two approaches were attached here]

I think that `size_t nodes = 4 * (Base::size_ / Base::leaf_max_size_);` is not the best formula for predicting the number of nodes, so the build time may be affected by that (?)

Suggestions are welcome :)

@codecov

codecov bot commented Nov 16, 2025

Codecov Report

❌ Patch coverage is 78.57143% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.38%. Comparing base (5e3026c) to head (87a62f0).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| include/nanoflann.hpp | 78.57% | 3 Missing ⚠️ |

@@            Coverage Diff             @@
##           master     #274      +/-   ##
==========================================
- Coverage   86.18%   85.38%   -0.81%     
==========================================
  Files           1        1              
  Lines         666      643      -23     
  Branches      124      121       -3     
==========================================
- Hits          574      549      -25     
- Misses         92       94       +2     
| Files with missing lines | Coverage Δ |
|---|---|
| include/nanoflann.hpp | 85.38% <78.57%> (-0.81%) ⬇️ |

@qq422216549

`static constexpr size_t BLOCKSIZE = 8192;`

If you want to be more cache-friendly, just increase it.

Allocating one large block with std::vector is not necessary: once the data size exceeds the CPU's L3 cache, the performance improvement is too small.

Also:

```cpp
size_t nodes = 4 * (Base::size_ / Base::leaf_max_size_);
nodes = std::max(nodes, Base::size_);  // ????? maybe std::min ?????
Base::pool_.resize(nodes);
```

When leaf_max_size = 10 (the default):

```
size_t nodes = 4 * (Base::size_ / 10) = 0.4 * Base::size_
nodes = std::max(0.4 * Base::size_, Base::size_) = Base::size_
```

So `nodes` is far too imprecise: in the worst case it could waste roughly 10x the memory actually needed...

@lucabart97
Contributor Author

lucabart97 commented Nov 19, 2025

Dear @qq422216549
Thank you for your feedback!

> use std::vector alloc large mem is not necessary.
> when size > cpu l3 cache, Performance improvement is too small.

I do not agree: for cache spatial and temporal locality, and to help the RAM controller, contiguous memory is best.

We could optimize the formula. The std::max is needed for the tests, because they build a tree with 5 points while the leaf size is 10. Let me know if you have a better guess!

Using std::vector also simplifies the code, and the query time, as you can see in the previous plot, is better! (You do not need to tune anything...)

@jlblancoc
Owner

@lucabart97 I like the overall simplification from handmade to std. Let me check it locally carefully and do an independent benchmark before taking the decision to merge, since this would have a relevant impact.

@lucabart97
Contributor Author

@jlblancoc what I'm missing is a smart formula to estimate the number of nodes given the input size and the leaf size

@jlblancoc
Owner

jlblancoc commented Dec 3, 2025

@lucabart97
Ok, I see the problem: `std::vector<>` cannot grow without invalidating pointers, and the accesses right now are not safe, since the `[]` operator doesn't check for out-of-bounds.
And with pre-allocation, it's hard to guess the exact number of nodes.

My ideas:

On the general approach

Why not use std::deque instead of std::vector? I have used it a lot in the past when needing a growing, compact container that DOES NOT invalidate pointers when growing. Would you have time to give it a try? i.e. start with a "smart formula", then grow if an access goes past the current size.

On the "smart formula"

I think the absolute worst case would be every leaf holding just one point, but we should use heuristics so as not to waste memory in the most common cases.
Assuming all leaves are at their maximum size (the best case), and that each pair of nodes at a given level has one common parent at the level above (which I think is the worst case, so we have a mix), we should have:

NumberOfLeafs: n_L = ceil( PointCount / MaxLeafSize ) 

Then, going up the tree, each level has half as many nodes as the level below, so the total is the sum of a geometric series:

HeuristicNodeCount = n_L * (1 + 1/2 + 1/4 + ...) = 2 * n_L

Strictly, the ideal infinite sum should stop once a level has only 1 node, but the difference is insignificant.

We could test this formula with some real dataset to check if it's close to reality. Then, together with the std::deque<> approach, it would be a robust solution. Then, it should be benchmarked to see if it's still faster than the current version, which might be possible! 🙃
