Skip to content

nn on a single point uses allocations #227

@hughcars

Description

@hughcars

There's a special case for nn where there is a single point provided, where the output is a single index and distance. At present this causes 4 allocations, because dist and idxs are declared as Vector before calling knn! internally.

Testing with

using NearestNeighbors, StaticArrays, BenchmarkTools
import Random: seed!, default_rng, rand
rng = seed!(default_rng(), 111)
tree = KDTree(rand(rng, SVector{3, Float64}, 100))
p = SVector(rand(), rand(), rand())
@benchmark nn($tree, $p)
idx = Vector{Int}(undef, 1)
dist = Vector{Float64}(undef, 1)
@benchmark knn!($idx, $dist, $tree, $p, 1)

gives

BenchmarkTools.Trial: 10000 samples with 951 evaluations per sample.
 Range (min … max):   93.761 ns …   8.354 μs  ┊ GC (min … max): 0.00% … 98.46%
 Time  (median):     100.201 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   108.518 ns ± 175.848 ns  ┊ GC (mean ± σ):  5.85% ±  3.63%

  ▃▆▄▁▂▄▆█▇▅▅▅▄▃▃▃▃▂▁▁                                          ▂
  █████████████████████▇███▇▇██▆▆█▇▇▆▆▇▅▅▆▆▄▄▅▅▅▄▄▄▅▄▄▄▄▄▄▂▄▄▂▄ █
  93.8 ns       Histogram: log(frequency) by time        144 ns <

 Memory estimate: 128 bytes, allocs estimate: 4.

BenchmarkTools.Trial: 10000 samples with 985 evaluations per sample.
 Range (min … max):  52.157 ns … 167.216 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     53.173 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   54.438 ns ±   4.320 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   █▅                                                           
  ▆██▃▂▂▃▄▄▃▄▅▅▅▅▅▄▄▃▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▁▂▂▂▂ ▃
  52.2 ns         Histogram: frequency by time         67.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

A roughly 2x speedup in the mean, and roughly 50x reduction in the standard deviation (as there's no gc anymore).

For cases where nn is going to be repeatedly called over many points, one at a time, it seems desirable to get rid of this, as a pretty significant impact on the meantime, and the variance. It would then make using nn.(tree, points) work neatly.

I suspect the cleanest way of achieving this is a bit of a refactor however, to write the Vector{Vector} interfaces in terms of this lower piece (with the broadcast for instance). I'm happy to have a go at this, but could do with some pointers if there are any "gotchas" I'm not thinking of.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions