Using NearestNeighbors.jl, I found that often times I need just the positions of the points and am not interested in the actual indices and/or I'm okay with reordering all of my dataset to suit the KDTree better (but the dataset itself being too large for me being comfortable with the KDTree storing a copy of it).
I can get that by building a tree, reordering my data and assembling a new tree with the reordered data and indices collect(eachindex(xs)):
tree = KDTree(xs, storedata = false)
xs .= xs[tree.indices]
tree1 = KDTree(xs, tree.hyper_rec, collect(eachindex(xs)), tree.metric, tree.nodes, tree.tree_data, true)
That however still leaves me with a quite large (and kind of useless) array of indices.
So I altered the KDTree to accept other types for indices:
struct KDTree{V <: AbstractVector,M <: MinkowskiMetric,T,TIV} <: NNTree{V,M}
data::Vector{V}
hyper_rec::HyperRectangle{T}
indices::TIV
metric::M
nodes::Vector{KDNode{T}}
tree_data::TreeData
reordered::Bool
end
For one, this allows me to replace the Vector{Int} by a UnitRange{Int} which drastically cuts memory use.
Further more, I can use something like
struct IdentityArray end
Base.getindex(::IdentityArray, index) = index
tree3 = KDTree(xs, tree.hyper_rec, IdentityArray(), tree.metric, tree.nodes, tree.tree_data, true)
to get a good bit of extra performance as well (tested with 100M Vec3f0's):
@btime inrange(tree1, p, r) # ~ 38ms, tree with `Vector{Int}` for indices
@btime inrange(tree2, p, r) # ~ 34ms, tree with `eachindex(xs)` for indices
@btime inrange(tree3, p, r) # ~ 32ms, tree with `IdentityArray()` for indices
If desired, I can cook up a proper PR for this, just wanted to test the waters on how much interest for a change like this exists.
Using NearestNeighbors.jl, I found that often times I need just the positions of the points and am not interested in the actual indices and/or I'm okay with reordering all of my dataset to suit the KDTree better (but the dataset itself being too large for me being comfortable with the KDTree storing a copy of it).
I can get that by building a tree, reordering my data and assembling a new tree with the reordered data and indices
collect(eachindex(xs)):That however still leaves me with a quite large (and kind of useless) array of indices.
So I altered the
KDTreeto accept other types for indices:For one, this allows me to replace the
Vector{Int}by aUnitRange{Int}which drastically cuts memory use.Further more, I can use something like
to get a good bit of extra performance as well (tested with 100M
Vec3f0's):If desired, I can cook up a proper PR for this, just wanted to test the waters on how much interest for a change like this exists.