Skip to content

Allow different types for the index storage #100

@CarpeNecopinum

Description

@CarpeNecopinum

Using NearestNeighbors.jl, I found that often times I need just the positions of the points and am not interested in the actual indices and/or I'm okay with reordering all of my dataset to suit the KDTree better (but the dataset itself being too large for me being comfortable with the KDTree storing a copy of it).

I can get that by building a tree, reordering my data and assembling a new tree with the reordered data and indices collect(eachindex(xs)):

tree = KDTree(xs, storedata = false)
xs .= xs[tree.indices]
tree1 = KDTree(xs, tree.hyper_rec, collect(eachindex(xs)), tree.metric, tree.nodes, tree.tree_data, true)

That however still leaves me with a quite large (and kind of useless) array of indices.

So I altered the KDTree to accept other types for indices:

struct KDTree{V <: AbstractVector,M <: MinkowskiMetric,T,TIV} <: NNTree{V,M}
    data::Vector{V}
    hyper_rec::HyperRectangle{T}
    indices::TIV
    metric::M
    nodes::Vector{KDNode{T}}
    tree_data::TreeData
    reordered::Bool
end

For one, this allows me to replace the Vector{Int} by a UnitRange{Int} which drastically cuts memory use.

Further more, I can use something like

struct IdentityArray end
Base.getindex(::IdentityArray, index) = index

tree3 = KDTree(xs, tree.hyper_rec, IdentityArray(), tree.metric, tree.nodes, tree.tree_data, true)

to get a good bit of extra performance as well (tested with 100M Vec3f0's):

@btime inrange(tree1, p, r) # ~ 38ms, tree with `Vector{Int}` for indices
@btime inrange(tree2, p, r) # ~ 34ms, tree with `eachindex(xs)` for indices
@btime inrange(tree3, p, r) # ~ 32ms, tree with `IdentityArray()` for indices

If desired, I can cook up a proper PR for this, just wanted to test the waters on how much interest for a change like this exists.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions