Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix splay tree amortized time issues (resolves #923) #924

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
69 changes: 69 additions & 0 deletions benchmark/bench_trees.jl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the results for this benchmark suite? Do post on the main PR.

Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
module BenchTrees

using DataStructures
using BenchmarkTools
using Random

suite = BenchmarkGroup()

rand_setup = (
Random.seed!(1234);
idx1 = rand(1:30000, 1000);
didx1 = rand(1:1000, 500);
)

# insert a bunch of keys, search for all of them, delete half of them randomly,
# then search for all of them again.
function test_rand(T)
t = T{Int}()
for i in idx1
push!(t, i)
end
for i in idx1
haskey(t, i)
end
for i in didx1
delete!(t, idx1[i])
end
for i in idx1
haskey(t, i)
end
end

# insert 1, ..., N, then push 1 many times. tests a regression from an older
# splay tree implementation where splays didn't happen on redundant pushes.
function test_redundant(T, N=100000)
t = T{Int}()
for i in 1:N
push!(t, i)
end
for i in 1:N
push!(t, 1)
end
end

# insert 1, ..., N, then access element 1 for N iterations. splay trees should
# perform best here.
function test_biased(T, N=100000)
t = T{Int}()
for i in 1:N
push!(t, i)
end
for _ in 1:N
haskey(t, 1)
end
end

trees = Dict("splay" => SplayTree, "avl" => AVLTree, "rb" => RBTree)
for (T_name, T) in trees
suite[T_name]["rand"] =
@benchmarkable test_rand($(T)) setup=rand_setup
suite[T_name]["redundant"] =
@benchmarkable test_redundant($(T))
suite[T_name]["biased"] =
@benchmarkable test_biased($T)
end

end # module

BenchTrees.suite
1 change: 1 addition & 0 deletions benchmark/benchmarks.jl
Original file line number Diff line number Diff line change
Expand Up @@ -12,3 +12,4 @@ const SUITE = BenchmarkGroup()

SUITE["heap"] = include("bench_heap.jl")
SUITE["SparseIntSet"] = include("bench_sparse_int_set.jl")
SUITE["trees"] = include("bench_trees.jl")
3 changes: 1 addition & 2 deletions src/DataStructures.jl
Original file line number Diff line number Diff line change
Expand Up @@ -51,9 +51,8 @@ module DataStructures
export DiBitVector

export RBTree, search_node, minimum_node
export SplayTree, maximum_node
export SplayTree, search_node!, maximum_node
export AVLTree, sorted_rank
export SplayTree, maximum_node

export findkey

Expand Down
20 changes: 9 additions & 11 deletions src/splay_tree.jl
Original file line number Diff line number Diff line change
Expand Up @@ -126,7 +126,7 @@ function _join!(tree::SplayTree, s::Union{SplayTreeNode, Nothing}, t::Union{Spla
end
end

function search_node(tree::SplayTree{K}, d::K) where K
function search_node!(tree::SplayTree{K}, d::K) where K
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a thought. And perhaps now I recall the reason why I didn't splay while searching.

Making search_node as a modifying function essentially means we are modifying the tree on every access. This means simple lookup using in or haskey - all such functions need to be appended with ! indicating the change in stucture of the underlying tree. This will be a breaking change (and quite rightly so).

The context for use of splay trees implies you want recently accessed elements near to the root, so that the subsequent access takes less time.

@oxinabox are we okay for introducing a breaking change in the package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes into the master branch that is fine. We are still prepping 0.19

Copy link
Author

@matthewsot matthewsot Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in and haskey seem to be defined on Base --- do you suggest changing them to in! and haskey! everywhere (i.e., even for other data structures)? Alternatives I could imagine:

  • Have special ! versions just for Splay trees.
  • Make every data structure have both a ! and non-! version of those methods, but for Splay trees, the non-! one doesn't update the structure and doesn't guarantee the amortized time bounds.
  • Just splay tree has both ! and non-! versions; everything else has only non-!
  • Leave without a ! because (other than timing and maybe traversal?) the changes aren't really user-visible.

Also, might be beating a dead horse here, but it seems to me this ! rule was already violated in the existing version, because haskey would already modify the tree on a successful search; all that changes in this PR should be that it modifies more frequently.

node = tree.root
prev = nothing
while node != nothing && node.data != d
Expand All @@ -137,33 +137,31 @@ function search_node(tree::SplayTree{K}, d::K) where K
node = node.leftChild
end
end
return (node == nothing) ? prev : node
last = (node == nothing) ? prev : node
(last == nothing) || splay!(tree, last)
return last
end

function Base.haskey(tree::SplayTree{K}, d::K) where K
node = tree.root
if node === nothing
return false
else
node = search_node(tree, d)
node = search_node!(tree, d)
(node === nothing) && return false
is_found = (node.data == d)
is_found && splay!(tree, node)
return is_found
return (node.data == d)
end
end

Base.in(key, tree::SplayTree) = haskey(tree, key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline with my above comment, every function calling search_node! becomes modifying in nature.


function Base.delete!(tree::SplayTree{K}, d::K) where K
node = tree.root
x = search_node(tree, d)
(x == nothing) && return tree
x = search_node!(tree, d)
(x == nothing || x.data != d) && return tree
t = nothing
s = nothing

splay!(tree, x)

if x.rightChild !== nothing
t = x.rightChild
t.parent = nothing
Expand All @@ -183,7 +181,7 @@ end

function Base.push!(tree::SplayTree{K}, d0) where K
d = convert(K, d0)
is_present = search_node(tree, d)
is_present = search_node!(tree, d)
if (is_present !== nothing) && (is_present.data == d)
return tree
end
Expand Down
32 changes: 27 additions & 5 deletions test/test_splay_tree.jl
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,28 @@
@test length(t) == 100
end

@testset "deleting missing values" begin
t = SplayTree{Int}()
for i in 1:100
push!(t, i)
end
for i in 101:200
delete!(t, i)
end

@test length(t) == 100

for i in 1:100
@test haskey(t, i)
end

for i in 101:200
push!(t, i)
end

@test length(t) == 200
end

@testset "handling different cases of delete!" begin
t2 = SplayTree()
for i in 1:100000
Expand Down Expand Up @@ -84,16 +106,16 @@
@test !in('c', t4)
end

@testset "search_node" begin
@testset "search_node!" begin
t5 = SplayTree()
for i in 1:32
push!(t5, i)
end
n1 = search_node(t5, 21)
n1 = search_node!(t5, 21)
@test n1.data == 21
n2 = search_node(t5, 35)
n2 = search_node!(t5, 35)
@test n2.data == 32
n3 = search_node(t5, 0)
n3 = search_node!(t5, 0)
@test n3.data == 1
end

Expand Down Expand Up @@ -131,4 +153,4 @@
node = node.rightChild
end
end
end
end
Loading