-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix splay tree amortized time issues (resolves #923) #924
base: master
Are you sure you want to change the base?
Conversation
Fixes two bugs in the splay tree implementation: 1. Not splaying on failed searches or redundant insertions can break the O(log n) amortized time guarantee. Fix: Always splay at the end of search_node! 2. Deletion did not check that the node returned from search_node! had the desired key, so trying to delete keys not in the splay tree could cause it to accidentally delete unrelated keys. Fix: Check node.data for the return value of search_node!, as is done already for other operations. I've also added a regression test for bug (2).
@@ -126,7 +126,7 @@ function _join!(tree::SplayTree, s::Union{SplayTreeNode, Nothing}, t::Union{Spla | |||
end | |||
end | |||
|
|||
function search_node(tree::SplayTree{K}, d::K) where K | |||
function search_node!(tree::SplayTree{K}, d::K) where K |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave this a thought. And perhaps now I recall the reason why I didn't splay while searching.
Making search_node
as a modifying function essentially means we are modifying the tree on every access. This means simple lookup using in
or haskey
- all such functions need to be appended with !
indicating the change in stucture of the underlying tree. This will be a breaking change (and quite rightly so).
The context for use of splay trees implies you want recently accessed elements near to the root, so that the subsequent access takes less time.
@oxinabox are we okay for introducing a breaking change in the package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes into the master branch that is fine. We are still prepping 0.19
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in
and haskey
seem to be defined on Base
--- do you suggest changing them to in!
and haskey!
everywhere (i.e., even for other data structures)? Alternatives I could imagine:
- Have special
!
versions just for Splay trees. - Make every data structure have both a
!
and non-!
version of those methods, but for Splay trees, the non-!
one doesn't update the structure and doesn't guarantee the amortized time bounds. - Just splay tree has both
!
and non-!
versions; everything else has only non-!
- Leave without a
!
because (other than timing and maybe traversal?) the changes aren't really user-visible.
Also, might be beating a dead horse here, but it seems to me this !
rule was already violated in the existing version, because haskey
would already modify the tree on a successful search; all that changes in this PR should be that it modifies more frequently.
is_found = (node.data == d) | ||
is_found && splay!(tree, node) | ||
return is_found | ||
return (node.data == d) | ||
end | ||
end | ||
|
||
Base.in(key, tree::SplayTree) = haskey(tree, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inline with my above comment, every function calling search_node!
becomes modifying in nature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the results for this benchmark suite? Do post on the main PR.
Thanks for the feedback, everyone! Regarding benchmark results, here they are on my machine for the current Notice I'll post final benchmark results for this PR after resolving the other issues, but early results it seems like it brings the splay tree performance into the millisecond range as well. |
Per discussion in #923, this PR ensures that the splay tree's
search_node!
now always splays the last-seen node. This means thatpush!
will reorganize the tree even if the push is already present, andhaskey
will reorganize the tree even if the key is not present. This resolves a performance issue. I also added some basic tree benchmarks.In addition, I think I found a correctness bug in
delete!
, where it doesn't check that the node returned bysearch_node!
actually has the requested key (search_node!
seems to return the last visited node during the traversal, whether or not it was actually the node we searched for). I fixed this and added a regression test. You can check that it fails on the old splay tree implementation using this branch: https://github.com/matthewsot/DataStructures.jl/tree/regressionA few things that came up in the process that I'm not 100% sure how you all want to handle:
search_node
was part of the public API, but now it's been renamed tosearch_node!
for splay trees as suggested in the discussion in Splay tree amortized time issues #923, which could be a breaking change. Is that intended?splay_tree.jl
had a few places using== nothing
and a few other places using=== nothing
. I'm not totally sure what the difference is, so I tried to basically just do whatever the most similar code was doing before.Feedback welcome!