Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix splay tree amortized time issues (resolves #923) #924

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

matthewsot
Copy link

@matthewsot matthewsot commented Feb 22, 2025

Per discussion in #923, this PR ensures that the splay tree's search_node! now always splays the last-seen node. This means that push! will reorganize the tree even if the push is already present, and haskey will reorganize the tree even if the key is not present. This resolves a performance issue. I also added some basic tree benchmarks.

In addition, I think I found a correctness bug in delete!, where it doesn't check that the node returned by search_node! actually has the requested key (search_node! seems to return the last visited node during the traversal, whether or not it was actually the node we searched for). I fixed this and added a regression test. You can check that it fails on the old splay tree implementation using this branch: https://github.com/matthewsot/DataStructures.jl/tree/regression

A few things that came up in the process that I'm not 100% sure how you all want to handle:

  1. It seems like search_node was part of the public API, but now it's been renamed to search_node! for splay trees as suggested in the discussion in Splay tree amortized time issues #923, which could be a breaking change. Is that intended?
  2. splay_tree.jl had a few places using == nothing and a few other places using === nothing. I'm not totally sure what the difference is, so I tried to basically just do whatever the most similar code was doing before.

Feedback welcome!

Fixes two bugs in the splay tree implementation:

    1. Not splaying on failed searches or redundant insertions can break
       the O(log n) amortized time guarantee.

       Fix: Always splay at the end of search_node!

    2. Deletion did not check that the node returned from search_node!
       had the desired key, so trying to delete keys not in the splay
       tree could cause it to accidentally delete unrelated keys.

       Fix: Check node.data for the return value of search_node!, as is
       done already for other operations.

I've also added a regression test for bug (2).
@oxinabox oxinabox requested a review from eulerkochy February 24, 2025 11:51
@@ -126,7 +126,7 @@ function _join!(tree::SplayTree, s::Union{SplayTreeNode, Nothing}, t::Union{Spla
end
end

function search_node(tree::SplayTree{K}, d::K) where K
function search_node!(tree::SplayTree{K}, d::K) where K
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a thought. And perhaps now I recall the reason why I didn't splay while searching.

Making search_node as a modifying function essentially means we are modifying the tree on every access. This means simple lookup using in or haskey - all such functions need to be appended with ! indicating the change in stucture of the underlying tree. This will be a breaking change (and quite rightly so).

The context for use of splay trees implies you want recently accessed elements near to the root, so that the subsequent access takes less time.

@oxinabox are we okay for introducing a breaking change in the package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes into the master branch that is fine. We are still prepping 0.19

Copy link
Author

@matthewsot matthewsot Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in and haskey seem to be defined on Base --- do you suggest changing them to in! and haskey! everywhere (i.e., even for other data structures)? Alternatives I could imagine:

  • Have special ! versions just for Splay trees.
  • Make every data structure have both a ! and non-! version of those methods, but for Splay trees, the non-! one doesn't update the structure and doesn't guarantee the amortized time bounds.
  • Just splay tree has both ! and non-! versions; everything else has only non-!
  • Leave without a ! because (other than timing and maybe traversal?) the changes aren't really user-visible.

Also, might be beating a dead horse here, but it seems to me this ! rule was already violated in the existing version, because haskey would already modify the tree on a successful search; all that changes in this PR should be that it modifies more frequently.

is_found = (node.data == d)
is_found && splay!(tree, node)
return is_found
return (node.data == d)
end
end

Base.in(key, tree::SplayTree) = haskey(tree, key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline with my above comment, every function calling search_node! becomes modifying in nature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the results for this benchmark suite? Do post on the main PR.

@matthewsot
Copy link
Author

Thanks for the feedback, everyone!

Regarding benchmark results, here they are on my machine for the current master branch (without this PR): https://github.com/matthewsot/DataStructures.jl/blob/regression/old.md

Notice ["trees", "splay", "redundant"] takes almost 45s, whereas AVL and RB are in the millisecond range.

I'll post final benchmark results for this PR after resolving the other issues, but early results it seems like it brings the splay tree performance into the millisecond range as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants