Skip to content

Experiments with replacing index implementation of btree with key -> revisions with revision list of immutable trees #18184

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

serathius
Copy link
Member

Recently talked with Marcel from Cillium about his recent issue with etcd being overloaded with LIST with limit. Even with low limit, the list will range all the keys to count the total number of keys (not a great feature).

My idea was to improve performance of getting key count within range by using interval trees. For now I didn't managed to implement it as I have already started seeing issues finding a good implementation of immutable tree. For now I tried with AVL trees.

The results:

name                                   old time/op    new time/op    delta
IndexPutBase-12                          81.7ns ± 2%   674.5ns ± 6%   +725.35%  (p=0.000 n=10+10)
IndexPutLongKey-12                       82.4ns ± 1%   708.7ns ± 3%   +760.02%  (p=0.000 n=10+10)
IndexPutLargeKeySpace-12                  309ns ± 1%    1931ns ± 1%   +525.73%  (p=0.000 n=10+8)
IndexGetBase-12                          96.6ns ± 2%    57.7ns ± 3%    -40.24%  (p=0.000 n=10+10)
IndexGetRepeatedKeys-12                   700ns ±10%     383ns ± 3%    -45.24%  (p=0.000 n=10+10)
IndexGetAccurate-12                      98.2ns ± 5%    57.2ns ± 3%    -41.74%  (p=0.000 n=10+10)
IndexGetMisses-12                        94.3ns ± 3%    60.0ns ± 3%    -36.39%  (p=0.000 n=10+10)
IndexGetLongKey-12                        107ns ± 1%      67ns ± 2%    -37.77%  (p=0.000 n=8+9)
IndexGetLargeKeySpace-12                  347ns ± 1%    1135ns ± 2%   +226.70%  (p=0.000 n=10+10)
IndexRangeBase-12                         405ns ± 4%     512ns ± 2%    +26.67%  (p=0.000 n=10+8)
IndexRangeHighSelectivity-12              354ns ± 1%     452ns ± 2%    +27.61%  (p=0.000 n=10+10)
IndexRangeLongKey-12                      655ns ± 4%     616ns ± 5%     -6.00%  (p=0.001 n=10+10)
IndexRangeRepeatedKeys-12                3.94µs ± 7%    0.80µs ± 6%    -79.56%  (p=0.000 n=9+10)
IndexRangeLargeKeySpace-12                872ns ± 1%    2072ns ± 1%   +137.70%  (p=0.000 n=10+10)
IndexRevisionsBase-12                     296ns ± 3%     391ns ± 2%    +31.90%  (p=0.000 n=10+10)
IndexRevisionsHighSelectivity-12          241ns ± 1%     327ns ± 1%    +35.55%  (p=0.000 n=10+10)
IndexRevisionsLongKey-12                  496ns ± 7%     453ns ± 5%     -8.64%  (p=0.000 n=10+10)
IndexRevisionsRepeatedKeys-12            35.4µs ±18%     1.3µs ± 5%    -96.38%  (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12            702ns ± 1%    1901ns ± 1%   +170.71%  (p=0.000 n=10+10)
IndexRevisionsLimit-12                    244µs ± 3%     253µs ± 2%     +3.75%  (p=0.000 n=10+10)
IndexCountRevisionsBase-12                219ns ± 3%     323ns ± 4%    +47.27%  (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12     168ns ± 5%     252ns ± 5%    +49.64%  (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12             379ns ± 1%     358ns ± 6%     -5.32%  (p=0.002 n=8+10)
IndexCountRevisionsRepeatedKeys-12       3.97µs ±14%    0.46µs ± 7%    -88.41%  (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12       601ns ± 1%    1804ns ± 3%   +199.98%  (p=0.000 n=10+10)

Pros:

  • Writers don't block readers thanks to CopyOnWrite data structure
  • Much better performance when there are small number of keys reused as we don't lookup in list of revisions.
  • Trivial compaction thank to revision list

Cons:

  • Additional allocations for Puts make them really slow (should improve with btree)
  • Lower performance with large number of keys (should improve with btree)
  • No ready implementation of btree and interval tree
  • Not sure if tombstones can be properly implemented.

The main blocker for this approach is reducing the cost of Puts and improving the

Even if the idea doesn't stick I expect we can keep the benchmarks.

@k8s-ci-robot
Copy link

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@serathius
Copy link
Member Author

Tried the btree with copy:

name                                   old time/op    new time/op    delta
IndexPutBase-12                          81.7ns ± 2%  1044.5ns ±22%  +1178.19%  (p=0.000 n=10+10)
IndexPutLongKey-12                       82.4ns ± 1%  1077.8ns ±20%  +1207.90%  (p=0.000 n=10+10)
IndexPutLargeKeySpace-12                  309ns ± 1%    3056ns ± 6%   +890.38%  (p=0.000 n=10+10)
IndexGetBase-12                          96.6ns ± 2%   139.7ns ± 1%    +44.65%  (p=0.000 n=10+10)
IndexGetRepeatedKeys-12                   700ns ±10%     822ns ± 8%    +17.47%  (p=0.000 n=10+10)
IndexGetAccurate-12                      98.2ns ± 5%   139.8ns ± 2%    +42.33%  (p=0.000 n=10+10)
IndexGetMisses-12                        94.3ns ± 3%   141.9ns ± 2%    +50.50%  (p=0.000 n=10+10)
IndexGetLongKey-12                        107ns ± 1%     157ns ± 3%    +46.44%  (p=0.000 n=8+10)
IndexGetLargeKeySpace-12                  347ns ± 1%    2053ns ± 2%   +491.18%  (p=0.000 n=10+10)
IndexRangeBase-12                         405ns ± 4%     443ns ± 3%     +9.58%  (p=0.000 n=10+10)
IndexRangeHighSelectivity-12              354ns ± 1%     340ns ± 2%     -4.02%  (p=0.000 n=10+10)
IndexRangeLongKey-12                      655ns ± 4%     725ns ± 7%    +10.62%  (p=0.000 n=10+10)
IndexRangeRepeatedKeys-12                3.94µs ± 7%    1.05µs ± 6%    -73.42%  (p=0.000 n=9+10)
IndexRangeLargeKeySpace-12                872ns ± 1%    2657ns ± 2%   +204.91%  (p=0.000 n=10+10)
IndexRevisionsBase-12                     296ns ± 3%     331ns ± 1%    +11.48%  (p=0.000 n=10+9)
IndexRevisionsHighSelectivity-12          241ns ± 1%     221ns ± 1%     -8.57%  (p=0.000 n=10+9)
IndexRevisionsLongKey-12                  496ns ± 7%     571ns ± 4%    +15.02%  (p=0.000 n=10+10)
IndexRevisionsRepeatedKeys-12            35.4µs ±18%     1.2µs ± 3%    -96.63%  (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12            702ns ± 1%    2457ns ± 2%   +249.89%  (p=0.000 n=10+10)
IndexRevisionsLimit-12                    244µs ± 3%     240µs ± 3%       ~     (p=0.123 n=10+10)
IndexCountRevisionsBase-12                219ns ± 3%     235ns ± 2%     +7.01%  (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12     168ns ± 5%     126ns ± 3%    -25.26%  (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12             379ns ± 1%     442ns ± 2%    +16.75%  (p=0.000 n=8+10)
IndexCountRevisionsRepeatedKeys-12       3.97µs ±14%    0.70µs ± 6%    -82.47%  (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12       601ns ± 1%    2290ns ± 3%   +280.76%  (p=0.000 n=10+10)

name                                   old alloc/op   new alloc/op   delta
IndexPutBase-12                            150B ± 0%     2148B ±18%  +1327.24%  (p=0.000 n=8+10)
IndexPutLongKey-12                         151B ± 1%     2076B ±16%  +1277.64%  (p=0.000 n=10+10)
IndexPutLargeKeySpace-12                   121B ± 0%     5125B ± 1%  +4135.79%  (p=0.000 n=7+10)
IndexGetBase-12                           64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetRepeatedKeys-12                   64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetAccurate-12                       64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetMisses-12                         64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetLongKey-12                        64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetLargeKeySpace-12                  64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexRangeBase-12                          558B ± 6%      493B ± 6%    -11.63%  (p=0.000 n=10+10)
IndexRangeHighSelectivity-12               623B ± 0%      560B ± 0%    -10.23%  (p=0.000 n=10+10)
IndexRangeLongKey-12                       558B ± 4%      480B ± 4%    -13.88%  (p=0.000 n=10+10)
IndexRangeRepeatedKeys-12                  552B ± 6%      468B ± 7%    -15.21%  (p=0.000 n=10+10)
IndexRangeLargeKeySpace-12                 556B ± 0%      494B ± 2%    -11.11%  (p=0.000 n=10+10)
IndexRevisionsBase-12                      265B ± 1%      196B ± 2%    -26.15%  (p=0.000 n=9+8)
IndexRevisionsHighSelectivity-12           288B ± 0%      224B ± 0%    -22.11%  (p=0.000 n=10+8)
IndexRevisionsLongKey-12                   263B ± 4%      198B ± 7%    -24.41%  (p=0.000 n=10+10)
IndexRevisionsRepeatedKeys-12              252B ± 5%      191B ±11%    -24.37%  (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12             262B ± 1%      198B ± 1%    -24.57%  (p=0.000 n=10+10)
IndexRevisionsLimit-12                     303B ± 0%      239B ± 0%    -21.12%  (p=0.002 n=8+10)
IndexCountRevisionsBase-12                64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12     64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12             64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsRepeatedKeys-12        64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12       64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)

name                                   old allocs/op  new allocs/op  delta
IndexPutBase-12                            1.00 ± 0%      7.00 ± 0%   +600.00%  (p=0.000 n=10+10)
IndexPutLongKey-12                         1.00 ± 0%      7.00 ± 0%   +600.00%  (p=0.000 n=10+10)
IndexPutLargeKeySpace-12                   1.00 ± 0%     16.00 ± 0%  +1500.00%  (p=0.000 n=10+10)
IndexGetBase-12                            1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetRepeatedKeys-12                    1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetAccurate-12                        1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetMisses-12                          1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetLongKey-12                         1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetLargeKeySpace-12                   1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexRangeBase-12                          7.00 ± 0%      5.70 ±12%    -18.57%  (p=0.000 n=9+10)
IndexRangeHighSelectivity-12               7.40 ± 8%      6.50 ± 8%    -12.16%  (p=0.005 n=10+10)
IndexRangeLongKey-12                       7.00 ± 0%      5.70 ±12%    -18.57%  (p=0.000 n=10+10)
IndexRangeRepeatedKeys-12                  7.40 ± 8%      6.30 ±11%    -14.86%  (p=0.001 n=10+10)
IndexRangeLargeKeySpace-12                 5.00 ± 0%      4.00 ± 0%    -20.00%  (p=0.000 n=10+10)
IndexRevisionsBase-12                      4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.000 n=9+10)
IndexRevisionsHighSelectivity-12           4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.000 n=10+10)
IndexRevisionsLongKey-12                   4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.000 n=10+9)
IndexRevisionsRepeatedKeys-12              4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12             3.00 ± 0%      2.00 ± 0%    -33.33%  (p=0.000 n=10+10)
IndexRevisionsLimit-12                     4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.002 n=8+10)
IndexCountRevisionsBase-12                 1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12      1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12              1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsRepeatedKeys-12         1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12        1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)

@serathius
Copy link
Member Author

Looking at btree code, looks like adding fast counting should not be hard as it already includes intervals.

@codecov-commenter
Copy link

codecov-commenter commented Jun 16, 2024

Codecov Report

Attention: Patch coverage is 69.13580% with 25 lines in your changes missing coverage. Please review.

Project coverage is 57.03%. Comparing base (58086da) to head (8674a56).
Report is 2 commits behind head on main.

Current head 8674a56 differs from pull request most recent head 4d92c28

Please upload reports for the commit 4d92c28 to get more accurate results.

Files with missing lines Patch % Lines
server/storage/mvcc/index.go 69.13% 21 Missing and 4 partials ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
server/storage/mvcc/kvstore.go 57.09% <ø> (-32.78%) ⬇️
server/storage/mvcc/index.go 64.10% <69.13%> (-27.57%) ⬇️

... and 192 files with indirect coverage changes

@@             Coverage Diff             @@
##             main   #18184       +/-   ##
===========================================
- Coverage   68.84%   57.03%   -11.82%     
===========================================
  Files         421      420        -1     
  Lines       35857    35604      -253     
===========================================
- Hits        24686    20305     -4381     
- Misses       9751    13865     +4114     
- Partials     1420     1434       +14     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58086da...4d92c28. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@k8s-ci-robot
Copy link

@serathius: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-etcd-unit-test-arm64 6838500 link true /test pull-etcd-unit-test-arm64

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Marek Siarkowicz <[email protected]>
@serathius serathius force-pushed the index-immutable-tree branch from 8674a56 to 63c92d0 Compare April 20, 2025 08:11
@serathius serathius force-pushed the index-immutable-tree branch 2 times, most recently from 625b706 to db3a652 Compare April 20, 2025 09:13
Signed-off-by: Marek Siarkowicz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants