Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snapshottable API server cache #5017

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

serathius
Copy link
Contributor

@serathius serathius commented Dec 30, 2024

Create first draft of #4988 as provisional.

Draft PR for context kubernetes/kubernetes#128951

/cc @wojtek-t @deads2k @MadhavJivrajani @jpbetz

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 30, 2024
@dims
Copy link
Member

dims commented Dec 30, 2024

cc @mengqiy @chaochn47 @shyamjvs

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: serathius
Once this PR has been reviewed and has the lgtm label, please assign apelisse for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wojtek-t wojtek-t self-assigned this Jan 10, 2025
@serathius
Copy link
Contributor Author

I run the scalability tests to measure overhead of clone. Scalability tests are a good as they don't use pagination nor exact request. I used kubernetes/kubernetes#126855 which clones the storage on each request. The results are good:

Overhead based on profiles collected during scalability tests:

  • Additional 7GB of object allocations, which accounts for 0.2% of allocations.
    image
  • Additional 300MB of memory used, which accounts for 1.3% of memory used in scalability test.
    image

The overhead is small enough that is within normal variance of memory usage during the test. The are some noticeable increases in request latency however I they are still far from SLO and could be due to high variance in results.

If we account for high variance of latency in scalability tests and look at profile differences only, we can estimate the expected overhead of keeping all store snapshots in the watchcache to be below 2% of memory.

@wojtek-t
Copy link
Member

Are you looking at LoadResponsiveness_Prometheus or LoadResponsiveness_PrometheusSimple for latencies?
If you got 170ms for delete pods in base, it's probably the former, but it also has much higher variance.
What are is the comparison for the later?

https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_Prometheus&Resource=pods&Scope=resource&Subresource=&Verb=DELETE
https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_PrometheusSimple&Resource=pods&Scope=resource&Subresource=&Verb=DELETE

@serathius
Copy link
Contributor Author

I looked at the LoadResponsiveness_Prometheus. For PrometheusSimple the latencies match aside of some anomalies like GET services, however they also seem very variadic in PrometheusSimple. https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_PrometheusSimple&Resource=services&Scope=resource&Subresource=&Verb=GET

@wojtek-t
Copy link
Member

I would focus on PrometheusSimple as something that is much more predictible/repeatable.
If those match, and the overhead as you wrote is fairly small (I would be interested in observing how it looks also on small scale), then this solution is much preferable to me (even if in the first step we will only support pagination and nothing else).


No

### Troubleshooting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we check in the field whether the response from the cache exactly matches the response from etcd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we enable just pagination requests, they could be checked by making an exact request. However the question is why should you care about this at all? Do you want to check if cache was corrupted? For that we should have an automated mechanism.

@serathius serathius force-pushed the kep-4988 branch 4 times, most recently from 9e39954 to 1538972 Compare January 16, 2025 15:11
@serathius
Copy link
Contributor Author

serathius commented Jan 16, 2025

@wojtek-t

If those match, and the overhead as you wrote is fairly small (I would be interested in observing how it looks also on small scale), then this solution is much preferable to me (even if in the first step we will only support pagination and nothing else).

What small scale you have in mind. For me the scalability tests seem like a worst case scenario. They include large number of small objects with frequent updates. In this situation the overhead from B-tree structure should dominate the size of database.

@serathius serathius changed the title Pagination from cache KEP Snapshottable API server cache Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants