Snapshottable API server cache #5017
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: serathius
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
I ran the scalability tests to measure the overhead of cloning. Scalability tests are a good fit as they don't use pagination or exact requests. I used kubernetes/kubernetes#126855, which clones the storage on each request. The results are good. Overhead based on profiles collected during scalability tests:
The overhead is small enough that it is within the normal variance of memory usage during the test. There are some noticeable increases in request latency, however they are still far from the SLO and could be due to high variance in the results.
If we account for the high variance of latency in scalability tests and look only at profile differences, we can estimate the expected overhead of keeping all store snapshots in the watch cache to be below 2% of memory.
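For context, a minimal sketch of the copy-on-write snapshotting being measured here (not the actual watch cache code; it assumes a store built on github.com/google/btree, whose Clone() shares tree nodes between the original and the clone until one side writes). This is why taking a snapshot per request stays cheap: only subsequently modified nodes are copied.

```go
package main

import (
	"fmt"

	"github.com/google/btree"
)

// object is a stand-in for a cached API object keyed by its storage key.
type object struct {
	key             string
	resourceVersion uint64
}

// Less orders objects by key so the B-tree can serve range reads for LISTs.
func (o object) Less(than btree.Item) bool {
	return o.key < than.(object).key
}

func main() {
	store := btree.New(32)
	for i := 0; i < 3; i++ {
		store.ReplaceOrInsert(object{
			key:             fmt.Sprintf("pods/default/pod-%d", i),
			resourceVersion: uint64(i + 1),
		})
	}

	// Snapshot the store for an incoming request. Clone is lazy: nodes are
	// shared and only copied when either tree is written to afterwards.
	snapshot := store.Clone()

	// A later write to the live store does not leak into the snapshot.
	store.ReplaceOrInsert(object{key: "pods/default/pod-1", resourceVersion: 10})

	snapshot.Ascend(func(it btree.Item) bool {
		o := it.(object)
		fmt.Printf("%s rv=%d\n", o.key, o.resourceVersion)
		return true
	})
}
```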
Are you looking at LoadResponsiveness_Prometheus or LoadResponsiveness_PrometheusSimple for latencies? https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_Prometheus&Resource=pods&Scope=resource&Subresource=&Verb=DELETE
I looked at the …
I would focus on PrometheusSimple as something that is much more predictable/repeatable.
### Troubleshooting
How can we check in the field whether the response from the cache exactly matches the response from etcd?
If we enable just pagination requests, they could be checked by making an exact request. However, the question is why you should care about this at all. Do you want to check whether the cache was corrupted? For that we should have an automated mechanism.
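As a rough illustration of the manual check being discussed (a sketch, not an existing mechanism; it assumes exact-resourceVersion LISTs are still served from etcd), one could re-list at the exact resourceVersion returned by a cache-served LIST and compare the items:

```go
package main

import (
	"context"
	"fmt"
	"reflect"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// resourceVersion="0" allows the apiserver to serve the list from the
	// watch cache; record the resourceVersion the response was produced at.
	fromCache, err := client.CoreV1().Pods("default").List(ctx,
		metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		panic(err)
	}

	// Re-list at exactly that resourceVersion. Such exact requests go to etcd
	// today, so the two responses can be compared item by item (the exact
	// request may fail if that revision has already been compacted).
	fromEtcd, err := client.CoreV1().Pods("default").List(ctx, metav1.ListOptions{
		ResourceVersion:      fromCache.ResourceVersion,
		ResourceVersionMatch: metav1.ResourceVersionMatchExact,
	})
	if err != nil {
		panic(err)
	}

	if !reflect.DeepEqual(fromCache.Items, fromEtcd.Items) {
		fmt.Println("mismatch between cache and etcd responses")
		return
	}
	fmt.Printf("responses match at resourceVersion %s (%d items)\n",
		fromCache.ResourceVersion, len(fromCache.Items))
}
```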
Force-pushed from 9e39954 to 1538972
What small scale do you have in mind? For me the scalability tests seem like a worst-case scenario. They include a large number of small objects with frequent updates. In this situation the overhead from the B-tree structure should dominate the size of the database.
Create first draft of #4988 as provisional.
Draft PR for context kubernetes/kubernetes#128951
/cc @wojtek-t @deads2k @MadhavJivrajani @jpbetz