Snapshottable API server cache #5017

serathius · 2024-12-30T16:06:32Z

Create first draft of #4988 as provisional.

Draft PR for context kubernetes/kubernetes#128951

/cc @wojtek-t @deads2k @MadhavJivrajani @jpbetz

dims · 2024-12-30T16:18:01Z

cc @mengqiy @chaochn47 @shyamjvs

keps/sig-api-machinery/4988-serve-pagination-from-cache/README.md

k8s-ci-robot · 2025-01-08T11:04:47Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: serathius
Once this PR has been reviewed and has the lgtm label, please assign apelisse for approval. For more information see the Code Review Process.

Needs approval from an approver in each of these files:

keps/sig-api-machinery/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

serathius · 2025-01-14T13:03:52Z

I ran the scalability tests to measure the overhead of cloning. Scalability tests are a good fit, as they use neither pagination nor exact requests. I used kubernetes/kubernetes#126855, which clones the storage on each request. The results are good:

Overhead based on profiles collected during scalability tests:

Additional 7GB of object allocations, which accounts for 0.2% of allocations.
Additional 300MB of memory used, which accounts for 1.3% of memory used in scalability test.

The overhead is small enough to fall within the normal variance of memory usage during the test. There are some noticeable increases in request latency; however, they are still far from the SLO and could be due to high variance in the results.

LIST pods with namespace 99%ile increased from 1s to 1.2s (within variance https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_Prometheus&Resource=pods&Scope=namespace&Subresource=&Verb=LIST)
DELETE pods 99%ile increased from 170ms to 300ms https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_Prometheus&Resource=pods&Scope=resource&Subresource=&Verb=DELETE
Some other single-object operations have also seen latency increases.

If we account for high variance of latency in scalability tests and look at profile differences only, we can estimate the expected overhead of keeping all store snapshots in the watchcache to be below 2% of memory.
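
For a sense of scale, the percentages above imply totals of roughly 3.5TB of object allocations and about 23GB of memory over the test. A back-of-the-envelope sketch, assuming the quoted figures are exact (the helper name `impliedTotal` is only for illustration):

```go
package main

import "fmt"

// impliedTotal returns the total that `part` belongs to, given that `part`
// accounts for `percent` percent of that total.
func impliedTotal(part, percent float64) float64 {
	return part / (percent / 100)
}

func main() {
	// Figures quoted above: 7GB of extra allocations is 0.2% of allocations,
	// and 300MB (0.3GB) of extra memory is 1.3% of memory used.
	fmt.Printf("total allocations: about %.0f GB\n", impliedTotal(7, 0.2))
	fmt.Printf("total memory used: about %.1f GB\n", impliedTotal(0.3, 1.3))
}
```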

wojtek-t · 2025-01-14T13:15:20Z

Are you looking at LoadResponsiveness_Prometheus or LoadResponsiveness_PrometheusSimple for latencies?
If you got 170ms for delete pods in base, it's probably the former, but it also has much higher variance.
What is the comparison for the latter?

https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_Prometheus&Resource=pods&Scope=resource&Subresource=&Verb=DELETE
https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_PrometheusSimple&Resource=pods&Scope=resource&Subresource=&Verb=DELETE

serathius · 2025-01-14T13:21:32Z

I looked at LoadResponsiveness_Prometheus. For PrometheusSimple the latencies match, aside from some anomalies like GET services; however, those also seem highly variable in PrometheusSimple. https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_PrometheusSimple&Resource=services&Scope=resource&Subresource=&Verb=GET

wojtek-t · 2025-01-14T13:56:06Z

I would focus on PrometheusSimple as something that is much more predictable/repeatable.
If those match, and the overhead as you wrote is fairly small (I would be interested in observing how it looks also on small scale), then this solution is much preferable to me (even if in the first step we will only support pagination and nothing else).

deads2k · 2025-01-14T22:48:05Z

keps/sig-api-machinery/4988-serve-pagination-from-cache/README.md

+
+No
+
+### Troubleshooting


How can we check in the field whether the response from the cache exactly matches the response from etcd?

If we enable just pagination requests, they could be checked by making an exact request. However, the question is why you should care about this at all. Do you want to check whether the cache was corrupted? For that we should have an automated mechanism.
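
A minimal sketch of such a check, under stated assumptions: the `page` type and the `pagedLister`/`exactLister` function types are hypothetical stand-ins for real API calls (a paginated LIST served from the cache, and a LIST with `resourceVersionMatch=Exact` at the resourceVersion the first page reported). It is not part of the KEP or the draft PR.

```go
package main

import "fmt"

// page is a hypothetical, simplified LIST response: item names, the
// resourceVersion the list was served at, and a continue token ("" when done).
type page struct {
	items           []string
	resourceVersion string
	next            string
}

// Hypothetical fetchers standing in for real API calls.
type pagedLister func(continueToken string) page
type exactLister func(resourceVersion string) []string

// matchesExact walks all pages, then compares the concatenated items with a
// single exact list taken at the resourceVersion reported by the first page.
func matchesExact(paged pagedLister, exact exactLister) bool {
	first := paged("")
	got := append([]string{}, first.items...)
	for tok := first.next; tok != ""; {
		p := paged(tok)
		got = append(got, p.items...)
		tok = p.next
	}
	want := exact(first.resourceVersion)
	if len(got) != len(want) {
		return false
	}
	for i := range got {
		if got[i] != want[i] {
			return false
		}
	}
	return true
}

func main() {
	data := []string{"a", "b", "c"}
	// Fake paginated source: two pages, both served at resourceVersion "100".
	paged := func(tok string) page {
		if tok == "" {
			return page{items: data[:2], resourceVersion: "100", next: "after-b"}
		}
		return page{items: data[2:], resourceVersion: "100"}
	}
	// Fake exact source: returns the full list for any resourceVersion.
	exact := func(rv string) []string { return data }
	fmt.Println(matchesExact(paged, exact))
}
```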

serathius · 2025-01-16T15:39:16Z

@wojtek-t

> If those match, and the overhead as you wrote is fairly small (I would be interested in observing how it looks also on small scale), then this solution is much preferable to me (even if in the first step we will only support pagination and nothing else).

What small scale do you have in mind? To me, the scalability tests seem like a worst-case scenario. They include a large number of small objects with frequent updates. In this situation, the overhead from the B-tree structure should dominate the size of the database.
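
To illustrate why structural overhead matters most with small, frequently updated objects, here is a minimal sketch of snapshotting by path copying. It assumes a plain binary search tree as a stand-in; the actual cache uses a B-tree, and this is not the code from kubernetes/kubernetes#126855. Each update copies only the nodes on the path to the changed key, and a snapshot is just a retained old root.

```go
package main

import "fmt"

// node is an immutable binary search tree node keyed by object name.
// Illustrative only: a stand-in for the B-tree used by the real cache.
type node struct {
	key, value  string
	left, right *node
}

// put returns a new root containing key=value. Only nodes on the path from
// the root to the key are allocated; everything else is shared with the
// previous version, so older snapshots stay valid. *allocated counts the
// nodes created by this call.
func put(n *node, key, value string, allocated *int) *node {
	*allocated++
	if n == nil {
		return &node{key: key, value: value}
	}
	c := *n // shallow copy of this node
	switch {
	case key < n.key:
		c.left = put(n.left, key, value, allocated)
	case key > n.key:
		c.right = put(n.right, key, value, allocated)
	default:
		c.value = value
	}
	return &c
}

// get looks up key in the version of the tree rooted at n.
func get(n *node, key string) (string, bool) {
	for n != nil {
		switch {
		case key < n.key:
			n = n.left
		case key > n.key:
			n = n.right
		default:
			return n.value, true
		}
	}
	return "", false
}

func main() {
	var root *node
	var allocated int
	for _, k := range []string{"pod-m", "pod-c", "pod-x", "pod-a", "pod-e"} {
		root = put(root, k, "v1", &allocated)
	}
	snapshot := root // taking a snapshot is just keeping the old root

	allocated = 0
	root = put(root, "pod-a", "v2", &allocated)

	oldVal, _ := get(snapshot, "pod-a")
	newVal, _ := get(root, "pod-a")
	fmt.Println(oldVal, newVal, allocated) // prints "v1 v2 3"
}
```

In this sketch, updating one of five objects allocates three new nodes regardless of how small the object is, which is the sense in which per-update structural overhead can outweigh the payload when objects are small.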

k8s-ci-robot requested review from deads2k, jpbetz, MadhavJivrajani and wojtek-t December 30, 2024 16:06

MadhavJivrajani reviewed Jan 2, 2025

wojtek-t reviewed Jan 10, 2025

wojtek-t self-assigned this Jan 10, 2025

jpbetz reviewed Jan 14, 2025
