# KEP-4979: Evented desired state of world populator

<!--
This is the title of your KEP. Keep it short, simple, and descriptive. A good
title can help communicate what the KEP is and should be considered as part of
any review.
-->

<!--
A table of contents is helpful for quickly jumping to sections of a KEP and for
highlighting any additional information provided beyond the standard KEP
template.

Ensure the TOC is wrapped with
  <code>&lt;!-- toc --&gt;&lt;!-- /toc --&gt;</code>
tags, and then generate with `hack/update-toc.sh`.
-->

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Unit tests](#unit-tests)
  - [Integration tests](#integration-tests)
  - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha](#alpha)
    - [Beta](#beta)
    - [Beta (enabled by default)](#beta-enabled-by-default)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
<!-- /toc -->

## Release Signoff Checklist

<!--
**ACTION REQUIRED:** In order to merge code into a release, there must be an
issue in [kubernetes/enhancements] referencing this KEP and targeting a release
milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases)
of the targeted release**.

For enhancements that make changes to code or processes/procedures in core
Kubernetes (i.e., [kubernetes/kubernetes]), we require the following Release
Signoff checklist to be completed.

Check these off as they are completed for the Release Team to track. These
checklist items _must_ be updated for the enhancement to be released.
-->

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation (e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes)

<!--
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
-->

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

This KEP proposes optimizing the loop iteration period of the kubelet volume manager's Desired State of the World Populator (DSWP), which is currently fixed at 100ms. The enhancement dynamically increases the sleep period while no changes are detected, and reacts to the gRPC container event stream from the CRI implementation so that the populator runs when containers are actually created or deleted, reducing unnecessary processing.

## Motivation

In the kubelet volume manager, the Desired State of the World Populator runs its populate loop every 100ms, regardless of whether anything has changed. This fixed frequency spends CPU cycles even while the node is idle. With an event-based approach, the kubelet can run the populator when changes actually occur instead of polling, reducing system overhead without losing responsiveness.

### Goals

1. Dynamically adjust the populator loop interval based on system activity.
2. Respond promptly to container events, keeping the DSWP cache up to date.
3. Keep the existing periodic behavior as a fallback to ensure reliability.

### Non-Goals

1. Completely remove the periodic (polling) loop.
2. Change the existing DSWP population logic.

## Proposal

The Desired State of the World Populator will listen to the gRPC container event stream exposed by the CRI implementation. Specifically, `CONTAINER_CREATED_EVENT` and `CONTAINER_DELETED_EVENT` will trigger an iteration of the populator loop.

During periods of inactivity, the populator loop interval will increase in 100ms increments after the third execution, up to a maximum of 1 second. When an event is detected, the interval resets to the default 100ms. This keeps the populator responsive to changes while reducing CPU usage when the node is idle.

### Risks and Mitigations

1. A bug or issue in the event-based implementation: the feature is placed behind a feature gate (alpha initially), so it can be disabled.
2. A bug in the CRI runtime's event system:
   1. Impact: mounting or unmounting volumes may take longer (up to 1 second instead of 100ms).
   2. Mitigation: when an error is detected on the CRI gRPC event stream, the event-based mechanism is disabled and the populator falls back to the 100ms sleep period.

## Design Details

The existing DSWP implementation is triggered by the following CRI event types:

1. `CONTAINER_CREATED_EVENT`
2. `CONTAINER_DELETED_EVENT`

While no event is received, the sleep period is gradually increased by 100ms per iteration, starting after the third execution (so as not to affect the existing retry logic), up to a maximum of 1 second. If any of the above events is detected, the interval is reset to its initial value (100ms).

##### Unit tests

- [X] Dynamic sleep period unit tests
- [X] Sleep period increase unit tests

##### Integration tests

1. [ ] Ensure the DSWP iterates when:
   1. [ ] CRI events work properly: the sleep period varies based on the received events.
   2. [ ] CRI events do not work: the DSWP falls back to the initial setup and iterates every 100ms.
2. [ ] Verify the desired state of the world cache is updated correctly when CRI events are received.
3. [ ] Verify the desired state of the world cache is updated correctly when the CRI gRPC stream returns an error.

##### e2e tests

- [ ] Generate a large number of CRI events by creating and deleting a significant number of containers within a short period of time.

### Graduation Criteria

#### Alpha

- Feature implemented behind a feature gate
- Existing `node e2e` tests and integration tests around DSWP must pass

#### Beta

- [ ] Add integration tests
- [ ] Add e2e tests for DSWP

#### Beta (enabled by default)

### Upgrade / Downgrade Strategy

N/A.

### Version Skew Strategy

N/A. This feature only changes how the kubelet determines the DSWP sleep period, so this section does not apply. If the CRI runtime does not provide a working gRPC event stream, the kubelet falls back to the default 100ms sleep period.

## Production Readiness Review Questionnaire

<!--
This section must be completed when targeting alpha to a release.
-->

### Feature Enablement and Rollback

###### How can this feature be enabled / disabled in a live cluster?

- [X] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: `EventedDesiredStateOfWorldPopulator`
  - Components depending on the feature gate: kubelet
  - The gate is set on the kubelet, e.g. with `--feature-gates=EventedDesiredStateOfWorldPopulator=true` or through the `featureGates` field of the kubelet configuration file.
- [X] The CRI runtime must support the gRPC container event stream for the feature to work.

###### Does enabling the feature change any default behavior?

This feature does not introduce any user-facing changes, although users may notice lower kubelet CPU usage, since the DSWP runs less often while the node is idle.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. Disabling the feature gate requires restarting the kubelet.

###### What happens if we reenable the feature if it was previously rolled back?

If re-enabled, the kubelet again adjusts the DSWP sleep period based on CRI events. Every time this feature is enabled or disabled, the kubelet must be restarted.

###### Are there any tests for feature enablement/disablement?

The current unit tests run without toggling the feature gate. Integration and e2e tests (planned for beta) will need to run with the feature gate enabled.

### Rollout, Upgrade and Rollback Planning

<!--
This section must be completed when targeting beta to a release.
-->

###### How can a rollout or rollback fail? Can it impact already running workloads?

This feature relies on CRI runtime events to dynamically adjust the DSWP sleep period. If the CRI gRPC service encounters errors or becomes unresponsive, the sleep period automatically reverts to the default 100ms without any manual intervention from the cluster operator.

Failures during rollout or rollback are unlikely to impact already running workloads, because the core DSWP logic is unchanged and the system falls back to the original polling behavior.

###### What specific metrics should inform a rollback?

N/A. The feature self-heals by reverting to the default 100ms sleep period on CRI gRPC errors.

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Yes, I tested this feature locally using `./hack/local-up-cluster.sh`.

1. CRI gRPC event stream supported:
   * Enabled the feature gate and created a sample deployment.
   * Checked that the DSWP sleep period changed dynamically by observing the kubelet logs prefixed with `Evented DSWP:`.
   * Disabled the feature gate and restarted the kubelet.
   * Confirmed that the DSWP sleep period reverted to the default 100ms (verified with debug logs added locally to the populator loop).
2. CRI gRPC event stream raising an error:
   * Enabled the feature gate and created a sample deployment.
   * Verified that the DSWP sleep period did not change (verified with debug logs added locally to the populator loop).
   * Disabled the feature gate and restarted the kubelet.
   * Confirmed that the DSWP still used the 100ms sleep period (verified with debug logs added locally to the populator loop).

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No.

### Monitoring Requirements

<!--
This section must be completed when targeting beta to a release.

For GA, this section is required: approvers should be able to confirm the
previous answers based on experience in the field.
-->

###### How can an operator determine if the feature is in use by workloads?

The kubelet metric `evented_dswp_connection_success_count` increases consistently whenever pods are created or deleted.

###### How can someone using this feature know that it is working for their instance?

Observe the `evented_dswp_connection_success_count` metric exposed by the kubelet.

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

The DSWP runs immediately, or within at most 100ms, after a container is created or deleted.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

- [X] Metrics
  - Metric name: `evented_dswp_connection_success_count`
  - Components exposing the metric: kubelet

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

- [ ] Metrics
  - Metric name: `evented_dswp_process_event_processing_delay`
  - Metric description: the delay between the moment an event is emitted by the CRI runtime and the moment the DSWP runs in response to it.
  - Components exposing the metric: kubelet

### Dependencies

###### Does this feature depend on any specific services running in the cluster?

<!--
Think about both cluster-level services (e.g. metrics-server) as well
as node-level agents (e.g. specific version of CRI). Focus on external or
optional services that are needed. For example, if this feature depends on
a cloud provider API, or upon an external software-defined storage or network
control plane.

For each of these, fill in the following, thinking about running existing user workloads
and creating new ones, as well as about cluster-level services (e.g. DNS):
  - [Dependency name]
    - Usage description:
      - Impact of its outage on the feature:
      - Impact of its degraded performance or high-error rates on the feature:
-->

- CRI runtime
  - Usage description: the kubelet consumes the runtime's gRPC container event stream, so the runtime must support CRI events and have them enabled.
  - Impact of its outage on the feature: the kubelet detects the outage and falls back to the default 100ms sleep period.
  - Impact of its degraded performance or high-error rates on the feature: the kubelet falls back to the default 100ms sleep period.

### Scalability

<!--
For alpha, this section is encouraged: reviewers should consider these questions
and attempt to answer them.

For beta, this section is required: reviewers must answer these questions.

For GA, this section is required: approvers should be able to confirm the
previous answers based on experience in the field.
-->

###### Will enabling / using this feature result in any new API calls?

No.

###### Will enabling / using this feature result in introducing new API types?

No.

###### Will enabling / using this feature result in any new calls to the cloud provider?

No.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

No.

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

No.

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No.

### Troubleshooting

<!--
This section must be completed when targeting beta to a release.

For GA, this section is required: approvers should be able to confirm the
previous answers based on experience in the field.

The Troubleshooting section currently serves the `Playbook` role. We may consider
splitting it into a dedicated `Playbook` document (potentially with some monitoring
details). For now, we leave it here.
-->

###### How does this feature react if the API server and/or etcd is unavailable?

The feature does not depend on the API server or etcd.

###### What are other known failure modes?

- CRI gRPC event stream error:
  - Behavior: if the maximum number of DSWP stream retries (`maxDSWPStreamRetries = 10`) is reached, the kubelet falls back to the default 100ms sleep period and stops watching the CRI gRPC event stream.
  - Detection: the number of retries can be monitored with the `evented_dswp_connection_error_count` metric.
  - Mitigations: ensure the CRI gRPC event stream is healthy.
  - Diagnostics: logs related to this feature are prefixed with `Evented DSWP:` and are visible at log level 6 (`--v=6`).
  - Testing: integration tests are planned to validate this behavior.

###### What steps should be taken if SLOs are not being met to determine the problem?

## Implementation History

<!--
Major milestones in the lifecycle of a KEP should be tracked in this section.
Major milestones might include:
- the `Summary` and `Motivation` sections being merged, signaling SIG acceptance
- the `Proposal` section being merged, signaling agreement on a proposed design
- the date implementation started
- the first Kubernetes release where an initial version of the KEP was available
- the version of Kubernetes where the KEP graduated to general availability
- when the KEP was retired or superseded
-->

## Drawbacks

<!--
Why should this KEP _not_ be implemented?
-->

## Alternatives

1. Proposal 1: https://github.com/kubernetes/kubernetes/pull/126450 allows users to customize or override the loop period through the kubelet configuration file.

   Reason/suggestion from SIG Node: move to an event-based approach (https://github.com/kubernetes/kubernetes/issues/126049#issuecomment-2278659439).

2. Proposal 2: https://github.com/kubernetes/kubernetes/pull/126668 increases the timer without an event-based approach and resets the sleep period once a change is detected. This PR will likely be closed, because without events a change is only noticed at the next (possibly backed-off) wake-up, so changes are detected late.

## Infrastructure Needed (Optional)

<!--
Use this section if you need things from the project/SIG. Examples include a
new subproject, repos requested, or GitHub details. Listing these here allows a
SIG to get the process for these resources started right away.
-->