[evented pleg]: using real-time container events for pod state determination #129355

HirazawaUi · 2024-12-21T09:04:04Z

What type of PR is this?

/kind bug

Which issue(s) this PR fixes:

Special notes for your reviewer:

The purpose of this PR is to resolve #124704

I have documented the causes of this problem, and the various solutions we have tried, in https://docs.google.com/document/d/1TPrY56q9MNW8r1FuzKDFkBBhOjQ0hqi7wJAbIP1O-4g/

This PR proposes that, on each iteration, podWorkerLoop should use the latest events reported by the container runtime directly as the current Pod state. If we can confirm that these events are reliable and up to date, we no longer need to rely on timestamps to decide whether the cached state is current. In both Generic PLEG and the current Evented PLEG, the timestamps in the cache are refreshed even when the container state has not changed (every 5 seconds and every 1 second, respectively). A newer timestamp therefore does not mean the state is any more current, so timestamp-based freshness checks add no information when the container state remains unchanged.

The current container lifecycle code is not yet ready to accept the states reported by the container runtime directly. To address this, the Evented PLEG needs to filter sandbox creation and deletion events. The container runtime reports the same event types for sandbox containers and regular containers, and kubelet has never had to handle this distinction. An additional condition has therefore been added so that a sandbox container creation event does not trigger lifecycle processing before any regular container has been created. Likewise, a sandbox container deletion event does not trigger lifecycle processing after the regular containers have been deleted. The cache is still updated to reflect the latest state in both cases.
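The following is a minimal Go sketch of the filtering rule described above, under stated assumptions. The event types and `sandboxFilter` are hypothetical stand-ins, not kubelet or CRI API types. The sketch also assumes that a sandbox event can be recognized because its container ID equals its sandbox ID. Every event updates the cached state. Only two kinds of events are kept from triggering lifecycle processing: sandbox creation before any regular container exists, and sandbox deletion after all regular containers are gone.

```go
package main

import "fmt"

// EventType is a hypothetical local copy of the four container event kinds
// (created, started, stopped, deleted). It is not the cri-api type.
type EventType int

const (
	Created EventType = iota
	Started
	Stopped
	Deleted
)

// Event is a hypothetical, simplified runtime event. ASSUMPTION: for a sandbox
// event, the container ID equals the sandbox ID.
type Event struct {
	ContainerID string
	SandboxID   string
	Type        EventType
}

// podState is hypothetical per-pod bookkeeping kept by the filter.
type podState struct {
	regular int   // regular (non-sandbox) containers created and not yet deleted
	last    Event // the cache always holds the most recent event
}

// sandboxFilter decides which events reach pod lifecycle processing.
type sandboxFilter struct {
	pods map[string]*podState // keyed by sandbox ID
}

func newSandboxFilter() *sandboxFilter {
	return &sandboxFilter{pods: make(map[string]*podState)}
}

// Handle always records the event in the cache and reports whether it should
// trigger lifecycle processing.
func (f *sandboxFilter) Handle(e Event) bool {
	st, ok := f.pods[e.SandboxID]
	if !ok {
		st = &podState{}
		f.pods[e.SandboxID] = st
	}
	st.last = e // the cache reflects the latest state regardless of filtering

	if e.ContainerID != e.SandboxID { // regular container
		switch e.Type {
		case Created:
			st.regular++
		case Deleted:
			if st.regular > 0 {
				st.regular--
			}
		}
		return true
	}

	// Sandbox container: drop creation before any regular container exists
	// and deletion after all regular containers are gone.
	if e.Type == Created || e.Type == Deleted {
		return st.regular > 0
	}
	return true
}

func main() {
	f := newSandboxFilter()
	events := []Event{
		{"sb1", "sb1", Created}, // sandbox created first: suppressed
		{"c1", "sb1", Created},
		{"c1", "sb1", Deleted},
		{"sb1", "sb1", Deleted}, // sandbox deleted last: suppressed
	}
	for _, e := range events {
		fmt.Println(f.Handle(e)) // false, true, true, false
	}
}
```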

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3386-kubelet-evented-pleg/README.md

k8s-ci-robot · 2024-12-21T09:04:14Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

HirazawaUi · 2024-12-21T09:04:44Z

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2

HirazawaUi · 2024-12-21T13:13:56Z

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2

HirazawaUi · 2024-12-21T14:21:20Z

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e

HirazawaUi · 2024-12-21T15:27:37Z

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2

HirazawaUi · 2024-12-22T10:20:09Z

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2

HirazawaUi · 2024-12-26T14:21:35Z

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2

HirazawaUi · 2024-12-26T15:12:53Z

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e

HirazawaUi · 2025-02-26T14:28:23Z

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2
/test pull-kubernetes-e2e-kind-evented-pleg

harche · 2025-02-26T14:41:09Z

@HirazawaUi #129355 (review)

that comment would be applicable everywhere except in evented.go file. We are taking a very cautious approach to make sure the changes remain isolated from existing generic pleg.

HirazawaUi · 2025-02-26T14:55:04Z

> @HirazawaUi #129355 (review)
>
> that comment would be applicable everywhere except in evented.go file. We are taking a very cautious approach to make sure the changes remain isolated from existing generic pleg.

Thanks for the reminder!

HirazawaUi · 2025-02-26T14:55:10Z

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2
/test pull-kubernetes-e2e-kind-evented-pleg

HirazawaUi · 2025-03-10T14:03:37Z

@harche @haircommander Will we be able to move forward and merge this PR within the v1.33 cycle? IMO, if we can merge it earlier, it will allow users and developers to validate the reliability of this feature sooner, and we can also advance to the beta stage more quickly :)

harche · 2025-03-10T19:52:11Z

> @harche @haircommander Will we be able to move forward and merge this PR within the v1.33 cycle? IMO, if we can merge it earlier, it will allow users and developers to validate the reliability of this feature sooner, and we can also advance to the beta stage more quickly :)

@HirazawaUi I am trying to test these changes in CRIO CI, cri-o/cri-o#9053, but looks like I might be goofing up in setting it up correctly. Looking into it.

HirazawaUi · 2025-03-10T23:59:29Z

> @HirazawaUi I am trying to test these changes in CRIO CI, cri-o/cri-o#9053, but looks like I might be goofing up in setting it up correctly. Looking into it.

Thanks!

harche · 2025-03-11T18:58:49Z

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2
/test pull-kubernetes-e2e-kind-evented-pleg

harche · 2025-03-11T19:43:49Z

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2
/test pull-kubernetes-e2e-kind-evented-pleg

harche · 2025-03-11T20:05:06Z

> @HirazawaUi: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
>
> | Test name | Commit | Details | Required | Rerun command |
> | --- | --- | --- | --- | --- |
> | pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e | 16b9490 | link | false | /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e |
> | pull-kubernetes-e2e-kind-evented-pleg | 16b9490 | link | false | /test pull-kubernetes-e2e-kind-evented-pleg |
>
> Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

/hold

HirazawaUi · 2025-03-11T23:49:26Z

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2
/test pull-kubernetes-e2e-kind-evented-pleg

HirazawaUi · 2025-03-11T23:58:01Z

It seems that #130599 adjusted some PLEG-related feature code and, as part of that, removed some import statements. After merging the main branch, this branch also lost those imports, which caused compilation failures.


HirazawaUi · 2025-03-12T14:35:32Z

The latest push reverts unrelated formatting changes (such as whitespace deletions) and turns the duration used to determine whether a container has started into a named constant.

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2
/test pull-kubernetes-e2e-kind-evented-pleg

HirazawaUi · 2025-03-12T15:30:03Z

/retest

HirazawaUi · 2025-03-18T15:05:44Z

/hold cancel
The concerns raised in #129355 (comment) have been addressed.

k8s-ci-robot added the needs-priority label Dec 21, 2024

k8s-ci-robot requested review from haircommander and mrunalp December 21, 2024 09:04

k8s-ci-robot added area/kubelet and sig/node labels and removed the do-not-merge/needs-sig label Dec 21, 2024

HirazawaUi force-pushed the test-evented-pleg branch from 76d5c41 to 4867c3d December 21, 2024 12:44

k8s-ci-robot added the size/M label and removed the size/S label Dec 21, 2024

HirazawaUi force-pushed the test-evented-pleg branch from 4867c3d to 9731965 December 21, 2024 15:16

HirazawaUi force-pushed the test-evented-pleg branch from 9731965 to 655bb8c December 22, 2024 10:19

HirazawaUi force-pushed the test-evented-pleg branch from 655bb8c to d01c1e6 December 26, 2024 14:16

HirazawaUi force-pushed the test-evented-pleg branch from d01c1e6 to 2922d0d December 27, 2024 01:41

k8s-ci-robot added area/test and sig/testing labels Dec 27, 2024

k8s-ci-robot added the size/M label and removed the size/L label Feb 26, 2025

HirazawaUi mentioned this pull request Mar 9, 2025

Pod Lifecycle: propagate context to containerRuntime SyncPod #127122

Open

HirazawaUi force-pushed the test-evented-pleg branch from 16b9490 to 67e6c17 March 11, 2025 23:49

HirazawaUi and others added 7 commits March 12, 2025 22:25

- using real-time container events for pod state determination (4ffe2a9)
- restart with a time check for evented pleg: default 1m (9a2348b)
- restart sidecar container like static pod created state container (893c39e)
- determine earlier whether the container is in the startup process and modify the determination logic (20a54a7)
- fix evented pleg to adapt to containerd2.0 (bb08ed7)
- wrap evented pleg code with feature gate (4812356)
- Fix the issue where import records are deleted due to changes in the master branch. (0bb05dd)

HirazawaUi force-pushed the test-evented-pleg branch from 67e6c17 to 0bb05dd March 12, 2025 14:30

k8s-ci-robot added the size/L label and removed the size/M label Mar 12, 2025

k8s-ci-robot removed the do-not-merge/hold label Mar 18, 2025
