Commit 08815d0

Merge pull request #8803 from lauralorenz/add-kubelet-pod-lifecycle-docs
Add kubelet pod lifecycle docs
2 parents 9925909 + f66bd98 commit 08815d0

2 files changed: +180, -0 lines changed

Lines changed: 178 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,178 @@
# Kubelet Component

This document provides a high-level walkthrough of the Kubelet's code structure,
specifically focusing on the Pod lifecycle and the "Sync Loop". It tracks the
flow of a Pod from the moment it is assigned to a node to its execution and
eventual termination.

## Overview & Scope

The Kubelet deals primarily in **Pods**. While it handles some supporting
resources (Volumes, etc.), its main unit of work is the Pod. It does not
natively understand Deployments, StatefulSets, DaemonSets, or any other
workload abstraction; those are handled by upstream controllers.

1. **Creation**: Workload controllers create Pods.
2. **Scheduling**: The Scheduler picks a node and **binds** the Pod to it
   (setting `spec.nodeName`).
3. **Kubelet Observation**: Only after a Pod is bound to a node does the
   Kubelet on that node see it and take ownership. _**<==== WE ARE HERE!!!**_

The Kubelet turns the Pod spec into running processes on the host via
the **Container Runtime Interface (CRI)**. The runtime (e.g., `containerd`)
translates these requests for an OCI runtime (e.g., `runc`), which handles the
low-level operating system setup for the container.

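The hand-off in step 3 can be pictured as a filter on `spec.nodeName`. The
sketch below is a minimal stand-in, not the Kubelet's informer: `Pod` and
`podsForNode` are hypothetical names, and the only point it illustrates is that
an unbound Pod (empty `nodeName`) is invisible to every Kubelet.

```go
package main

import "fmt"

// Pod is a hypothetical, trimmed-down stand-in for the real Pod API type.
type Pod struct {
	Name     string
	NodeName string // mirrors spec.nodeName; empty until the Scheduler binds the Pod
}

// podsForNode returns the Pods bound to nodeName, preserving input order.
// Unbound Pods (empty NodeName) are never returned.
func podsForNode(all []Pod, nodeName string) []Pod {
	var bound []Pod
	for _, p := range all {
		if p.NodeName != "" && p.NodeName == nodeName {
			bound = append(bound, p)
		}
	}
	return bound
}

func main() {
	all := []Pod{
		{Name: "web-0", NodeName: "node-a"},
		{Name: "web-1", NodeName: ""}, // created but not yet scheduled
		{Name: "db-0", NodeName: "node-b"},
	}
	for _, p := range podsForNode(all, "node-a") {
		fmt.Println(p.Name)
	}
}
```
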
## Startup Flow: From API to Pod Worker

When a new Pod is assigned to a node, it follows this path through the Kubelet
internals:

1. **Config Source**: An
   [Informer](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/config/apiserver.go#L38-L39)
   watches for Pods bound to this node.
2. **PodConfig**: The
   [PodConfig](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/config/config.go#L75)
   aggregates updates from multiple sources (API server, file static pods, HTTP
   static pods) and merges them into a single stream of updates.
3. **SyncLoopIteration**: The main control loop
   [syncLoopIteration](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L2574)
   receives these updates.
4. **HandlePodAdditions**: The
   [HandlePodAdditions](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L2713)
   function handles the initial setup:
    * **Allocation**: Calls
      [allocationManager.AddPod](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/allocation/allocation_manager.go#L527)
      to run internal admission checks (e.g., topology resources).
5. **Pod Workers**: The work is queued to a dedicated [Pod
   Worker](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/pod_workers.go#L755).
   Each Pod has its own worker goroutine to ensure operations for a single Pod
   are serialized.

## SyncPod: The Core Logic

The
[SyncPod](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L1941)
function is the primary workhorse. It orchestrates the following steps:

1. **Status Generation**: [Generates the
   status](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L1996)
   representing the state *before* work begins.
2. **Cgroups**: Calls
   [pcm.EnsureExists](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/cm/pod_container_manager_linux.go#L74)
   to configure Pod-level cgroups.
    * *Note*: While the container runtime manages container cgroups, the
      Kubelet manages the parent Pod cgroups (a holdover from the Docker shim
      era).
3. **Volumes**: Calls
   [WaitForAttachAndMount](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L2120)
   to ensure volumes are ready.
4. **Runtime Sync**: Hands off to `kuberuntime_manager.SyncPod` for container
   operations.

### Runtime Manager Sync

The [kuberuntime
SyncPod](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L1394)
function bridges the declarative Kubernetes API to the imperative CRI API.

1. **Compute Actions**:
   [computePodActions](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L1397)
   compares the desired spec with the current runtime state. It outputs a
   [podActions](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L563)
   struct listing what to kill, create, or update.
2. **Actuation**:
    * **Sandboxes**: Calls
      [createPodSandbox](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L1543)
      if needed.
    * **Containers**: For each container to start, calls
      [startContainer](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_container.go#L200):
        1. Checks CrashLoopBackOff.
        2. [Ensures Image
           Exists](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_container.go#L215)
           (pulling if necessary).
        3. [Generates Container
           Config](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_container.go#L343)
           (this is where the Kubernetes API types are translated to the [CRI
           spec](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1/api.proto)).
        4. Calls CRI `CreateContainer` and `StartContainer`.
        5. Executes the **PostStart hook** synchronously (blocking the start
           of the remaining containers in the Pod!).

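The "compare desired with actual, then emit a list of actions" step can be
sketched as a plain diff over two maps. This is a deliberately reduced model:
`containerState`, `actions`, and `computeActions` are hypothetical names, and
the sketch ignores restart policy, init containers, and in-place updates, all
of which the real `computePodActions` has to consider.

```go
package main

import (
	"fmt"
	"sort"
)

// containerState is a hypothetical view of what the runtime reports.
type containerState struct {
	Image   string
	Running bool
}

// actions is a hypothetical, trimmed-down analogue of the podActions struct.
type actions struct {
	CreateSandbox bool
	Start         []string // containers to create and start
	Kill          []string // containers to stop
}

// computeActions compares desired containers (name -> image) against the
// observed state (name -> containerState) and lists what must change.
func computeActions(sandboxReady bool, desired map[string]string, actual map[string]containerState) actions {
	a := actions{CreateSandbox: !sandboxReady}
	for name, image := range desired {
		st, ok := actual[name]
		switch {
		case !ok || !st.Running:
			a.Start = append(a.Start, name)
		case st.Image != image: // spec changed: replace the container
			a.Kill = append(a.Kill, name)
			a.Start = append(a.Start, name)
		}
	}
	for name, st := range actual {
		if _, wanted := desired[name]; !wanted && st.Running {
			a.Kill = append(a.Kill, name)
		}
	}
	// Map iteration order is random; sort for stable output.
	sort.Strings(a.Start)
	sort.Strings(a.Kill)
	return a
}

func main() {
	desired := map[string]string{"app": "app:v2", "sidecar": "proxy:v1"}
	actual := map[string]containerState{
		"app":    {Image: "app:v1", Running: true},
		"legacy": {Image: "old:v1", Running: true},
	}
	fmt.Printf("%+v\n", computeActions(true, desired, actual))
}
```
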
## Steady State: PLEG and Probes

Once running, the Kubelet maintains the Pod via:

* **PLEG (Pod Lifecycle Event Generator)**: [Polls the
  runtime](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/pleg/generic.go#L128)
  (every 1s by default for the generic PLEG) to detect state changes. If a
  [change is
  detected](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/pleg/generic.go#L397),
  it generates an event to wake up the Sync Loop.
* **Probes**: The
  [ProbeManager](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/prober/prober_manager.go#L185)
  runs workers for Liveness, Readiness, and Startup probes. Results trigger
  updates via
  [resultsManager](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/prober/worker.go#L365).

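The polling approach amounts to diffing two snapshots of runtime state. The
sketch below shows one relist pass under simplifying assumptions: `event` and
`relist` are hypothetical names, container state is reduced to a string, and
the event type strings are only loosely modeled on the PLEG's vocabulary.

```go
package main

import (
	"fmt"
	"sort"
)

// event is a hypothetical lifecycle event for one container.
type event struct {
	ContainerID string
	Type        string // "ContainerStarted", "ContainerDied" or "ContainerRemoved"
}

// relist diffs two snapshots (container ID -> "running" or "exited") and
// returns the events implied by the transition, sorted by container ID.
func relist(prev, curr map[string]string) []event {
	var events []event
	for id, state := range curr {
		if old, seen := prev[id]; seen && old == state {
			continue // unchanged since the last relist
		}
		switch state {
		case "running":
			events = append(events, event{id, "ContainerStarted"})
		case "exited":
			events = append(events, event{id, "ContainerDied"})
		}
	}
	for id := range prev {
		if _, ok := curr[id]; !ok {
			events = append(events, event{id, "ContainerRemoved"})
		}
	}
	sort.Slice(events, func(i, j int) bool {
		return events[i].ContainerID < events[j].ContainerID
	})
	return events
}

func main() {
	prev := map[string]string{"a": "running", "b": "running"}
	curr := map[string]string{"a": "running", "b": "exited", "c": "running"}
	for _, e := range relist(prev, curr) {
		fmt.Println(e.ContainerID, e.Type)
	}
}
```

In a real loop each emitted event would wake the Sync Loop for the affected
Pod, and `curr` would become `prev` for the next pass.
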
## Termination

Termination is handled by
[SyncTerminatingPod](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L2182):

1. Stops probes.
2. Calls the kuberuntime's
   [KillPod](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L1860),
   which in turn uses the CRI to have the container runtime stop the
   containers and the sandbox.
3. Generates final status.

After termination,
[SyncTerminatedPod](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L2339)
performs final cleanup (unmounting volumes, releasing resources).

## Garbage Collection

* **Container GC**: Periodic loop in the runtime manager to remove exited
  containers.
* **Image GC**: Periodic check to remove unused images based on disk pressure.
* **"Housekeeping"**:
  [HandlePodCleanups](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet_pods.go#L1192)
  cleans up orphaned pod directories and internal state.

## Advanced Concepts

### Static & Mirror Pods

* **Static Pod**: A Pod sourced from a local file or HTTP endpoint, not the
  API server. The API server initially knows nothing about it.
* **Mirror Pod**: A read-only Pod object created by the Kubelet in the API
  server to represent a Static Pod. This allows the Scheduler to see the
  resource usage and users to see the status. Identified by the
  `kubernetes.io/config.mirror` annotation.

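Recognizing a Mirror Pod comes down to checking for that annotation. A minimal
sketch, assuming annotations are available as a plain string map;
`isMirrorPod` here is an illustrative helper and the annotation value is a
made-up placeholder:

```go
package main

import "fmt"

// mirrorAnnotation is the annotation key named in the text above.
const mirrorAnnotation = "kubernetes.io/config.mirror"

// isMirrorPod reports whether a Pod's annotations mark it as a Mirror Pod.
// Only the presence of the key is checked; a nil map is handled safely.
func isMirrorPod(annotations map[string]string) bool {
	_, ok := annotations[mirrorAnnotation]
	return ok
}

func main() {
	mirror := map[string]string{mirrorAnnotation: "placeholder-value"}
	regular := map[string]string{"app": "web"}
	fmt.Println(isMirrorPod(mirror), isMirrorPod(regular))
}
```
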
### In-Place Resize

Resize involves reconciling four resource states:

1. **Desired**: From Pod Spec.
2. **Allocated**: Admitted by Kubelet (persisted in checkpoints).
3. **Actuated**: Successfully applied to the runtime.
4. **Actual**: Read back from cgroups.

The
[AllocationManager](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/allocation/allocation_manager.go#L527)
mediates these transitions.

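One way to think about the four states is as a chain in which each value
should eventually catch up with the one before it. The sketch below models a
single resource (CPU in millicores) and reports the first point where the
chain is out of sync. The `resizeState` type, the `firstGap` function, and the
returned labels are all hypothetical and do not correspond to Kubelet status
conditions.

```go
package main

import "fmt"

// resizeState is a hypothetical single-resource view of the four states.
type resizeState struct {
	Desired   int64 // from the Pod spec
	Allocated int64 // admitted by the Kubelet
	Actuated  int64 // applied to the runtime
	Actual    int64 // read back from cgroups
}

// firstGap returns a label for the earliest pair of adjacent states that
// disagree, or "settled" when all four match.
func firstGap(s resizeState) string {
	switch {
	case s.Desired != s.Allocated:
		return "awaiting admission"
	case s.Allocated != s.Actuated:
		return "awaiting actuation"
	case s.Actuated != s.Actual:
		return "awaiting runtime confirmation"
	default:
		return "settled"
	}
}

func main() {
	// A resize from 500m to 1000m that has been admitted but not yet applied.
	s := resizeState{Desired: 1000, Allocated: 1000, Actuated: 500, Actual: 500}
	fmt.Println(firstGap(s))
}
```
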
## Testing

Kubelet testing is split into:

* **Unit Tests**: Heavily used for logic verification.
* **Node E2E**:
  [test/e2e_node](https://github.com/kubernetes/kubernetes/tree/master/test/e2e_node).
  Runs a single-node cluster (typically Kubelet + API Server) to test the
  Kubelet in isolation on various OS/Runtime combinations.
* **Cluster E2E**:
  [test/e2e/node](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/node).
  Tests that require a full control plane.
* **Common**:
  [test/e2e/common](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/common).
  Tests that can run in both environments.

sig-node/CONTRIBUTING.md

Lines changed: 2 additions & 0 deletions
@@ -158,6 +158,8 @@ For general code organization, read [contributors/devel/README.md](../contributo
* kubelet
  * <https://github.com/kubernetes/kubernetes/tree/master/cmd/kubelet>
  * <https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet>
  * See a detailed walkthrough of the kubelet component in [the kubelet
    README](../contributors/devel/sig-node/kubelet.md).
* Probe: <https://github.com/kubernetes/kubernetes/tree/master/pkg/probe>
* NodeLifecycle: <https://github.com/kubernetes/kubernetes/tree/master/pkg/controller/nodelifecycle>
* Node API: <https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/api/node>
