| 1 | +# Kubelet Component |
| 2 | + |
| 3 | +This document provides a high-level walkthrough of the Kubelet's code structure, |
| 4 | +specifically focusing on the Pod lifecycle and the "Sync Loop". It tracks the |
| 5 | +flow of a Pod from the moment it is assigned to a node to its execution and |
| 6 | +eventual termination. |
| 7 | + |
| 8 | +## Overview & Scope |
| 9 | + |
| 10 | +The Kubelet deals primarily in **Pods**. While it handles some supporting |
| 11 | +resources (Volumes, etc.), its main unit of work is the Pod. It does not |
| 12 | +natively understand Deployments, StatefulSets, DaemonSets, or any other |
| 13 | +workload abstraction -- those are handled by upstream controllers. |
| 14 | + |
| 15 | +1. **Creation**: Workload controllers create Pods. |
| 16 | +2. **Scheduling**: The Scheduler picks a node and **binds** the Pod to it |
| 17 | + (setting `spec.nodeName`). |
| 18 | +3. **Kubelet Observation**: Only after a Pod is bound to a node does the |
| 19 | + Kubelet on that node see it and take ownership. _**<==== WE ARE HERE!!!**_ |
| 20 | + |
| 21 | +The Kubelet **orchestrates** the Pod spec into running processes on the host via |
| 22 | +the **Container Runtime Interface (CRI)**. The runtime (e.g., `containerd`) |
| 23 | +translates these requests for an OCI runtime (e.g., `runc`), which handles the |
| 24 | +low-level operating system setup for the container. |
| 25 | + |
| 26 | +## Startup Flow: From API to Pod Worker |
| 27 | + |
| 28 | +When a new Pod is assigned to a node, it follows this path through the Kubelet |
| 29 | +internals: |
| 30 | + |
| 31 | +1. **Config Source**: An |
| 32 | + [Informer](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/config/apiserver.go#L38-L39) |
| 33 | + watches for Pods bound to this node. |
| 34 | +2. **PodConfig**: The |
| 35 | + [PodConfig](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/config/config.go#L75) |
| 36 | + aggregates updates from multiple sources (API server, file static pods, HTTP |
| 37 | + static pods) and merges them into a single stream of updates. |
| 38 | +3. **SyncLoopIteration**: The main control loop |
| 39 | + [syncLoopIteration](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L2574) |
| 40 | + receives these updates. |
| 41 | +4. **HandlePodAdditions**: The |
| 42 | + [HandlePodAdditions](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L2713) |
| 43 | + function handles the initial setup: |
| 44 | + * **Allocation**: Calls |
| 45 | + [allocationManager.AddPod](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/allocation/allocation_manager.go#L527) |
| 46 | + to run internal admission checks (e.g., topology resources). |
| 47 | +5. **Pod Workers**: The work is queued to a dedicated [Pod |
| 48 | + Worker](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/pod_workers.go#L755). |
| 49 | + Each Pod has its own worker goroutine to ensure operations for a single Pod |
| 50 | + are serialized. |
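The serialization guarantee in step 5 can be pictured with a minimal Go sketch. This is not the Kubelet's implementation: `podUpdate`, `podWorkers`, `newPodWorkers`, and `Close` are hypothetical simplifications, and the real pod workers also track lifecycle state, cancellation, and termination. The sketch only shows one goroutine per Pod UID draining its own channel, so updates for one Pod are handled in order while different Pods proceed independently.

```go
package main

import (
	"fmt"
	"sync"
)

// podUpdate is a hypothetical stand-in for the work item queued per Pod.
type podUpdate struct {
	podUID string
	seq    int
}

// podWorkers fans updates out to one goroutine per Pod UID (illustrative only).
type podWorkers struct {
	mu     sync.Mutex
	queues map[string]chan podUpdate
	wg     sync.WaitGroup
	handle func(podUpdate)
}

func newPodWorkers(handle func(podUpdate)) *podWorkers {
	return &podWorkers{queues: map[string]chan podUpdate{}, handle: handle}
}

// UpdatePod enqueues an update, starting the Pod's worker on first use.
// It must not be called concurrently with, or after, Close.
func (w *podWorkers) UpdatePod(u podUpdate) {
	w.mu.Lock()
	q, ok := w.queues[u.podUID]
	if !ok {
		q = make(chan podUpdate, 16)
		w.queues[u.podUID] = q
		w.wg.Add(1)
		go func() {
			defer w.wg.Done()
			// A single consumer per channel keeps this Pod's updates in order.
			for next := range q {
				w.handle(next)
			}
		}()
	}
	w.mu.Unlock()
	q <- u
}

// Close stops accepting updates and waits for every worker to drain its queue.
func (w *podWorkers) Close() {
	w.mu.Lock()
	for _, q := range w.queues {
		close(q)
	}
	w.mu.Unlock()
	w.wg.Wait()
}

func main() {
	var mu sync.Mutex // guards seen, which is written by several workers
	seen := map[string][]int{}
	w := newPodWorkers(func(u podUpdate) {
		mu.Lock()
		defer mu.Unlock()
		seen[u.podUID] = append(seen[u.podUID], u.seq)
	})
	for i := 0; i < 3; i++ {
		w.UpdatePod(podUpdate{podUID: "pod-a", seq: i})
		w.UpdatePod(podUpdate{podUID: "pod-b", seq: i})
	}
	w.Close()
	fmt.Println(seen["pod-a"], seen["pod-b"])
}
```

Keying the queue by Pod UID is the design point: ordering is only required within a Pod, so a slow sync for one Pod never blocks another.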
| 51 | + |
| 52 | +## SyncPod: The Core Logic |
| 53 | + |
| 54 | +The |
| 55 | +[SyncPod](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L1941) |
| 56 | +function is the primary workhorse. It orchestrates the following steps: |
| 57 | + |
| 58 | +1. **Status Generation**: [Generates the |
| 59 | + status](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L1996) |
| 60 | + representing the state *before* work begins. |
| 61 | +2. **Cgroups**: Calls |
| 62 | + [pcm.EnsureExists](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/cm/pod_container_manager_linux.go#L74) |
| 63 | + to configure Pod-level cgroups. |
| 64 | + * *Note*: While the container runtime manages container cgroups, the |
| 65 | + Kubelet manages the parent Pod cgroups (a holdover from the Docker shim |
| 66 | + era). |
| 67 | +3. **Volumes**: Calls |
| 68 | + [WaitForAttachAndMount](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L2120) |
| 69 | + to ensure volumes are ready. |
| 70 | +4. **Runtime Sync**: Hands off to `kuberuntime_manager.SyncPod` for container |
| 71 | + operations. |
| 72 | + |
| 73 | +### Runtime Manager Sync |
| 74 | + |
| 75 | +The [kuberuntime |
| 76 | +SyncPod](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L1394) |
| 77 | +function bridges the declarative Kubernetes API to the imperative CRI API. |
| 78 | + |
| 79 | +1. **Compute Actions**: |
| 80 | + [computePodActions](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L1397) |
| 81 | + compares the desired spec with the current runtime state. It outputs a |
| 82 | + [podActions](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L563) |
| 83 | + struct listing what to kill, create, or update. |
| 84 | +2. **Actuation**: |
| 85 | + * **Sandboxes**: Calls |
| 86 | + [createPodSandbox](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L1543) |
| 87 | + if needed. |
| 88 | + * **Containers**: For each container to start, calls |
| 89 | + [startContainer](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_container.go#L200): |
| 90 | + 1. Checks CrashLoopBackOff. |
| 91 | + 2. [Ensures Image |
| 92 | + Exists](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_container.go#L215) |
| 93 | + (pulling if necessary). |
| 94 | + 3. [Generates Container |
| 95 | + Config](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_container.go#L343) |
| 96 | + (this is where the Kubernetes API is translated to the [CRI |
| 97 | + spec](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1/api.proto)). |
| 98 | + 4. Calls CRI `CreateContainer` and `StartContainer`. |
| 99 | + 5. Executes the **PostStart hook** synchronously (blocking other |
| 100 | + containers in the Pod!). |
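The "compute, then actuate" split can be illustrated with a small Go sketch of the comparison step. It is a toy model, not `computePodActions` itself: the types `desiredContainer`, `containerState`, and `actions`, the use of a spec hash to detect changes, and the restart rules are simplifying assumptions. The real function also handles init containers, restart policy, and resize.

```go
package main

import (
	"fmt"
	"sort"
)

// desiredContainer is a hypothetical, reduced view of a container in the Pod spec.
type desiredContainer struct {
	Name string
	Hash string // assumed fingerprint of the container spec
}

// containerState is a hypothetical, reduced view of what the runtime reports.
type containerState struct {
	Running bool
	Hash    string // fingerprint of the spec the container was created from
}

// actions is a toy analogue of the podActions struct.
type actions struct {
	CreateSandbox bool
	ToStart       []string
	ToKill        []string
}

// computeActions compares the desired containers with the observed runtime state.
func computeActions(sandboxReady bool, desired []desiredContainer, observed map[string]containerState) actions {
	a := actions{CreateSandbox: !sandboxReady}
	wanted := make(map[string]bool, len(desired))
	for _, d := range desired {
		wanted[d.Name] = true
		st, found := observed[d.Name]
		switch {
		case !sandboxReady || !found || !st.Running:
			// Missing, exited, or about to lose its sandbox: (re)start it.
			a.ToStart = append(a.ToStart, d.Name)
		case st.Hash != d.Hash:
			// Spec changed under a running container: kill, then start anew.
			a.ToKill = append(a.ToKill, d.Name)
			a.ToStart = append(a.ToStart, d.Name)
		}
	}
	// Running containers that are no longer wanted, or that live in a
	// sandbox being replaced, must be killed.
	for name, st := range observed {
		if st.Running && (!wanted[name] || !sandboxReady) {
			a.ToKill = append(a.ToKill, name)
		}
	}
	sort.Strings(a.ToKill) // map iteration order is random; keep output stable
	return a
}

func main() {
	desired := []desiredContainer{{Name: "app", Hash: "h2"}, {Name: "sidecar", Hash: "h1"}}
	observed := map[string]containerState{
		"app":     {Running: true, Hash: "h1"},
		"sidecar": {Running: false, Hash: "h1"},
		"old":     {Running: true, Hash: "h0"},
	}
	fmt.Printf("%+v\n", computeActions(true, desired, observed))
}
```

Producing a plain data structure first keeps the decision logic free of side effects, which makes it straightforward to unit test separately from the CRI calls that act on it.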
| 101 | + |
| 102 | +## Steady State: PLEG and Probes |
| 103 | + |
| 104 | +Once running, the Kubelet maintains the Pod via: |
| 105 | + |
| 106 | +* **PLEG (Pod Lifecycle Event Generator)**: [Polls the |
| 107 | + runtime](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/pleg/generic.go#L128) |
| 108 | + (every 1s by default) to detect state changes. If a [change is |
| 109 | + detected](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/pleg/generic.go#L397), |
| 110 | + it generates an event to wake up the Sync Loop. |
| 111 | +* **Probes**: The |
| 112 | + [ProbeManager](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/prober/prober_manager.go#L185) |
| 113 | + runs workers for Liveness, Readiness, and Startup probes. Results trigger |
| 114 | + updates via |
| 115 | + [resultsManager](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/prober/worker.go#L365). |
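The relist idea behind the generic PLEG can be sketched in Go as a diff between two snapshots of runtime state. This is an illustration under assumptions: the `state` and `event` types and the `relist` signature are hypothetical, and the event names are modelled on the PLEG event types rather than copied from the linked code.

```go
package main

import (
	"fmt"
	"sort"
)

// state is a hypothetical, reduced container state as seen by the runtime.
type state string

const (
	running state = "running"
	exited  state = "exited"
)

// event is a toy analogue of a Pod lifecycle event.
type event struct {
	ContainerID string
	Type        string
}

// relist diffs the previous and current snapshots (container ID -> state)
// and returns one event per container whose state changed.
func relist(old, cur map[string]state) []event {
	var events []event
	for id, s := range cur {
		if old[id] == s {
			continue // unchanged since the last relist
		}
		switch s {
		case running:
			events = append(events, event{ContainerID: id, Type: "ContainerStarted"})
		case exited:
			events = append(events, event{ContainerID: id, Type: "ContainerDied"})
		}
	}
	for id := range old {
		if _, ok := cur[id]; !ok {
			events = append(events, event{ContainerID: id, Type: "ContainerRemoved"})
		}
	}
	// Map iteration order is random; sort for deterministic output.
	sort.Slice(events, func(i, j int) bool { return events[i].ContainerID < events[j].ContainerID })
	return events
}

func main() {
	old := map[string]state{"c1": running, "c2": running, "c3": exited}
	cur := map[string]state{"c1": running, "c2": exited, "c4": running}
	for _, e := range relist(old, cur) {
		fmt.Println(e.ContainerID, e.Type)
	}
}
```

Because events are derived from snapshots, any change that happens and reverts between two relists goes unnoticed, which is the usual trade-off of polling against an event stream.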
| 116 | + |
| 117 | +## Termination |
| 118 | + |
| 119 | +Termination is handled by |
| 120 | +[SyncTerminatingPod](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L2182): |
| 121 | + |
| 122 | +1. Stops probes. |
| 123 | +2. Calls the kuberuntime's |
| 124 | + [KillPod](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L1860) |
| 125 | + which itself calls the container runtime using CRI to stop containers and |
| 126 | + the sandbox. |
| 127 | +3. Generates final status. |
| 128 | + |
| 129 | +After termination, |
| 130 | +[SyncTerminatedPod](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet.go#L2339) |
| 131 | +performs final cleanup (unmounting volumes, releasing resources). |
| 132 | + |
| 133 | +## Garbage Collection |
| 134 | + |
| 135 | +* **Container GC**: Periodic loop in the runtime manager to remove exited |
| 136 | + containers. |
| 137 | +* **Image GC**: Periodic check to remove unused images based on disk usage thresholds. |
| 138 | +* **"Housekeeping"**: |
| 139 | + [HandlePodCleanups](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/kubelet_pods.go#L1192) |
| 140 | + cleans up orphaned pod directories and internal state. |
| 141 | + |
| 142 | +## Advanced Concepts |
| 143 | + |
| 144 | +### Static & Mirror Pods |
| 145 | + |
| 146 | +* **Static Pod**: A Pod sourced from a local file or HTTP endpoint, not the |
| 147 | + API server. The API server initially knows nothing about it. |
| 148 | +* **Mirror Pod**: A read-only Pod object created by the Kubelet in the API |
| 149 | + server to represent a Static Pod. This lets the Scheduler account for its |
| 150 | + resource usage and lets users see its status. Identified by the |
| 151 | + `kubernetes.io/config.mirror` annotation. |
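Detecting a Mirror Pod comes down to checking for that annotation. The Go sketch below assumes a hypothetical `pod` struct in place of the real `v1.Pod` type, and `isMirrorPod` is an illustrative helper name; only the annotation key is taken from the text above.

```go
package main

import "fmt"

// mirrorAnnotation is the annotation key that marks a Mirror Pod.
const mirrorAnnotation = "kubernetes.io/config.mirror"

// pod is a hypothetical stand-in for the relevant part of a Pod's metadata.
type pod struct {
	Name        string
	Annotations map[string]string
}

// isMirrorPod reports whether the Pod carries the mirror annotation.
// Only the presence of the key matters here, not its value.
func isMirrorPod(p pod) bool {
	_, ok := p.Annotations[mirrorAnnotation]
	return ok
}

func main() {
	pods := []pod{
		{Name: "etcd-node1", Annotations: map[string]string{mirrorAnnotation: "abc123"}},
		{Name: "web-7d4b9"}, // nil annotations: reading a nil map is safe in Go
	}
	for _, p := range pods {
		fmt.Printf("%s mirror=%v\n", p.Name, isMirrorPod(p))
	}
}
```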
| 152 | + |
| 153 | +### In-Place Resize |
| 154 | + |
| 155 | +Resize involves reconciling four resource states: |
| 156 | +1. **Desired**: From Pod Spec. |
| 157 | +2. **Allocated**: Admitted by Kubelet (persisted in checkpoints). |
| 158 | +3. **Actuated**: Successfully applied to the runtime. |
| 159 | +4. **Actual**: Read back from cgroups. |
| 160 | + |
| 161 | +The |
| 162 | +[AllocationManager](https://github.com/kubernetes/kubernetes/blob/03e14cc9432975dec161de1e52d7010f9711a913/pkg/kubelet/allocation/allocation_manager.go#L527) |
| 163 | +mediates these transitions. |
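One way to reason about the four states is to ask where they first diverge. The Go sketch below is purely illustrative: `resources`, `resizeState`, `pendingStep`, and the step names are hypothetical and do not mirror the AllocationManager API. It assumes a resize progresses in order from desired to allocated to actuated to actual.

```go
package main

import "fmt"

// resources is a hypothetical, reduced resource tuple for one container.
type resources struct {
	CPUMilli    int64
	MemoryBytes int64
}

// resizeState holds the four views of a container's resources.
type resizeState struct {
	Desired   resources // from the Pod spec
	Allocated resources // admitted by the Kubelet
	Actuated  resources // applied to the runtime
	Actual    resources // read back from cgroups
}

// pendingStep returns the first stage at which the states diverge,
// or "none" when all four agree.
func (s resizeState) pendingStep() string {
	switch {
	case s.Desired != s.Allocated:
		return "allocate" // the new request has not been admitted yet
	case s.Allocated != s.Actuated:
		return "actuate" // admitted, but not yet applied to the runtime
	case s.Actuated != s.Actual:
		return "verify" // applied, but not yet reflected in cgroups
	default:
		return "none"
	}
}

func main() {
	small := resources{CPUMilli: 500, MemoryBytes: 256 << 20}
	large := resources{CPUMilli: 1000, MemoryBytes: 512 << 20}

	s := resizeState{Desired: large, Allocated: small, Actuated: small, Actual: small}
	fmt.Println(s.pendingStep())

	s.Allocated = large
	fmt.Println(s.pendingStep())

	s.Actuated, s.Actual = large, large
	fmt.Println(s.pendingStep())
}
```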
| 164 | + |
| 165 | +## Testing |
| 166 | + |
| 167 | +Kubelet testing is split into: |
| 168 | +* **Unit Tests**: Heavily used for logic verification. |
| 169 | +* **Node E2E**: |
| 170 | + [test/e2e_node](https://github.com/kubernetes/kubernetes/tree/master/test/e2e_node). |
| 171 | + Runs a single-node cluster (Kubelet + API Server typically) to test Kubelet |
| 172 | + in isolation on various OS/Runtime combinations. |
| 173 | +* **Cluster E2E**: |
| 174 | + [test/e2e/node](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/node). |
| 175 | + Tests that require a full control plane. |
| 176 | +* **Common**: |
| 177 | + [test/e2e/common](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/common). |
| 178 | + Tests that can run in both environments. |