A node has zombie processes when one or more processes are in the zombie (defunct) state (state `Z` in `/proc/<pid>/stat`, shown as `Z` or `<defunct>` by `ps`). Zombie processes have exited but their parent has not reaped them; each one still occupies a PID slot and, in large numbers, they can contribute to PID exhaustion (PIDPressure) or indicate buggy or stuck parent processes.
Warning
N/A
- Report shows: Node <name> has N zombie process(es)
- Node Inspection "Node process health" table shows a non-zero zombie count and issue code NODE-003
- On the node, `ps` or inspection of `/proc/*/stat` shows processes in state `Z`
- Identify zombie processes on the node: `ps aux | grep Z` (or, to match only the STAT column, `ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'`) or inspect `/proc/<pid>/stat` (the third field, after the parenthesized command name, is the state).
- Find the parent process (PPID) of each zombie; the parent is responsible for reaping. If the parent is a system component (e.g. kubelet, container runtime), check its logs and consider restarting it under controlled maintenance.
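- If many zombies accumulate, check for PID limits (`podPidsLimit`, node capacity) and consider tuning the limits or fixing the parent process so that it reaps its children correctly.
- Restarting the parent process clears its zombies: when the parent exits, they are re-parented to init (PID 1) or a subreaper, which reaps them. Avoid trying to kill zombie processes directly; they have already exited, so signals have no effect on them.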
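The identification steps above can be sketched as a small shell helper that reads `/proc/<pid>/stat` directly and groups zombies by parent. This is a minimal sketch, not part of any inspection tool: the function names `list_zombies` and `zombies_per_parent` and the optional proc-root argument are hypothetical, introduced only for illustration. It assumes the Linux `stat` layout of `pid (comm) state ppid ...`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any existing tool).

# Print "<pid> <ppid>" for every process in state Z.
# $1 is an optional proc root (defaults to /proc); the override exists
# only so the function can be pointed at a fixture directory.
list_zombies() {
    proc_root="${1:-/proc}"
    for stat_file in "$proc_root"/[0-9]*/stat; do
        # Processes may exit between the glob and the read; skip those.
        [ -r "$stat_file" ] || continue
        line=$(cat "$stat_file" 2>/dev/null) || continue
        pid=${line%% *}
        # Field 2 (comm) is wrapped in parentheses and may itself contain
        # spaces or ")", so keep only what follows the LAST ") ".
        rest=${line##*") "}
        state=${rest%% *}   # field 3: process state
        rest=${rest#* }
        ppid=${rest%% *}    # field 4: parent PID
        if [ "$state" = "Z" ]; then
            printf '%s %s\n' "$pid" "$ppid"
        fi
    done
}

# Print "<zombie count> <ppid>" per parent, busiest parent first.
zombies_per_parent() {
    list_zombies "$1" |
        awk '{ n[$2]++ } END { for (p in n) print n[p], p }' |
        sort -rn
}
```

Running `zombies_per_parent` with no argument on the node shows which parents are failing to reap; `cat /proc/<ppid>/comm` then gives the parent's command name, and the total can be compared against `/proc/sys/kernel/pid_max` to judge how close the node is to PID exhaustion.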