fix(containerd): clean stale runtime state on restart to fix pod sync failures by nova8-technologies · Pull Request #127 · portainer/kubesolo

nova8-technologies · 2026-03-25T21:56:51Z

Problem

After a reboot, KubeSolo fails to restart cleanly. The containerd metadata database
(meta.db in root/) retains references to EXITED containers from the previous run,
causing kubelet to fail pod synchronization with
'container runtime status check may not have completed yet' errors.

The previous stale socket/state cleanup was insufficient: it only removed sockets
and the state/ directory, while the metadata DB in root/ was preserved across restarts.

Solution

Add a cleanStaleState() function that runs during bootstrap, before any services
start. It removes everything in the containerd directory except:

images/ — embedded tar archives (re-imported by importImages() on every startup)
containerd, containerd-shim-runc-v2, crun — embedded binaries (re-extracted by EnsureEmbeddedDependencies())

This gives containerd a clean slate on every startup while preserving:

Kine database — Kubernetes state (state.db)
PKI certificates — cluster identity
Embedded image archives — container images

Changes

cmd/kubesolo/main.go: Add cleanStaleState() function, called in bootstrap() before service initialization

Testing

Verified that KubeSolo starts cleanly after a reboot with a persistent volume
Confirmed pods synchronize successfully without runtime status errors
Validated that Kubernetes state (kine DB) and PKI certs survive restart

Remove stale containerd sockets, system socket symlinks, and containerd runtime state directory on startup.

After a reboot, the old container is killed but persistent state (sockets, shim PIDs) remains on the bind-mounted volume, preventing containerd from starting cleanly.

Preserved: containerd images (root/), kine database (state.db), PKI certs.
Removed: stale sockets, containerd state (dead shim references).

Amp-Thread-ID: https://ampcode.com/threads/T-019ce597-0bdd-765f-b3f4-7a2e3b57a0bd
Co-authored-by: Amp <amp@ampcode.com>

The previous cleanup only removed stale sockets and the state directory. However, the containerd metadata database (meta.db in root/) retains references to EXITED containers from the previous run, causing kubelet to fail pod synchronization with 'container runtime status check may not have completed yet' errors.

Now cleans all containerd subdirectories except images/ (embedded tar archives) and the embedded binaries (containerd, containerd-shim-runc-v2, crun). All of these are re-extracted by EnsureEmbeddedDependencies() and re-imported by importImages() on every startup, so no data is lost.

Preserved: kine database (Kubernetes state), PKI certificates, embedded image archives.

Amp-Thread-ID: https://ampcode.com/threads/T-019ce597-0bdd-765f-b3f4-7a2e3b57a0bd
Co-authored-by: Amp <amp@ampcode.com>

stevensbkang

LGTM! I think this can be merged once #117 is ready so it goes crun all the way :)

Bryan Rodriguez and others added 2 commits March 25, 2026 21:34

nova8-technologies requested a review from stevensbkang as a code owner March 25, 2026 21:56

stevensbkang approved these changes Mar 31, 2026


nova8-technologies wants to merge 2 commits into portainer:develop from nova8-technologies:fix/clean-containerd-state

2 participants
