
fix(containerd): clean stale runtime state on restart to fix pod sync failures #127

Open
nova8-technologies wants to merge 2 commits into portainer:develop from nova8-technologies:fix/clean-containerd-state

@nova8-technologies
Contributor

Problem

After a reboot, KubeSolo fails to restart cleanly. The containerd metadata database
(meta.db in root/) retains references to EXITED containers from the previous run,
so kubelet fails pod synchronization with
"container runtime status check may not have completed yet" errors.

The previous stale socket/state cleanup was insufficient: it removed only the sockets
and the state/ directory, while the metadata DB in root/ persisted across restarts.

Solution

Add a cleanStaleState() function that runs during bootstrap, before any services
start. It cleans all containerd subdirectories except:

  • images/ — embedded tar archives (re-imported by importImages() on every startup)
  • containerd, containerd-shim-runc-v2, crun — embedded binaries (re-extracted by EnsureEmbeddedDependencies())

This gives containerd a clean slate on every startup while preserving:

  • Kine database — Kubernetes state (state.db)
  • PKI certificates — cluster identity
  • Embedded image archives — container images

Changes

  • cmd/kubesolo/main.go: Add cleanStaleState() function, called in bootstrap() before service initialization

Testing

  • Verified KubeSolo starts cleanly after reboot with a persistent volume
  • Confirmed pods synchronize successfully without runtime status errors
  • Validated that Kubernetes state (kine DB) and PKI certs survive restart

Bryan Rodriguez and others added 2 commits March 25, 2026 21:34
Remove stale containerd sockets, system socket symlinks, and containerd
runtime state directory on startup. After a reboot, the old container is
killed but persistent state (sockets, shim PIDs) remains on the bind-mounted
volume, preventing containerd from starting cleanly.

Preserved: containerd images (root/), kine database (state.db), PKI certs.
Removed: stale sockets, containerd state (dead shim references).
Amp-Thread-ID: https://ampcode.com/threads/T-019ce597-0bdd-765f-b3f4-7a2e3b57a0bd
Co-authored-by: Amp <amp@ampcode.com>

The previous cleanup only removed stale sockets and the state directory.
However, the containerd metadata database (meta.db in root/) retains
references to EXITED containers from the previous run, causing kubelet
to fail pod synchronization with 'container runtime status check may
not have completed yet' errors.

Now cleans all containerd subdirectories except images/ (embedded tar
archives) and the embedded binaries (containerd, containerd-shim-runc-v2,
crun). All of these are re-extracted by EnsureEmbeddedDependencies() and
re-imported by importImages() on every startup, so no data is lost.

Preserved: kine database (Kubernetes state), PKI certificates, embedded
image archives.

Amp-Thread-ID: https://ampcode.com/threads/T-019ce597-0bdd-765f-b3f4-7a2e3b57a0bd
Co-authored-by: Amp <amp@ampcode.com>
Member

@stevensbkang stevensbkang left a comment

LGTM! I think this can be merged once #117 is ready so it goes crun all the way :)
