fix(containerd): clean stale runtime state on restart to fix pod sync failures #127
Open
nova8-technologies wants to merge 2 commits into portainer:develop
Remove stale containerd sockets, system socket symlinks, and containerd runtime state directory on startup. After a reboot, the old container is killed but persistent state (sockets, shim PIDs) remains on the bind-mounted volume, preventing containerd from starting cleanly.

Preserved: containerd images (root/), kine database (state.db), PKI certs.
Removed: stale sockets, containerd state (dead shim references).

Amp-Thread-ID: https://ampcode.com/threads/T-019ce597-0bdd-765f-b3f4-7a2e3b57a0bd
Co-authored-by: Amp <amp@ampcode.com>
The previous cleanup only removed stale sockets and the state directory. However, the containerd metadata database (meta.db in root/) retains references to EXITED containers from the previous run, causing kubelet to fail pod synchronization with 'container runtime status check may not have completed yet' errors.

Now cleans all containerd subdirectories except images/ (embedded tar archives) and the embedded binaries (containerd, containerd-shim-runc-v2, crun). All of these are re-extracted by EnsureEmbeddedDependencies() and re-imported by importImages() on every startup, so no data is lost.

Preserved: kine database (Kubernetes state), PKI certificates, embedded image archives.

Amp-Thread-ID: https://ampcode.com/threads/T-019ce597-0bdd-765f-b3f4-7a2e3b57a0bd
Co-authored-by: Amp <amp@ampcode.com>
stevensbkang (Member) approved these changes on Mar 31, 2026
LGTM! I think this can be merged once #117 is ready so it goes crun all the way :)
Problem
After a reboot, KubeSolo fails to restart cleanly. The containerd metadata database
(meta.db in root/) retains references to EXITED containers from the previous run,
causing kubelet to fail pod synchronization with
'container runtime status check may not have completed yet' errors.

The previous stale socket/state cleanup was insufficient: it only removed sockets
and the state/ directory, but the metadata DB in root/ was preserved across restarts.

Solution
Add a cleanStaleState() function that runs during bootstrap, before any services
start. It cleans all containerd subdirectories except:
- images/: embedded tar archives (re-imported by importImages() on every startup)
- containerd, containerd-shim-runc-v2, crun: embedded binaries (re-extracted by EnsureEmbeddedDependencies())

This gives containerd a clean slate on every startup while preserving:
- kine database (Kubernetes state)
- PKI certificates
- embedded image archives
Changes
- cmd/kubesolo/main.go: Add cleanStaleState() function, called in bootstrap() before service initialization

Testing