Description
Summary
This isn't happening everywhere, but there is one specific machine in Testflinger where it happens consistently (details below).
Essentially, even when sudo k8s status --wait-ready --timeout 10m reports that the cluster status is ready, inspecting sudo k8s kubectl get po -A shows that the Cilium pods are in CrashLoopBackOff.
The Cilium pods do not recover unless I restart k8s with sudo snap restart k8s; after that, all pods start working.
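As a stop-gap, the restart workaround can be automated. The sketch below is a hypothetical watchdog (not part of the k8s snap; the helper name needs_restart is my own) that restarts the snap only when the pod listing actually shows a crash-looping pod:

```python
import subprocess


def needs_restart(listing: str) -> bool:
    """Return True if any pod in `kubectl get po -A` output is crash-looping."""
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # STATUS is the fourth column of `kubectl get po -A` output.
        if len(fields) >= 4 and fields[3] == "CrashLoopBackOff":
            return True
    return False


if __name__ == "__main__":
    # Assumes the k8s snap CLI is on PATH and the script runs with
    # sufficient privileges (e.g. via sudo).
    pods = subprocess.run(
        ["k8s", "kubectl", "get", "po", "-A"],
        capture_output=True, text=True, check=True,
    ).stdout
    if needs_restart(pods):
        subprocess.run(["snap", "restart", "k8s"], check=True)
```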
What Should Happen Instead?
sudo k8s status --wait-ready --timeout 10m should not report the cluster status as ready while the Cilium pods are crash-looping.
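A stricter readiness check along these lines could, for example, refuse to report ready until every pod has settled. A minimal sketch of that idea (the helper name all_pods_settled is hypothetical; this is not how k8s status is actually implemented):

```python
def all_pods_settled(listing: str) -> bool:
    """True only if every pod in `kubectl get po -A` output is fully up.

    A pod counts as settled when its STATUS is Running or Completed, and a
    Running pod additionally reports all containers ready (READY reads n/n).
    """
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        ready, status = fields[2], fields[3]
        if status not in ("Running", "Completed"):
            return False  # Pending, CrashLoopBackOff, ContainerCreating, ...
        if status == "Running":
            up, _, total = ready.partition("/")
            if up != total:
                return False
    return True
```

With a check like this, the pod listing from the reproduction below would keep --wait-ready blocking instead of reporting ready.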
Reproduction Steps
$ sudo snap install k8s --classic --channel 1.32-classic/stable
Bootstrapping the cluster. This may take a few seconds, please wait.
Bootstrapped a new Kubernetes cluster with node address "XX.XX.XX.219:6400".
The node will be 'Ready' to host workloads after the CNI is deployed successfully.
$ sudo k8s enable local-storage --timeout 5m
Enabling local-storage on the cluster. This may take a few seconds, please wait.
local-storage enabled.
$ sudo k8s status --wait-ready --timeout 10m
cluster status: ready
control plane nodes: XX.XX.XX.219:6400 (voter)
high availability: no
datastore: etcd
network: enabled
dns: enabled at XX.XX.XX.197
ingress: disabled
load-balancer: disabled
local-storage: enabled at /var/snap/k8s/common/rawfile-storage
gateway: enabled
$ sudo k8s kubectl get po -A
NAMESPACE     NAME                                  READY   STATUS             RESTARTS       AGE
kube-system   cilium-285ff                          0/1     CrashLoopBackOff   5 (2m9s ago)   6m46s
kube-system   cilium-operator-6777577c56-4ppcr      0/1     CrashLoopBackOff   6 (67s ago)    8m12s
kube-system   ck-storage-rawfile-csi-controller-0   0/2     Pending            0              8m19s
kube-system   ck-storage-rawfile-csi-node-blsnd     0/4     ContainerCreating  0              8m19s
kube-system   coredns-fc9c778db-twql4               0/1     Pending            0              8m19s
kube-system   metrics-server-8694c96fb7-c8kfw       0/1     Pending            0              8m19s
$ sudo k8s inspect
Collecting service information
Running inspection on a control-plane node
INFO: Service k8s.containerd is running
INFO: Service k8s.etcd is running
INFO: Service k8s.kube-proxy is running
INFO: Service k8s.k8s-dqlite is not running
WARNING: Service k8s.k8s-dqlite should be running on this node
INFO: Service k8s.k8sd is running
INFO: Service k8s.kube-apiserver is running
INFO: Service k8s.kube-controller-manager is running
INFO: Service k8s.kube-scheduler is running
INFO: Service k8s.kubelet is running
Collecting registry mirror logs
Collecting service arguments
INFO: Copy service args to the final report tarball
Collecting k8s cluster-info
INFO: Copy k8s cluster-info dump to the final report tarball
Collecting SBOM
INFO: Copy SBOM to the final report tarball
Collecting system information
INFO: Copy processes list to the final report tarball
INFO: Copy disk usage information to the final report tarball
INFO: Copy /proc/mounts to the final report tarball
INFO: Copy memory usage information to the final report tarball
INFO: Copy swap information to the final report tarball
INFO: Copy node uptime to the final report tarball
INFO: Copy /etc/os-release to the final report tarball
INFO: Copy loaded kernel modules to the final report tarball
INFO: Copy dmesg entries
INFO: Collecting core dumps from /var/crash. Size: 1.2M /var/crash
Collecting snap and related information
INFO: Copy uname to the final report tarball
INFO: Copy snap diagnostics to the final report tarball
INFO: Copy k8s diagnostics to the final report tarball
cp: cannot stat '/var/snap/k8s/common/var/lib/k8s-dqlite/cluster.yaml': No such file or directory
cp: cannot stat '/var/snap/k8s/common/var/lib/k8s-dqlite/info.yaml': No such file or directory
Collecting networking information
INFO: Copy network diagnostics to the final report tarball
Building the report tarball
SUCCESS: Report tarball is at /home/ubuntu/inspection-report-20250826_171132.tar.gz
System information
While testing on multiple machines, I found this issue happening consistently on one particular machine in Testflinger. Please see this run in Testflinger.
I am attaching the report from sudo k8s inspect.
inspection-report-20250826_171132.tar.gz
Can you suggest a fix?
No response
Are you interested in contributing with a fix?
No response