Cluster status is shown as ready even when Cilium pods are crash-looping #1789

@motjuste

Description

Summary

This does not happen on every machine, but it occurs consistently on one specific machine in Testflinger (details below).

Even after sudo k8s status --wait-ready --timeout 10m reports the cluster status as ready, sudo k8s kubectl get po -A shows the Cilium pods in CrashLoopBackOff.

The Cilium pods do not recover unless I restart k8s with sudo snap restart k8s, after which all pods start working.
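Until the status check is fixed, a post-check can catch the discrepancy before relying on the cluster. The helper below is a hypothetical sketch (not part of the k8s snap) that scans a pod listing for CrashLoopBackOff:

```shell
# Hypothetical helper (not provided by the k8s snap): reads
# `kubectl get po -A --no-headers` output on stdin and fails
# if any pod is stuck in CrashLoopBackOff.
check_no_crashloop() {
  if grep -q 'CrashLoopBackOff'; then
    echo "crash-looping pods detected"
    return 1
  fi
  echo "no crash-looping pods"
  return 0
}

# Example usage (assumes a running cluster), falling back to the
# snap-restart workaround described above:
# sudo k8s kubectl get po -A --no-headers | check_no_crashloop \
#   || sudo snap restart k8s
```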

What Should Happen Instead?

sudo k8s status --wait-ready --timeout 10m must not report cluster status as ready if the Cilium pods are crash-looping.
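A stricter readiness check would need to consider per-pod state, not just node readiness. As a rough sketch (hypothetical logic, not how k8s status actually decides), a predicate over kubectl get po -A --no-headers output could be:

```shell
# Hypothetical readiness predicate: succeeds only when every pod
# reports all containers ready (e.g. "1/1") and a status of
# Running or Completed. Reads `kubectl get po -A --no-headers`
# lines on stdin; columns are NAMESPACE NAME READY STATUS ...
all_pods_ready() {
  awk '{
    split($3, r, "/");                               # READY column, e.g. "0/1"
    if (r[1] != r[2]) bad++;                         # not all containers ready
    if ($4 != "Running" && $4 != "Completed") bad++; # e.g. CrashLoopBackOff, Pending
  } END { exit (bad > 0) }'
}

# Example usage (assumes a running cluster):
# sudo k8s kubectl get po -A --no-headers | all_pods_ready && echo ready
```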

Reproduction Steps

$ sudo snap install k8s --classic --channel 1.32-classic/stable
Bootstrapping the cluster. This may take a few seconds, please wait.
Bootstrapped a new Kubernetes cluster with node address "XX.XX.XX.219:6400".
The node will be 'Ready' to host workloads after the CNI is deployed successfully.
$ sudo k8s enable local-storage --timeout 5m
Enabling local-storage on the cluster. This may take a few seconds, please wait.
local-storage enabled.
$ sudo k8s status --wait-ready --timeout 10m
cluster status:           ready
control plane nodes:      XX.XX.XX.219:6400 (voter)
high availability:        no
datastore:                etcd
network:                  enabled
dns:                      enabled at XX.XX.XX.197
ingress:                  disabled
load-balancer:            disabled
local-storage:            enabled at /var/snap/k8s/common/rawfile-storage
gateway:                  enabled
$ sudo k8s kubectl get po -A
NAMESPACE     NAME                                  READY   STATUS              RESTARTS       AGE
kube-system   cilium-285ff                          0/1     CrashLoopBackOff    5 (2m9s ago)   6m46s
kube-system   cilium-operator-6777577c56-4ppcr      0/1     CrashLoopBackOff    6 (67s ago)    8m12s
kube-system   ck-storage-rawfile-csi-controller-0   0/2     Pending             0              8m19s
kube-system   ck-storage-rawfile-csi-node-blsnd     0/4     ContainerCreating   0              8m19s
kube-system   coredns-fc9c778db-twql4               0/1     Pending             0              8m19s
kube-system   metrics-server-8694c96fb7-c8kfw       0/1     Pending             0              8m19s
$ sudo k8s inspect
Collecting service information
Running inspection on a control-plane node
 INFO:  Service k8s.containerd is running
 INFO:  Service k8s.etcd is running
 INFO:  Service k8s.kube-proxy is running
 INFO:  Service k8s.k8s-dqlite is not running
 WARNING:  Service k8s.k8s-dqlite should be running on this node
 INFO:  Service k8s.k8sd is running
 INFO:  Service k8s.kube-apiserver is running
 INFO:  Service k8s.kube-controller-manager is running
 INFO:  Service k8s.kube-scheduler is running
 INFO:  Service k8s.kubelet is running
Collecting registry mirror logs
Collecting service arguments
 INFO:  Copy service args to the final report tarball
Collecting k8s cluster-info
 INFO:  Copy k8s cluster-info dump to the final report tarball
Collecting SBOM
 INFO:  Copy SBOM to the final report tarball
Collecting system information
 INFO:  Copy processes list to the final report tarball
 INFO:  Copy disk usage information to the final report tarball
 INFO:  Copy /proc/mounts to the final report tarball
 INFO:  Copy memory usage information to the final report tarball
 INFO:  Copy swap information to the final report tarball
 INFO:  Copy node uptime to the final report tarball
 INFO:  Copy /etc/os-release to the final report tarball
 INFO:  Copy loaded kernel modules to the final report tarball
 INFO:  Copy dmesg entries
 INFO:  Collecting core dumps from /var/crash. Size: 1.2M	/var/crash
Collecting snap and related information
 INFO:  Copy uname to the final report tarball
 INFO:  Copy snap diagnostics to the final report tarball
 INFO:  Copy k8s diagnostics to the final report tarball
cp: cannot stat '/var/snap/k8s/common/var/lib/k8s-dqlite/cluster.yaml': No such file or directory
cp: cannot stat '/var/snap/k8s/common/var/lib/k8s-dqlite/info.yaml': No such file or directory
Collecting networking information
 INFO:  Copy network diagnostics to the final report tarball
Building the report tarball
 SUCCESS:  Report tarball is at /home/ubuntu/inspection-report-20250826_171132.tar.gz

System information

While testing on multiple machines, I found that this issue happens consistently on one particular machine in Testflinger. Please see this run in Testflinger.

I am attaching the report from sudo k8s inspect.

inspection-report-20250826_171132.tar.gz

Can you suggest a fix?

No response

Are you interested in contributing with a fix?

No response
