Summary
This issue is related to an upstream bug (1) and often blocks k8s cluster deployments.
The symptom is Cilium pods going into CrashLoopBackOff state upon new cluster deployments:
$ kubectl get pod --all-namespaces
NAMESPACE     NAME                                         READY   STATUS              RESTARTS        AGE
default       csi-rbdplugin-provisioner-6f5fb97894-78wtm   0/6     Pending             0               10m
default       csi-rbdplugin-provisioner-6f5fb97894-rq9tk   0/6     Pending             0               10m
default       csi-rbdplugin-provisioner-6f5fb97894-rqjjh   0/6     Pending             0               10m
kube-system   cilium-gh6mk                                 0/1     CrashLoopBackOff    7 (8s ago)      11m
kube-system   cilium-hsj46                                 0/1     CrashLoopBackOff    7 (86s ago)     12m
kube-system   cilium-operator-7467567bb8-5d8kz             1/1     Running             1 (11m ago)     13m
kube-system   cilium-xq546                                 0/1     CrashLoopBackOff    6 (4m50s ago)   11m
kube-system   ck-storage-rawfile-csi-controller-0          0/2     Pending             0               13m
kube-system   ck-storage-rawfile-csi-node-lmt86            0/4     ContainerCreating   0               13m
kube-system   ck-storage-rawfile-csi-node-v98lj            0/4     ContainerCreating   0               11m
kube-system   ck-storage-rawfile-csi-node-xgk87            0/4     ContainerCreating   0               11m
kube-system   coredns-fc9c778db-hnb5w                      0/1     Pending             0               13m
kube-system   metrics-server-8694c96fb7-pdwsj              0/1     Pending             0               13m
Reading Cilium pod logs shows:
time="2025-09-05T19:32:31.412648669Z" level=fatal msg="failed to start: daemon creation failed: unable to determine direct routing device. Use --direct-routing-device to specify it\nfailed to stop: unable to find controller ipcache-inject-labels" subsys=daemon
The current workaround is to use a juju config option to tell the Cilium pods which interface handles the routing:
juju config k8s cluster-annotations="k8sd/v1alpha1/cilium/devices=ens+"
After applying this, the pods are able to bootstrap successfully.
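The trailing + in ens+ is a prefix wildcard, so the pattern must match the NIC naming on the target nodes. One way to find a suitable name is to check which interface carries the default route on each node (a sketch; the awk field assumes the usual ip route output layout):

# On each cluster node: print the interface carrying the default route,
# e.g. "ens3"; then use a matching pattern such as ens+ in the annotation.
ip -o route show default | awk '{print $5}'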
This issue has been observed on both OpenStack-backed and MAAS-backed deployments.
What Should Happen Instead?
The Cilium bootstrapping process should complete automatically, without manual intervention.
Reproduction Steps
Example bundle to reproduce the issue:
default-base: ubuntu@22.04/stable
applications:
  k8s:
    charm: k8s
    channel: 1.32/stable
    num_units: 3
    to:
    - "0"
    - "1"
    - "2"
    expose: true
    options:
      bootstrap-pod-cidr: 172.16.0.0/16
      bootstrap-service-cidr: 10.152.183.0/24
      ingress-enabled: true
    constraints: arch=amd64
machines:
  "0":
    constraints: arch=amd64 instance-type=c1.large zones=az1
  "1":
    constraints: arch=amd64 instance-type=c1.large zones=az1
  "2":
    constraints: arch=amd64 instance-type=c1.large zones=az1

Example bundle including the workaround:
default-base: ubuntu@22.04/stable
applications:
  k8s:
    charm: k8s
    channel: 1.32/stable
    num_units: 3
    to:
    - "0"
    - "1"
    - "2"
    expose: true
    options:
      bootstrap-pod-cidr: 172.16.0.0/16
      bootstrap-service-cidr: 10.152.183.0/24
      cluster-annotations: k8sd/v1alpha1/cilium/devices=ens+
      ingress-enabled: true
    constraints: arch=amd64
machines:
  "0":
    constraints: arch=amd64 instance-type=c1.large zones=az1
  "1":
    constraints: arch=amd64 instance-type=c1.large zones=az1
  "2":
    constraints: arch=amd64 instance-type=c1.large zones=az1

Please change devices=ens+ to match the target NIC names.
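To confirm the workaround is in place and the agent pods have recovered, the annotation can be read back and the pods re-checked (assuming the standard k8s-app=cilium label on the Cilium agent pods):

# Read back the configured annotations, then re-check the Cilium agents.
juju config k8s cluster-annotations
kubectl get pod -n kube-system -l k8s-app=cilium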
System information
This is reproducible with k8s 1.32 LTS on both Ubuntu 22.04 LTS and Ubuntu 24.04 LTS.
As stated above, MAAS-backed and OpenStack-backed deployments show the same behavior.
Can you suggest a fix?
As suggested on the upstream bug (1), it would be nice to detect the physical NICs and configure this automatically so that Cilium can bootstrap without manual intervention.
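For illustration, a minimal sketch of what that could look like, assuming the interface carrying the default route is the direct routing device (an assumption; bonded or multi-homed nodes would need more careful selection):

# Hypothetical sketch: derive the direct routing device from the default
# route on a node and pass it to the charm as the Cilium devices annotation.
# Assumes a single default route; not a definitive implementation.
DEV="$(ip -o route show default | awk '{print $5}' | head -n1)"
juju config k8s cluster-annotations="k8sd/v1alpha1/cilium/devices=${DEV}"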
Are you interested in contributing with a fix?
Yes!