
Cilium bootstrapping fails due to failure to determine direct routing device #1828

@alanbach

Description


Summary

This issue is related to an upstream bug (1) and often blocks k8s cluster deployments.

The symptom is Cilium pods entering the CrashLoopBackOff state on new cluster deployments:

$ kubectl get pod --all-namespaces
NAMESPACE     NAME                                         READY   STATUS              RESTARTS        AGE
default       csi-rbdplugin-provisioner-6f5fb97894-78wtm   0/6     Pending             0               10m
default       csi-rbdplugin-provisioner-6f5fb97894-rq9tk   0/6     Pending             0               10m
default       csi-rbdplugin-provisioner-6f5fb97894-rqjjh   0/6     Pending             0               10m
kube-system   cilium-gh6mk                                 0/1     CrashLoopBackOff    7 (8s ago)      11m
kube-system   cilium-hsj46                                 0/1     CrashLoopBackOff    7 (86s ago)     12m
kube-system   cilium-operator-7467567bb8-5d8kz             1/1     Running             1 (11m ago)     13m
kube-system   cilium-xq546                                 0/1     CrashLoopBackOff    6 (4m50s ago)   11m
kube-system   ck-storage-rawfile-csi-controller-0          0/2     Pending             0               13m
kube-system   ck-storage-rawfile-csi-node-lmt86            0/4     ContainerCreating   0               13m
kube-system   ck-storage-rawfile-csi-node-v98lj            0/4     ContainerCreating   0               11m
kube-system   ck-storage-rawfile-csi-node-xgk87            0/4     ContainerCreating   0               11m
kube-system   coredns-fc9c778db-hnb5w                      0/1     Pending             0               13m
kube-system   metrics-server-8694c96fb7-pdwsj              0/1     Pending             0               13m

Reading Cilium pod logs shows:

time="2025-09-05T19:32:31.412648669Z" level=fatal msg="failed to start: daemon creation failed: unable to determine direct routing device. Use --direct-routing-device to specify it\nfailed to stop: unable to find controller ipcache-inject-labels" subsys=daemon

The current workaround is to set a Juju config option that tells the Cilium pods which interface handles the routing:

juju config k8s cluster-annotations="k8sd/v1alpha1/cilium/devices=ens+"

After this, the pods are able to bootstrap successfully.

This issue has been observed on both OpenStack-backed and MAAS-backed deployments.

(1) cilium/cilium#33527

What Should Happen Instead?

The Cilium bootstrapping process should complete automatically, without manual intervention.

Reproduction Steps

Example Bundle to reproduce the issue:

default-base: [email protected]/stable
applications:
  k8s:
    charm: k8s
    channel: 1.32/stable
    num_units: 3
    to:
    - "0"
    - "1"
    - "2"
    expose: true
    options:
      bootstrap-pod-cidr: 172.16.0.0/16
      bootstrap-service-cidr: 10.152.183.0/24
      ingress-enabled: true
    constraints: arch=amd64
machines:
  "0":
    constraints: arch=amd64 instance-type=c1.large zones=az1
  "1":
    constraints: arch=amd64 instance-type=c1.large zones=az1
  "2":
    constraints: arch=amd64 instance-type=c1.large zones=az1

Example bundle including the workaround:

default-base: [email protected]/stable
applications:
  k8s:
    charm: k8s
    channel: 1.32/stable
    num_units: 3
    to:
    - "0"
    - "1"
    - "2"
    expose: true
    options:
      bootstrap-pod-cidr: 172.16.0.0/16
      bootstrap-service-cidr: 10.152.183.0/24
      cluster-annotations: k8sd/v1alpha1/cilium/devices=ens+
      ingress-enabled: true
    constraints: arch=amd64
machines:
  "0":
    constraints: arch=amd64 instance-type=c1.large zones=az1
  "1":
    constraints: arch=amd64 instance-type=c1.large zones=az1
  "2":
    constraints: arch=amd64 instance-type=c1.large zones=az1

Please change devices=ens+ to match the target NIC names.

System information

This is reproducible on k8s 1.32 LTS on Ubuntu 22.04 LTS as well as Ubuntu 24.04 LTS.

As stated above, MAAS-backed and OpenStack-backed deployments show the same behavior.

Can you suggest a fix?

As suggested on the upstream bug (1), it would be nice if we could detect the physical NICs and configure this automatically so that Cilium can bootstrap.

(1) cilium/cilium#33527
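As a rough sketch of what automatic detection could look like (this is an assumption about a possible approach, not the k8sd implementation): physical NICs can usually be distinguished from virtual interfaces (lo, veth pairs, cilium_* devices) by the presence of a device/ entry in sysfs, and the resulting names could feed the devices setting instead of a hand-written pattern like ens+.

```shell
#!/bin/sh
# Sketch: enumerate physical NICs via sysfs. Virtual interfaces (lo,
# veth*, cilium_*) have no device/ symlink under /sys/class/net, so
# they are skipped. How to choose among multiple physical NICs on a
# multi-homed host is left open here; k8sd would need its own policy.
for nic in /sys/class/net/*; do
    # Only interfaces backed by real hardware expose a device/ entry
    if [ -e "$nic/device" ]; then
        basename "$nic"
    fi
done
```

On a typical single-NIC OpenStack or MAAS machine this prints one name (e.g. ens3), which could then be set as the Cilium devices value automatically.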

Are you interested in contributing with a fix?

Yes!
