Ubuntu 24.04: cloud-init hotplug reconfigures netplan when Cilium attaches ENIs, breaking BPF masquerade #17881

@yilmaz-burak

What happened?

On Ubuntu 24.04 (Noble) with Cilium networking, cloud-init's network hotplug feature detects each secondary ENI that Cilium attaches dynamically and regenerates /etc/netplan/*.yaml with DHCP and full policy-based routing (PBR) for that interface. As a result, a second default route via the secondary interface is added to the main routing table, which breaks Cilium's BPF masquerade functionality.

This issue does not occur on Ubuntu 22.04 because hotplug is disabled by default on that image.

Environment

  • kOps version: 1.34.1
  • Kubernetes version: 1.34.3
  • Cilium version: 1.18.2
  • Cloud provider: AWS
  • OS Image: 099720109477/ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-arm64-server-20251212
  • Instance type: m8g.large (ARM64)

Steps to reproduce

  1. Create a kOps cluster with Cilium networking on Ubuntu 24.04
  2. Wait for nodes to be ready and Cilium to attach secondary ENIs
  3. SSH into a node and check:
# Check netplan - will show secondary ENI with full PBR
cat /etc/netplan/*.yaml

# Check routes - will show multiple default routes
ip route

# Check cloud-init logs - will show hotplug triggered
grep -i hotplug /var/log/cloud-init.log
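
The route check in step 3 can be made scriptable. The following is a minimal sketch, assuming a POSIX shell; the function name count_default_routes is hypothetical and not part of kOps, Cilium, or cloud-init. It reads `ip route` output on stdin and prints the number of default routes it finds.

```shell
# count_default_routes: hypothetical helper for illustration only.
# Reads `ip route` output on stdin and prints how many lines
# describe a default route.
count_default_routes() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps
    # the function usable under "set -e" while still printing 0.
    grep -c '^default ' || true
}

# Usage on a node (not executed here):
#   ip route show table main | count_default_routes
```

Going by the outputs in this report, a node in the expected state would print 1, and an affected node would print 2 or more.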

Expected behavior

Netplan should only contain the primary ENI configuration. Cilium manages secondary ENIs directly and does not require (and is broken by) OS-level route management.

Expected ip route output:

default via 10.20.96.1 dev ens5 proto dhcp src 10.20.99.245 metric 100

Actual behavior

The cloud-init hotplug handler detects the ENI attachment and reconfigures netplan.

Actual /etc/netplan/*.yaml:

network:
  version: 2
  ethernets:
    ens5:
      dhcp4: true
      dhcp4-overrides:
        route-metric: 100
    ens6:
      dhcp4: true
      dhcp4-overrides:
        route-metric: 200
      routes:
      - table: 101
        to: "0.0.0.0/0"
        via: "10.20.32.1"
      routing-policy:
      - table: 101
        from: "10.20.63.212"

Actual ip route output:

default via 10.20.32.1 dev ens5 proto dhcp src 10.20.41.26 metric 100
default via 10.20.32.1 dev ens6 proto dhcp src 10.20.63.212 metric 200  # <-- breaks masquerade

cloud-init.log shows:

stages.py[DEBUG]: Event Allowed: scope=network EventType=hotplug
cc_install_hotplug.py[INFO]: Installing hotplug.
hotplug-hook called with: {subsystem: net, udevaction: add, devpath: .../net/ens6}

Why Ubuntu 22.04 works

On Ubuntu 22.04, cloud-init logs show:

stages.py[DEBUG]: Event Denied: scopes=['network'] EventType=hotplug
cc_install_hotplug.py[DEBUG]: Skipping hotplug install, not enabled

The Ubuntu 22.04 cloud image has network hotplug disabled by default.
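
To tell the two cases apart on a given node, the log lines quoted above can be matched directly. This is a rough sketch, assuming the "Event Allowed" and "Event Denied" messages appear in the format shown in this report; the function name hotplug_state is hypothetical.

```shell
# hotplug_state: hypothetical helper for illustration only.
# Reads a cloud-init log on stdin and prints "allowed", "denied",
# or "unknown" based on the last hotplug event decision it sees.
hotplug_state() {
    awk '
        /Event Allowed: .*EventType=hotplug/ { state = "allowed" }
        /Event Denied: .*EventType=hotplug/  { state = "denied" }
        END { print (state ? state : "unknown") }
    '
}

# Usage on a node (not executed here; reading the log may need root):
#   hotplug_state < /var/log/cloud-init.log
```

A result of "allowed" matches the Ubuntu 24.04 behavior described here, and "denied" matches the Ubuntu 22.04 behavior.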

Root cause

Ubuntu 24.04 cloud images enable cloud-init network hotplug by default. This was introduced in cloud-init PR #4799 (Feb 2024) to add automatic PBR for EC2 instances with multiple NICs.

However, this conflicts with CNI plugins like Cilium that manage secondary ENIs directly. The Cilium ENI documentation explicitly states:

"The IP address and routes on ENIs attached to the instance will be managed by the Cilium agent. Therefore, any system service trying to manage newly attached network interfaces will interfere with Cilium's configuration."

Current workaround

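Users can disable hotplug via additionalUserData in each InstanceGroup. Restricting network update events to boot-new-instance leaves hotplug out of the allowed event list: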

spec:
  additionalUserData:
  - content: |
      #cloud-config
      updates:
        network:
          when:
            - boot-new-instance
    name: 00-disable-hotplug.cfg
    type: text/cloud-config

Proposed fix

kOps should automatically disable cloud-init network hotplug when using Cilium (or Amazon VPC CNI) on Ubuntu 24.04+. This is similar to PR #17438, which added systemd-networkd configuration to prevent route removal.

Suggested implementation:

  1. Check whether networking.cilium or networking.amazonvpc is configured
  2. Check whether the OS image is Ubuntu 24.04 or later
  3. If both conditions hold, automatically add cloud-init config to disable network hotplug

Example cloud-init config to add:

#cloud-config
updates:
  network:
    when:
      - boot-new-instance
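
The condition in the suggested implementation can be sketched as follows. This is an illustration of the decision only, not kOps code: the function name needs_hotplug_disable and its arguments are hypothetical, the CNI names are taken from the networking.cilium and networking.amazonvpc keys mentioned above, and the Ubuntu release is assumed to be passed as a "24.04"-style string.

```shell
# needs_hotplug_disable CNI UBUNTU_VERSION
# Hypothetical helper for illustration only. Exits 0 when the
# cloud-init hotplug override should be added, non-zero otherwise.
needs_hotplug_disable() {
    cni=$1
    version=$2

    # Only CNIs that manage secondary ENIs themselves are affected.
    case "$cni" in
        cilium|amazonvpc) ;;
        *) return 1 ;;
    esac

    # Keep major.minor only (22.04.5 -> 22.04), then drop the dot so
    # the release can be compared as an integer (24.04 -> 2404).
    release=$(printf '%s' "$version" | cut -d. -f1,2 | tr -d '.')
    [ "$release" -ge 2404 ]
}

# Usage (not executed here):
#   if needs_hotplug_disable cilium 24.04; then echo "add override"; fi
```

Comparing the release numerically rather than matching the literal string "24.04" keeps the check valid for later Ubuntu releases, which is what "24.04+" implies.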

Additional context

This issue will affect all kOps users who:

  • Use Ubuntu 24.04 (Noble) images
  • Use Cilium or Amazon VPC CNI
  • Have instances that receive secondary ENIs

/kind bug
/area networking
/area provider/aws
