bug: race condition at boot selects wrong clusterNetwork conflist #1499

@kvaps

What happened

During a Talos Linux upgrade (v1.11.3 → v1.12.6), the multus daemon on one node picked 10-kube-ovn.conflist as the master CNI config instead of 05-cilium.conflist. This bypassed the Cilium CNI chain entirely: pods on the affected node had no Cilium endpoint, so their traffic was classified as identity=world and blocked by network policies.

The issue is a race condition at node boot: when multusAutoconfigDir is set with multusConfigFile: "auto", multus picks the first conflist it finds. If kube-ovn writes 10-kube-ovn.conflist before Cilium writes 05-cilium.conflist, multus selects the wrong file.
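The timing dependence can be illustrated with a small Go sketch. It is not multus code: firstConflist is a hypothetical helper that models "pick the first conflist found" as taking the lexically smallest .conflist name in a snapshot of the directory. Even under that assumption, the result depends on which files exist at the moment of the scan.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// firstConflist is a hypothetical stand-in for "pick the first conflist found".
// It returns the lexically smallest *.conflist name in the snapshot,
// or "" if the snapshot contains none.
func firstConflist(present []string) string {
	var candidates []string
	for _, name := range present {
		if strings.HasSuffix(name, ".conflist") {
			candidates = append(candidates, name)
		}
	}
	if len(candidates) == 0 {
		return ""
	}
	sort.Strings(candidates)
	return candidates[0]
}

func main() {
	// Scan happens after both plugins have written their files.
	both := []string{"10-kube-ovn.conflist", "05-cilium.conflist"}
	fmt.Println(firstConflist(both)) // 05-cilium.conflist

	// Scan happens after kube-ovn wrote its file but before Cilium did.
	early := []string{"10-kube-ovn.conflist"}
	fmt.Println(firstConflist(early)) // 10-kube-ovn.conflist
}
```

The second call shows the failure mode described above: the selection is correct for the files that exist at that instant, but the desired master has not been written yet.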

What you expected to happen

Multus should reliably select the correct master CNI conflist, regardless of the order in which CNI plugins write their config files at boot.

How to reproduce it

  1. Configure multus in thick mode with multusAutoconfigDir and multusConfigFile: "auto"
  2. Have two CNI conflist files: 05-cilium.conflist (desired master) and 10-kube-ovn.conflist
  3. Reboot/upgrade the node such that kube-ovn writes its conflist before Cilium
  4. Observe that 00-multus.conf points to 10-kube-ovn.conflist instead of 05-cilium.conflist

Workaround

Set multusMasterCNI in the daemon config to explicitly pin the master CNI file:

{
    "multusMasterCNI": "05-cilium.conflist",
    ...
}
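Why pinning removes the timing dependence can be sketched in Go as follows. This is an illustration only: selectPinned and errMasterNotReady are hypothetical names, and the report does not say how multus behaves when the pinned file is missing, so the sketch assumes the caller would retry rather than fall back to another file.

```go
package main

import (
	"errors"
	"fmt"
)

// errMasterNotReady is a hypothetical sentinel: the pinned file is not there yet.
var errMasterNotReady = errors.New("pinned master CNI config not present yet")

// selectPinned returns the pinned file name only if it appears in the
// directory snapshot. It never falls back to a different conflist, so the
// outcome does not depend on which plugin wrote its file first.
func selectPinned(present []string, pinned string) (string, error) {
	for _, name := range present {
		if name == pinned {
			return name, nil
		}
	}
	return "", errMasterNotReady
}

func main() {
	pinned := "05-cilium.conflist"

	// kube-ovn wrote first: no selection is made (assumed: caller retries).
	_, err := selectPinned([]string{"10-kube-ovn.conflist"}, pinned)
	fmt.Println(err)

	// Once Cilium has written its file, the pinned name is selected.
	got, _ := selectPinned([]string{"10-kube-ovn.conflist", "05-cilium.conflist"}, pinned)
	fmt.Println(got)
}
```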

Environment

  • Multus version: v4.2.3-thick
  • Kubernetes version: v1.32.x
  • Primary CNI: kube-ovn + cilium (chained via conflist)
  • OS: Talos Linux v1.12.6

Anything else we need to know

The issue was observed on only 1 of 6 nodes during the upgrade, which suggests it is timing-dependent. The 00-multus.conf on the affected node was dated March 28 (recreated at boot), while the other nodes had older files from before the upgrade.

The findMasterPlugin function in pkg/server/config/manager.go appears to select the first conflist found in the directory. When multiple conflist files exist, the selection depends on filesystem ordering, which is non-deterministic at boot time.
