What happened
During a Talos Linux upgrade (v1.11.3 → v1.12.6), the multus daemon on one node picked 10-kube-ovn.conflist as the master CNI config instead of 05-cilium.conflist. This bypassed the Cilium CNI chain entirely: pods on the affected node had no Cilium endpoint, so their traffic was classified as identity=world and blocked by network policies.
The issue is a race condition at node boot. When multusAutoconfigDir is set and multusConfigFile is "auto", multus picks the first conflist it finds. If kube-ovn writes 10-kube-ovn.conflist before Cilium writes 05-cilium.conflist, multus selects the wrong file.
What you expected to happen
Multus should reliably select the correct master CNI conflist, regardless of the order in which CNI plugins write their config files at boot.
How to reproduce it
- Configure multus in thick mode with multusAutoconfigDir and multusConfigFile: "auto"
- Have two CNI conflist files: 05-cilium.conflist (desired master) and 10-kube-ovn.conflist
- Reboot/upgrade the node so that kube-ovn writes its conflist before Cilium does
- Observe that 00-multus.conf points to 10-kube-ovn.conflist instead of 05-cilium.conflist
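The last step can be scripted as a post-upgrade check. The following is a minimal Go sketch under two assumptions. First, the generated file lives at /etc/cni/net.d/00-multus.conf; that path is an assumption, so pass the real path as an argument if it differs. Second, the master conflist is mentioned by filename somewhere in that file. masterReference is a hypothetical helper. It does a plain substring search rather than parsing the multus config schema.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// masterReference reports which of the two conflist names from this issue the
// generated multus config mentions. It is a plain substring check, not a parse
// of the multus config schema.
func masterReference(content string) string {
	cilium := strings.Contains(content, "05-cilium.conflist")
	kubeOVN := strings.Contains(content, "10-kube-ovn.conflist")
	switch {
	case cilium && kubeOVN:
		return "both"
	case cilium:
		return "05-cilium.conflist"
	case kubeOVN:
		return "10-kube-ovn.conflist"
	default:
		return "none"
	}
}

func main() {
	// Assumed location of the generated file; override with the first argument.
	path := "/etc/cni/net.d/00-multus.conf"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	ref := masterReference(string(data))
	fmt.Printf("%s references %s\n", path, ref)
	if ref != "05-cilium.conflist" {
		os.Exit(2) // non-zero exit so the check can gate a node rollout
	}
}
```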
Workaround
Set multusMasterCNI in the daemon config to explicitly pin the master CNI file:
{
"multusMasterCNI": "05-cilium.conflist",
...
}
Environment
- Multus version: v4.2.3-thick
- Kubernetes version: v1.32.x
- Primary CNI: kube-ovn + cilium (chained via conflist)
- OS: Talos Linux v1.12.6
Anything else we need to know
The issue was observed on only 1 of 6 nodes during the upgrade, so it is likely timing-dependent. The 00-multus.conf on the affected node was dated March 28 (recreated at boot), while the other nodes still had older files from before the upgrade.
The findMasterPlugin function in pkg/server/config/manager.go appears to select the first conflist found in the directory. When multiple conflist files exist, the selection depends on filesystem ordering, which is non-deterministic at boot time.