What happened:
Multus stopped working on my Talos cluster. The cluster is managed by Flux and uses the thick daemonset from https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml
Additional configuration to copy the CNI binary to the host follows https://www.talos.dev/v1.9/kubernetes-guides/network/multus/
Investigating the issue, I found that the install-multus-binary init container is failing:
install-multus-binary:
Container ID: containerd://2ce3622c540260979f5b26d206a3731baaf2b97f037caaafd238853859187b23
Image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
Image ID: ghcr.io/k8snetworkplumbingwg/multus-cni@sha256:cad1ed05d89b25199697ed09723cf1260bb670ee45d8161316ea1af999fe2712
Port: <none>
Host Port: <none>
Command:
sh
-c
cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim && cp /usr/src/multus-cni/bin/passthru /host/opt/cni/bin/passthru
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Message: cp: cannot stat '/usr/src/multus-cni/bin/passthru': No such file or directory
Exit Code: 1
Started: Fri, 18 Apr 2025 09:49:03 -0400
Finished: Fri, 18 Apr 2025 09:49:03 -0400
Changing the init container's start command to drop the passthru copy results in Multus running properly again.
This change seems to be from: #1419
specifically this command change:
"cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim && cp /usr/src/multus-cni/bin/passthru /host/opt/cni/bin/passthru"
What you expected to happen:
The init container succeeds and the Multus binaries are copied.
How to reproduce it (as minimally and precisely as possible):
Follow instructions here to install Multus on a Talos cluster: https://www.talos.dev/v1.9/kubernetes-guides/network/multus/
Anything else we need to know?:
It is possible that the install-cni.sh script run by the Sidero Labs install-cni init container needs to be updated to work with these new paths, but I wanted to start the issue here to track it and hopefully get guidance.
Environment:
-
Multus version: tried 4.20 and 4.1.4
image path and image ID (from 'docker images')
Image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick Image ID: ghcr.io/k8snetworkplumbingwg/multus-cni@sha256:cad1ed05d89b25199697ed09723cf1260bb670ee45d8161316ea1af999fe2712
(also tried with stable-thick and v4.1.4-thick)
-
Kubernetes version (use kubectl version): v1.29.2
-
Primary CNI for Kubernetes cluster: Cilium
-
OS (e.g. from /etc/os-release): Talos Linux
-
File of '/etc/cni/net.d/': Not able to view due to Talos not having node access
-
File of '/etc/cni/multus/net.d': Not able to view due to Talos not having node access
-
NetworkAttachment info (use kubectl get net-attach-def -o yaml)
Name: lan-network
Namespace: cams
Labels: kustomize.toolkit.fluxcd.io/name=apps
kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations: <none>
API Version: k8s.cni.cncf.io/v1
Kind: NetworkAttachmentDefinition
Metadata:
Creation Timestamp: 2024-10-20T16:37:06Z
Generation: 1
Resource Version: 86578419
UID: c2ca1a7f-443a-472e-b4b0-76c800082e63
Spec:
Config: { "cniVersion": "0.3.1", "name": "lan-network", "type": "macvlan", "mode": "bridge", "master": "enp1s0", "ipam": { "type": "host-local", "subnet": "192.168.5.0/24", "rangeStart": "192.168.5.235", "rangeEnd": "192.168.5.239", "routes": [ { "dst": "192.168.5.0/24" } ], "gateway": "192.168.5.1" } }
Events: <none>
- Target pod yaml info (with annotation, use kubectl get pod <podname> -o yaml)
Name: scrypted-0
Namespace: cams
Priority: 0
Service Account: default
Node: node1/192.168.5.211
Start Time: Fri, 18 Apr 2025 09:26:38 -0400
Labels: app=scrypted
apps.kubernetes.io/pod-index=0
controller-revision-hash=scrypted-64757cbdfb
statefulset.kubernetes.io/pod-name=scrypted-0
Annotations: k8s.v1.cni.cncf.io/networks: [ { "name" : "lan-network", "interface": "eth1" } ]
kubectl.kubernetes.io/restartedAt: 2025-04-18T09:15:31-04:00
Status: Running
IP: 10.244.0.250
IPs:
IP: 10.244.0.250
Controlled By: StatefulSet/scrypted
Containers:
scrypted:
Container ID: containerd://21458d21defe540de61813cd67d9b71806aa3fac384c25b760ed98436606a585
Image: ghcr.io/koush/scrypted:latest
Image ID: ghcr.io/koush/scrypted@sha256:602d001ee8c1e31a22f4addb700e24d8133a8d7efef3493d6249a2e241f22b04
Ports: 11080/TCP, 10443/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Fri, 18 Apr 2025 09:32:35 -0400
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/server/volume from app (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-56xp2 (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
app:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: app-scrypted-0
ReadOnly: false
kube-api-access-56xp2:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 37m default-scheduler Successfully assigned cams/scrypted-0 to node1
Warning FailedAttachVolume 37m attachdetach-controller Multi-Attach error for volume "pvc-a018f41f-d7a9-4b41-9af0-ad98e18570ca" Volume is already exclusively attached to one node and can't be attached to another
Normal SuccessfulAttachVolume 37m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-a018f41f-d7a9-4b41-9af0-ad98e18570ca"
Warning FailedCreatePodSandBox 33m kubelet Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning FailedCreatePodSandBox 33m (x2 over 33m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to reserve sandbox name "scrypted-0_cams_fd6119b0-6881-45ab-ba79-4c86ec19bc6a_0": name "scrypted-0_cams_fd6119b0-6881-45ab-ba79-4c86ec19bc6a_0" is reserved for "e701fb85fadf8c9616e511a36ab6b89ce98fe4c9770f7e596941310dbe3e5df4"
Warning FailedCreatePodSandBox 32m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b705a8ebc2e40cf8eda454f5294bd60a845fe6323d65ae9dc20f4e10992f4d02": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": EOF: StdinData: {"clusterNetwork":"/host/etc/cni/net.d/05-cilium.conflist","cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","type":"multus-shim"}
Normal Pulling 32m kubelet Pulling image "ghcr.io/koush/scrypted:latest"
Normal Pulled 32m kubelet Successfully pulled image "ghcr.io/koush/scrypted:latest" in 146ms (146ms including waiting)
Normal Created 32m kubelet Created container scrypted
Normal Started 32m kubelet Started container scrypted
- Other log outputs (if you use multus logging)
cp: cannot stat '/usr/src/multus-cni/bin/passthru': No such file or directory
Stream closed EOF for kube-system/kube-multus-ds-whxhg (install-multus-binary)