
Latest change to Thick Image deployment.yaml breaks Multus on Talos Cluster #1422

Description

@SaberSHO

What happened:
Multus stopped working on my Talos cluster. The cluster uses Flux and the thick daemonset from https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml

Additional configuration to copy the CNI binary to the host: https://www.talos.dev/v1.9/kubernetes-guides/network/multus/
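
For reference, a minimal Flux/kustomize layout pulling the upstream thick manifest might look like the sketch below; the resource URL is the one above, everything else is illustrative only.

# kustomization.yaml (illustrative sketch; the actual Flux layout may differ)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # upstream thick daemonset, as referenced above
  - https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml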

Investigating the issue, I found the install-multus-binary init container failing:

install-multus-binary:
    Container ID:  containerd://2ce3622c540260979f5b26d206a3731baaf2b97f037caaafd238853859187b23
    Image:         ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
    Image ID:      ghcr.io/k8snetworkplumbingwg/multus-cni@sha256:cad1ed05d89b25199697ed09723cf1260bb670ee45d8161316ea1af999fe2712
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim && cp /usr/src/multus-cni/bin/passthru /host/opt/cni/bin/passthru
    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   cp: cannot stat '/usr/src/multus-cni/bin/passthru': No such file or directory

      Exit Code:    1
      Started:      Fri, 18 Apr 2025 09:49:03 -0400
      Finished:     Fri, 18 Apr 2025 09:49:03 -0400
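
The same error shows up when pulling the init container logs directly, for example (daemonset name taken from the pod name later in this issue, container name from the output above):

kubectl -n kube-system logs ds/kube-multus-ds -c install-multus-binary
# cp: cannot stat '/usr/src/multus-cni/bin/passthru': No such file or directory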

Removing the passthru copy from the start command gets Multus running properly again; a strategic-merge patch along the lines of the sketch below works as a stopgap.
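
# patch-multus-init.yaml (workaround sketch only, not a proper fix;
# daemonset and container names taken from the upstream thick manifest)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-multus-ds
  namespace: kube-system
spec:
  template:
    spec:
      initContainers:
        - name: install-multus-binary
          command:
            - sh
            - -c
            # drop the failing passthru copy; only multus-shim ships in this image
            - cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim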

This change appears to come from #1419,
specifically this command change:

- "cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim && cp /usr/src/multus-cni/bin/passthru /host/opt/cni/bin/passthru"

What you expected to happen:
The init container succeeds and the Multus binaries are copied to the host.

How to reproduce it (as minimally and precisely as possible):
Follow the instructions here to install Multus on a Talos cluster: https://www.talos.dev/v1.9/kubernetes-guides/network/multus/
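
Concretely, on a Talos cluster with Cilium as the primary CNI, applying the upstream manifest is enough to reproduce it (pod label taken from the upstream manifest):

kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml
# watch the install-multus-binary init container crash-loop
kubectl -n kube-system get pods -l app=multus -w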

Anything else we need to know?:
It is possible that the install-cni.sh script, run as part of the Sidero Labs install-cni init container, needs to be updated to work with these new paths, but I wanted to start the issue here to track it and hopefully get guidance. Whichever script ends up owning the copy, making it tolerant of images that don't ship passthru might be enough; a sketch:
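
# sketch only: copy passthru only when the image actually ships it
cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim
if [ -f /usr/src/multus-cni/bin/passthru ]; then
  cp /usr/src/multus-cni/bin/passthru /host/opt/cni/bin/passthru
fi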

Environment:

  • Multus version: tried 4.2.0 and 4.1.4
    image path and image ID (from 'docker images'):
    Image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
    Image ID: ghcr.io/k8snetworkplumbingwg/multus-cni@sha256:cad1ed05d89b25199697ed09723cf1260bb670ee45d8161316ea1af999fe2712
    (also tried with stable-thick and v4.1.4-thick)

  • Kubernetes version (use kubectl version): v1.29.2

  • Primary CNI for Kubernetes cluster: Cilium

  • OS (e.g. from /etc/os-release): Talos Linux

  • File of '/etc/cni/net.d/': unable to view, as Talos does not provide node access

  • File of '/etc/cni/multus/net.d': unable to view, as Talos does not provide node access

  • NetworkAttachment info (use kubectl get net-attach-def -o yaml)

Name:         lan-network
Namespace:    cams
Labels:       kustomize.toolkit.fluxcd.io/name=apps
              kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations:  <none>
API Version:  k8s.cni.cncf.io/v1
Kind:         NetworkAttachmentDefinition
Metadata:
  Creation Timestamp:  2024-10-20T16:37:06Z
  Generation:          1
  Resource Version:    86578419
  UID:                 c2ca1a7f-443a-472e-b4b0-76c800082e63
Spec:
  Config:  { "cniVersion": "0.3.1", "name": "lan-network", "type": "macvlan", "mode": "bridge", "master": "enp1s0", "ipam": { "type": "host-local", "subnet": "192.168.5.0/24", "rangeStart": "192.168.5.235", "rangeEnd": "192.168.5.239", "routes": [ { "dst": "192.168.5.0/24" } ], "gateway": "192.168.5.1" } }
Events:    <none>

  • Target pod yaml info (with annotation, use kubectl get pod <podname> -o yaml)
Name:             scrypted-0
Namespace:        cams
Priority:         0
Service Account:  default
Node:             node1/192.168.5.211
Start Time:       Fri, 18 Apr 2025 09:26:38 -0400
Labels:           app=scrypted
                  apps.kubernetes.io/pod-index=0
                  controller-revision-hash=scrypted-64757cbdfb
                  statefulset.kubernetes.io/pod-name=scrypted-0
Annotations:      k8s.v1.cni.cncf.io/networks: [ { "name" : "lan-network", "interface": "eth1" } ]
                  kubectl.kubernetes.io/restartedAt: 2025-04-18T09:15:31-04:00
Status:           Running
IP:               10.244.0.250
IPs:
  IP:           10.244.0.250
Controlled By:  StatefulSet/scrypted
Containers:
  scrypted:
    Container ID:   containerd://21458d21defe540de61813cd67d9b71806aa3fac384c25b760ed98436606a585
    Image:          ghcr.io/koush/scrypted:latest
    Image ID:       ghcr.io/koush/scrypted@sha256:602d001ee8c1e31a22f4addb700e24d8133a8d7efef3493d6249a2e241f22b04
    Ports:          11080/TCP, 10443/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Fri, 18 Apr 2025 09:32:35 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /server/volume from app (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-56xp2 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       True 
  ContainersReady             True 
  PodScheduled                True 
Volumes:
  app:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  app-scrypted-0
    ReadOnly:   false
  kube-api-access-56xp2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Normal   Scheduled               37m                default-scheduler        Successfully assigned cams/scrypted-0 to node1
  Warning  FailedAttachVolume      37m                attachdetach-controller  Multi-Attach error for volume "pvc-a018f41f-d7a9-4b41-9af0-ad98e18570ca" Volume is already exclusively attached to one node and can't be attached to another
  Normal   SuccessfulAttachVolume  37m                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-a018f41f-d7a9-4b41-9af0-ad98e18570ca"
  Warning  FailedCreatePodSandBox  33m                kubelet                  Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedCreatePodSandBox  33m (x2 over 33m)  kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to reserve sandbox name "scrypted-0_cams_fd6119b0-6881-45ab-ba79-4c86ec19bc6a_0": name "scrypted-0_cams_fd6119b0-6881-45ab-ba79-4c86ec19bc6a_0" is reserved for "e701fb85fadf8c9616e511a36ab6b89ce98fe4c9770f7e596941310dbe3e5df4"
  Warning  FailedCreatePodSandBox  32m                kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b705a8ebc2e40cf8eda454f5294bd60a845fe6323d65ae9dc20f4e10992f4d02": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": EOF: StdinData: {"clusterNetwork":"/host/etc/cni/net.d/05-cilium.conflist","cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","type":"multus-shim"}
  Normal   Pulling                 32m                kubelet                  Pulling image "ghcr.io/koush/scrypted:latest"
  Normal   Pulled                  32m                kubelet                  Successfully pulled image "ghcr.io/koush/scrypted:latest" in 146ms (146ms including waiting)
  Normal   Created                 32m                kubelet                  Created container scrypted
  Normal   Started                 32m                kubelet                  Started container scrypted

  • Other log outputs (if you use multus logging)
cp: cannot stat '/usr/src/multus-cni/bin/passthru': No such file or directory
Stream closed EOF for kube-system/kube-multus-ds-whxhg (install-multus-binary)
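
For completeness: once the init container succeeds, the extra interface comes back up in the target pod; a quick check against the pod above (assumes the image ships the ip tool):

kubectl -n cams exec scrypted-0 -- ip addr show eth1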
