Network attachment fails on node: CNI request failed with status 400 (lstat /sys/bus/pci/devices/...) #693

@vdombrovski

What happened?

When using the SR-IOV network device plugin, my network attachments start failing after a while with the error below. I'm not sure what triggers it, but it may be related to node cordon/drain:

Warning  FailedCreatePodSandBox  2m (x26701 over 4d15h)    kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "39c9ba85b625cf99b045e18936692738f7870a9aafcecabf31ba15bb657e35f1": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"39c9ba85b625cf99b045e18936692738f7870a9aafcecabf31ba15bb657e35f1" Netns:"/var/run/netns/cni-59fa0d72-01bd-5250-2d02-df78232a3919" IfName:"eth0" Args:"K8S_POD_NAME=virt-launcher-proxy-instance-ubuntu2404-2vcpu-4gb-g5clc;K8S_POD_INFRA_CONTAINER_ID=39c9ba85b625cf99b045e18936692738f7870a9aafcecabf31ba15bb657e35f1;K8S_POD_UID=600a2efd-658e-474a-9c75-8e5e1694c1ba;IgnoreUnknown=1;K8S_POD_NAMESPACE="test-namespace" Path:"" ERRORED: error configuring pod [test-namespace/virt-launcher-proxy-instance-ubuntu2404-2vcpu-4gb-g5clc] networking: [test-namespace/virt-launcher-proxy-instance-ubuntu2404-2vcpu-4gb-g5clc/600a2efd-658e-474a-9c75-8e5e1694c1ba:public]: error adding container to network "public": SRIOV-CNI failed to load netconf: LoadConf(): failed to get VF information: "lstat /sys/bus/pci/devices/249/physfn/net: no such file or directory"

The issue goes away when I restart the device plugin DaemonSet and then restart the affected VM:

kubectl rollout restart daemonset -n kube-system sriov-device-plugin-ds

When looking at /sys/bus/pci/devices/, my output looks like this:

0000:00:00.0  0000:4a:00.0  0000:4b:11.0  0000:7e:06.1  0000:7f:02.1  0000:7f:0d.0  0000:e2:00.2  0000:fe:1a.0  0000:ff:0a.3 (....)

None of the entries is named 249, or any bare integer for that matter; the ones I see are all full PCI addresses.
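The mismatch can be checked mechanically. The following is a minimal sketch, not part of any of the components involved: it pulls the device directory name out of the `lstat` path quoted in the error and tests whether it has the usual `domain:bus:device.function` shape of a sysfs PCI address. The function names `device_id_from_error` and `is_pci_address` are made up for illustration.

```python
import re

# Usual sysfs PCI address shape: domain:bus:device.function, hexadecimal,
# with a single-digit function number (0-7).
PCI_ADDRESS = re.compile(r"^[0-9a-fA-F]{4,}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")

# Captures the directory name between ".../devices/" and "/physfn/net".
LSTAT_PATH = re.compile(r"lstat /sys/bus/pci/devices/([^/\s\"]+)/physfn/net")


def is_pci_address(name: str) -> bool:
    """Return True if name looks like a full PCI address such as 0000:4a:00.0."""
    return PCI_ADDRESS.match(name) is not None


def device_id_from_error(message: str):
    """Return the device directory name from an lstat error, or None if absent."""
    found = LSTAT_PATH.search(message)
    return found.group(1) if found else None


if __name__ == "__main__":
    # Tail of the error message from the pod event above.
    error = ('failed to get VF information: '
             '"lstat /sys/bus/pci/devices/249/physfn/net: no such file or directory"')
    device_id = device_id_from_error(error)
    print("device ID in error:", device_id, "valid:", is_pci_address(device_id))

    # A few of the directory names actually present on the node.
    for entry in ["0000:00:00.0", "0000:4a:00.0", "0000:4b:11.0"]:
        print(entry, "valid:", is_pci_address(entry))
```

Run against the error above, the extracted name fails the format check while the listed directory names pass, which suggests the CNI was handed something other than a PCI address as the device ID.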

What did you expect to happen?

I expected the interface to be attached.

What are the minimal steps needed to reproduce the bug?

Here is my theory. I'm not sure it reproduces consistently, but I've seen the issue 4-5 times already, so I don't think it's a one-off.

  1. Install multus + SRIOV daemonset + kubevirt
  2. Create the following net-attach-def
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
  name: public
  namespace: test-namespace
spec:
  config: |-
    {
        "cniVersion": "1.0.0",
        "logLevel": "info",
        "name": "public",
        "spoofchk": "off",
        "type": "sriov",
        "vlan": 103,
        "vlanQoS": 0
    }
  3. Create a VM using KubeVirt that uses the net-attach-def. Here is the relevant manifest extract:
domain:
  devices:
    interfaces:
      - macAddress: 52:54:80:b2:2e:9b
        masquerade: {}
        name: default
      - macAddress: 52:54:ee:47:a3:53
        name: public
        sriov: {}
    logSerialConsole: false
networks:
  - name: default
    pod: {}
  - multus:
      networkName: test-namespace/public
    name: public
  4. The VM should start as usual.

  5. Cordon and drain the node on which the VM is running (KubeVirt will live-migrate it).

  6. Uncordon the node, then migrate the VM back to that node (or stop and start it; just make sure the VM lands on the previously drained node). You should hit this issue.
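For anyone trying to reproduce this, the cordon, drain, and uncordon part of the procedure can be scripted. The sketch below only builds the `kubectl` command lines. The drain flags are an assumption on my side (the report does not say which flags were used), and `worker-1` is a placeholder node name.

```python
import shlex


def drain_cycle_commands(node: str):
    """Return the kubectl invocations for a cordon/drain/uncordon cycle, in order."""
    return [
        ["kubectl", "cordon", node],
        # Assumed flags: --ignore-daemonsets leaves DaemonSet pods (such as the
        # device plugin) on the node; --delete-emptydir-data allows eviction of
        # pods that use emptyDir volumes.
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
        ["kubectl", "uncordon", node],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for argv in drain_cycle_commands("worker-1"):
        print(shlex.join(argv))
```

After the uncordon, the VM still has to be moved back to the node (migration or stop/start) for the failure to show up.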

Anything else we need to know?

Component Versions

Please fill in the below table with the version numbers of components used.

| Component | Version |
|---|---|
| SR-IOV Network Device Plugin | v3.11.0 |
| SR-IOV CNI Plugin | commit 58ed04f799a67964c61f14ee87cd067ea2441b79 |
| Multus | v4.2.3 |
| Kubernetes | v1.32.11 |
| OS | Debian 13 |

Config Files

Config file locations may be config dependent.

Device pool config file location (Try '/etc/pcidp/config.json')

N/A (file doesn't exist)

Multus config (Try '/etc/cni/multus/net.d')

N/A (not found)

CNI config (Try '/etc/cni/net.d/')

{"cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","clusterNetwork":"/host/etc/cni/net.d/05.conflist","type":"multus-shim"}

Kubernetes deployment type (Bare Metal, Kubeadm etc.)

Kubeadm on bare metal

Kubeconfig file

N/A, not relevant here

SR-IOV Network Custom Resource Definition

N/A, I'm not using any CRDs (direct allocation on the device plugin using the net-attach-def)

Logs

SR-IOV Network Device Plugin Logs (use kubectl logs $PODNAME)

See file attached

sriov-device-plugin.txt

Multus logs (If enabled. Try '/var/log/multus.log' )

N/A (see the event for exact multus-shim error)

Kubelet logs (journalctl -u kubelet)

N/A, nothing relevant in kubelet logs.
