Description
What happened?
When using the SRIOV network device plugin, after a while (not sure what causes it, but maybe something to do with node cordon/drain), my attachments fail with the following error:
Warning FailedCreatePodSandBox 2m (x26701 over 4d15h) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "39c9ba85b625cf99b045e18936692738f7870a9aafcecabf31ba15bb657e35f1": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"39c9ba85b625cf99b045e18936692738f7870a9aafcecabf31ba15bb657e35f1" Netns:"/var/run/netns/cni-59fa0d72-01bd-5250-2d02-df78232a3919" IfName:"eth0" Args:"K8S_POD_NAME=virt-launcher-proxy-instance-ubuntu2404-2vcpu-4gb-g5clc;K8S_POD_INFRA_CONTAINER_ID=39c9ba85b625cf99b045e18936692738f7870a9aafcecabf31ba15bb657e35f1;K8S_POD_UID=600a2efd-658e-474a-9c75-8e5e1694c1ba;IgnoreUnknown=1;K8S_POD_NAMESPACE="test-namespace" Path:"" ERRORED: error configuring pod [test-namespace/virt-launcher-proxy-instance-ubuntu2404-2vcpu-4gb-g5clc] networking: [test-namespace/virt-launcher-proxy-instance-ubuntu2404-2vcpu-4gb-g5clc/600a2efd-658e-474a-9c75-8e5e1694c1ba:public]: error adding container to network "public": SRIOV-CNI failed to load netconf: LoadConf(): failed to get VF information: "lstat /sys/bus/pci/devices/249/physfn/net: no such file or directory"
This issue resolves itself when I restart the daemon-set and restart the appropriate VM.
kubectl rollout restart daemonset -n kube-system sriov-device-plugin-ds
When looking at /sys/bus/pci/devices/, my output looks like this:
0000:00:00.0 0000:4a:00.0 0000:4b:11.0 0000:7e:06.1 0000:7f:02.1 0000:7f:0d.0 0000:e2:00.2 0000:fe:1a.0 0000:ff:0a.3 (....)
I don't really see 249 or any integer for that matter.
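To make the mismatch concrete, here is a minimal Python sketch of the check I did by hand: it pulls the device ID out of the `lstat` path in the error, tests whether it has the `domain:bus:device.function` shape that the entries under /sys/bus/pci/devices/ have, and looks for the `physfn/net` directory the CNI is trying to read. The function names and the `sysfs_root` parameter are hypothetical helpers for illustration, not part of SR-IOV CNI or Multus, and the regex assumes a domain of at least four hex digits.

```python
import re
from pathlib import Path

# Assumed shape of a full PCI address: domain:bus:device.function,
# for example 0000:4a:00.0 (domain taken to be four or more hex digits).
PCI_ADDRESS = re.compile(r"[0-9a-fA-F]{4,}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]")

# Matches the sysfs path quoted in the SR-IOV CNI error message.
LSTAT_PATH = re.compile(r"lstat (/sys/bus/pci/devices/([^/\s]+)/physfn/net)")


def device_id_from_error(message):
    """Return the device ID embedded in the lstat path, or None if absent."""
    match = LSTAT_PATH.search(message)
    return match.group(2) if match else None


def looks_like_pci_address(device_id):
    """True if the whole string has the domain:bus:device.function shape."""
    return PCI_ADDRESS.fullmatch(device_id) is not None


def has_physfn_net(device_id, sysfs_root="/sys/bus/pci/devices"):
    """True if <sysfs_root>/<device_id>/physfn/net exists as a directory."""
    return (Path(sysfs_root) / device_id / "physfn" / "net").is_dir()


if __name__ == "__main__":
    error = (
        'failed to get VF information: "lstat '
        '/sys/bus/pci/devices/249/physfn/net: no such file or directory"'
    )
    dev = device_id_from_error(error)
    print(dev, looks_like_pci_address(dev), has_physfn_net(dev))
```

Run on the affected node, this only confirms that the ID handed to the CNI is not a PCI address at all. It does not say where the bad ID comes from.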
What did you expect to happen?
I expected the interface to be attached.
What are the minimal steps needed to reproduce the bug?
Here is my theory. I'm not sure this reproduces consistently, but I've already seen the issue 4-5 times, which makes me think it is not a one-off:
- Install multus + SRIOV daemonset + kubevirt
- Create the following net-attach-def
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
  name: public
  namespace: test-namespace
spec:
  config: |-
    {
      "cniVersion": "1.0.0",
      "logLevel": "info",
      "name": "public",
      "spoofchk": "off",
      "type": "sriov",
      "vlan": 103,
      "vlanQoS": 0
    }
- Create a VM using Kubevirt that uses the net-attach-def, here is the relevant manifest extract:
domain:
  devices:
    interfaces:
    - macAddress: 52:54:80:b2:2e:9b
      masquerade: {}
      name: default
    - macAddress: 52:54:ee:47:a3:53
      name: public
      sriov: {}
    logSerialConsole: false
networks:
- name: default
  pod: {}
- multus:
    networkName: test-namespace/public
  name: public
- The VM should start as usual.
- Try cordoning + draining the node on which the VM is launched. (Kubevirt will live-migrate it).
- Uncordon the node, and try migrating the VM on that node (or perform a stop/start, just make sure the VM is on the previously drained node). You should get this issue.
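The cordon/drain/uncordon/migrate sequence can be sketched as the commands below. This is only an approximation of what I ran: the node name, VM name, and drain flags are placeholders and assumptions, and `virtctl migrate` does not by itself guarantee that the VM lands on the previously drained node, so the placement still has to be checked by hand.

```python
import shlex


def repro_commands(node, vm, namespace):
    """Build the command sequence for the reproduction steps (nothing is executed)."""
    return [
        ["kubectl", "cordon", node],
        # Drain flags are an assumption; adjust them to your cluster's workloads.
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
        ["kubectl", "uncordon", node],
        # Triggers a live migration; the target node is chosen by the scheduler.
        ["virtctl", "migrate", vm, "-n", namespace],
    ]


if __name__ == "__main__":
    # "worker-1" and "my-vm" are placeholder names, not values from this cluster.
    for cmd in repro_commands("worker-1", "my-vm", "test-namespace"):
        print(shlex.join(cmd))
```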
Anything else we need to know?
Component Versions
Please fill in the below table with the version numbers of components used.
| Component | Version |
|---|---|
| SR-IOV Network Device Plugin | v3.11.0 |
| SR-IOV CNI Plugin | commit 58ed04f799a67964c61f14ee87cd067ea2441b79 |
| Multus | v4.2.3 |
| Kubernetes | v1.32.11 |
| OS | debian 13 |
Config Files
Config file locations may be config dependent.
Device pool config file location (Try '/etc/pcidp/config.json')
N/A (file doesn't exist)
Multus config (Try '/etc/cni/multus/net.d')
N/A (not found)
CNI config (Try '/etc/cni/net.d/')
{"cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","clusterNetwork":"/host/etc/cni/net.d/05.conflist","type":"multus-shim"}
Kubernetes deployment type ( Bare Metal, Kubeadm etc.)
Kubeadm on bare metal
Kubeconfig file
N/A, not relevant here
SR-IOV Network Custom Resource Definition
N/A, I'm not using any CRDs (direct allocation on the device plugin using the net-attach-def)
Logs
SR-IOV Network Device Plugin Logs (use kubectl logs $PODNAME)
See file attached
Multus logs (If enabled. Try '/var/log/multus.log' )
N/A (see the event for exact multus-shim error)
Kubelet logs (journalctl -u kubelet)
N/A, nothing relevant in kubelet logs.