CDI file is out of date after device resources are allocated #633

@cyclinder

What happened?

The CDI file is out of date after a device resource is allocated: the allocated device (0000:0b:00.2) is still listed in it.

I0306 11:20:42.972829       1 server.go:127] Allocate() called with &AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{&ContainerAllocateRequest{DevicesIDs:[0000:0b:00.2],},},}
I0306 11:20:42.972935       1 pool_stub.go:108] GetEnvs(): for devices: [0000:0b:00.2]
I0306 11:20:42.978145       1 server.go:159] AllocateResponse send: &AllocateResponse{ContainerResponses:[]*ContainerAllocateResponse{&ContainerAllocateResponse{Envs:map[string]string{PCIDEVICE_SPIDERNET_IO_SRIVO_ETH_P4: 0000:0b:00.2,PCIDEVICE_SPIDERNET_IO_SRIVO_ETH_P4_INFO: {"0000:0b:00.2":{"generic":{"deviceID":"0000:0b:00.2"},"rdma":{"rdma_cm":"/dev/infiniband/rdma_cm","umad":"/dev/infiniband/umad10","uverbs":"/dev/infiniband/uverbs10"}}},},Mounts:[]*Mount{},Devices:[]*DeviceSpec{},Annotations:map[string]string{cdi.k8s.io/spidernet.io_net-pci: spidernet.io/net-pci=0000:0b:00.2,},CDIDevices:[]*CDIDevice{},},},}
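To compare the log against the CDI file, the allocated device IDs can be pulled out of the `Allocate()` line. The following is only a rough sketch: the helper name `allocated_device_ids` is hypothetical, and it assumes that multiple IDs inside `DevicesIDs:[...]` are separated by spaces or commas.

```python
import re

# Matches the DevicesIDs list in an Allocate() log line,
# e.g. "DevicesIDs:[0000:0b:00.2],".
DEVICE_IDS_RE = re.compile(r"DevicesIDs:\[([^\]]*)\]")


def allocated_device_ids(log_text):
    """Return every device ID found in DevicesIDs:[...] lists, in log order."""
    ids = []
    for match in DEVICE_IDS_RE.finditer(log_text):
        # Assumption: IDs are separated by whitespace and/or commas.
        ids.extend(part for part in re.split(r"[,\s]+", match.group(1)) if part)
    return ids


if __name__ == "__main__":
    line = (
        "I0306 11:20:42.972829       1 server.go:127] Allocate() called with "
        "&AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{"
        "&ContainerAllocateRequest{DevicesIDs:[0000:0b:00.2],},},}"
    )
    print(allocated_device_ids(line))
```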

root@10-20-1-50:/var/run/cdi# cat sriov-dp-spidernet.io-net-pci-srivo_eth_p4.yaml | grep 'name: '
  name: 0000:0b:00.6
  name: 0000:0b:00.7
  name: 0000:0b:01.0
  name: 0000:0b:01.1
  name: 0000:0b:00.2
  name: 0000:0b:00.3
  name: 0000:0b:00.4
  name: 0000:0b:00.5
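
The same check as the `grep` above can be scripted to flag allocated devices that are still present in the CDI file. This is a minimal sketch, not part of the plugin: the function names are hypothetical, and it assumes that every `name:` key in the file is a device entry (which matches the output shown here).

```python
def cdi_device_names(spec_text):
    """Collect the values of all 'name:' keys from CDI spec text or grep output."""
    names = []
    for line in spec_text.splitlines():
        # Drop indentation and a leading YAML list marker ("- name: ...").
        stripped = line.strip().lstrip("- ").strip()
        if stripped.startswith("name:"):
            names.append(stripped[len("name:"):].strip().strip("\"'"))
    return names


def stale_entries(spec_text, allocated_ids):
    """Return allocated device IDs that are still listed in the CDI spec."""
    allocated = set(allocated_ids)
    return [name for name in cdi_device_names(spec_text) if name in allocated]


if __name__ == "__main__":
    grep_output = """\
  name: 0000:0b:00.6
  name: 0000:0b:00.7
  name: 0000:0b:01.0
  name: 0000:0b:01.1
  name: 0000:0b:00.2
  name: 0000:0b:00.3
  name: 0000:0b:00.4
  name: 0000:0b:00.5
"""
    print(stale_entries(grep_output, ["0000:0b:00.2"]))
```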


root@10-20-1-50:/var/run/cdi# kubectl get sriovnetworknodepolicies.sriovnetwork.openshift.io -n spiderpool cx5-p4 -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"sriovnetwork.openshift.io/v1","kind":"SriovNetworkNodePolicy","metadata":{"annotations":{},"creationTimestamp":"2024-10-14T08:50:06Z","generation":1,"name":"cx5-p4","namespace":"spiderpool","resourceVersion":"29963475","uid":"a7351b59-2edd-459a-9e11-dac37c46f9c7"},"spec":{"deviceType":"netdevice","isRdma":true,"nicSelector":{"rootDevices":["0000:0b:00.0"],"vendor":"15b3"},"nodeSelector":{"kubernetes.io/os":"linux"},"numVfs":8,"priority":99,"resourceName":"srivo_eth_p4"}}
  creationTimestamp: "2024-11-25T12:27:48Z"
  generation: 1
  name: cx5-p4
  namespace: spiderpool
  resourceVersion: "69518810"
  uid: aece0fd0-8f1f-4516-b7b5-91fa9da39ae1
spec:
  deviceType: netdevice
  isRdma: true
  nicSelector:
    rootDevices:
    - 0000:0b:00.0
    vendor: 15b3
  nodeSelector:
    kubernetes.io/os: linux
  numVfs: 8
  priority: 99
  resourceName: srivo_eth_p4

What did you expect to happen?

I have 8 VFs for the resource configured by policy cx5-p4; they are all listed in the CDI file shown above (sriov-dp-spidernet.io-net-pci-srivo_eth_p4.yaml).

When one of these devices (0000:0b:00.2) is allocated to a pod, the CDI file should list only the 7 remaining devices.
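
To make the expectation concrete, here is a small sketch of the expected file contents after the allocation. It only illustrates the behavior described in this report and is not how the device plugin is implemented; `expected_cdi_devices` is a hypothetical helper.

```python
def expected_cdi_devices(pool_devices, allocated_ids):
    """Return the devices expected to remain in the CDI file, in pool order."""
    allocated = set(allocated_ids)
    return [dev for dev in pool_devices if dev not in allocated]


if __name__ == "__main__":
    # The 8 VFs currently listed in the CDI file.
    pool = [
        "0000:0b:00.6", "0000:0b:00.7", "0000:0b:01.0", "0000:0b:01.1",
        "0000:0b:00.2", "0000:0b:00.3", "0000:0b:00.4", "0000:0b:00.5",
    ]
    remaining = expected_cdi_devices(pool, ["0000:0b:00.2"])
    print(len(remaining), remaining)
```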

What are the minimal steps needed to reproduce the bug?

Anything else we need to know?

It seems we ignored #576 (comment).

Component Versions

Please fill in the below table with the version numbers of components used.

Component Version
SR-IOV Network Device Plugin
SR-IOV CNI Plugin
Multus
Kubernetes
OS

Config Files

Config file locations may be config dependent.

Device pool config file location (Try '/etc/pcidp/config.json')
Multus config (Try '/etc/cni/multus/net.d')
CNI config (Try '/etc/cni/net.d/')
Kubernetes deployment type ( Bare Metal, Kubeadm etc.)
Kubeconfig file
SR-IOV Network Custom Resource Definition

Logs

SR-IOV Network Device Plugin Logs (use kubectl logs $PODNAME)
Multus logs (If enabled. Try '/var/log/multus.log' )
Kubelet logs (journalctl -u kubelet)
