Bug with network-operator 25.10.0 deployment: device plugin can't read the driver of one VF after the config daemon restarts the device plugin #1046

@hmike96

Description

We are running into a bug with the network-operator 25.10.0 deployment. We believe the cause is a race condition between the config daemon and the device plugin. The config daemon restarts the device plugin pod, but the device plugin comes up with one error from GetDevices:

E0218 18:39:40.073892       1 netDeviceProvider.go:50] netdevice GetDevices(): error creating new device: "error getting driver info for device 0000:13:00.1 readlink /sys/bus/pci/devices/0000:13:00.1/driver: no such file or directory"

This causes the node's allocatable NIC count to drop by 1 (it is always exactly 1, never more). Once we manually restart the device plugin, the count recovers. The odd part is that the config daemon logs show no bind or unbind of that specific VF's driver. Our configuration disables the Mellanox plugin as well as draining, so only the generic plugin is running.
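For anyone trying to reproduce this, here is a minimal sketch of how the missing `driver` symlink could be checked on the node at the moment the device plugin starts. It only reads the same sysfs path that appears in the error above. The helper names `driver_of` and `unbound_devices` are hypothetical and not part of the operator or the device plugin, and the demo runs against a throwaway directory with a made-up driver name rather than a real node.

```python
import os
import tempfile
from pathlib import Path

# Sysfs directory taken from the path in the error message above.
SYSFS_PCI = "/sys/bus/pci/devices"


def driver_of(pci_addr, sysfs_root=SYSFS_PCI):
    """Return the bound driver name for a PCI address, or None if unbound.

    Hypothetical helper: it reads the same `driver` symlink that the
    readlink call in the log failed on.
    """
    link = Path(sysfs_root) / pci_addr / "driver"
    try:
        return os.path.basename(os.readlink(link))
    except FileNotFoundError:
        # No `driver` symlink: the device is not bound right now.
        return None


def unbound_devices(pci_addrs, sysfs_root=SYSFS_PCI):
    """Return the addresses in pci_addrs that currently have no driver."""
    return [a for a in pci_addrs if driver_of(a, sysfs_root) is None]


# Demo against a fake sysfs tree; "example_driver" is a placeholder name.
with tempfile.TemporaryDirectory() as tmp:
    for addr in ("0000:13:00.0", "0000:13:00.1"):
        (Path(tmp) / addr).mkdir()
    # Only the first device gets a `driver` symlink, mimicking the failure.
    os.symlink("../../drivers/example_driver",
               Path(tmp) / "0000:13:00.0" / "driver")
    demo_driver = driver_of("0000:13:00.0", tmp)
    demo_missing = unbound_devices(["0000:13:00.0", "0000:13:00.1"], tmp)

print(demo_driver)
print(demo_missing)
```

On a real node, calling `unbound_devices` with the VF addresses and the default `sysfs_root` in a loop while the config daemon restarts the device plugin might show whether the symlink for 0000:13:00.1 is briefly absent, which would be consistent with the suspected race.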
