| 1 | +# VEP #183: SR-IOV Network DRA Support |
| 2 | + |
| 3 | +## Release Signoff Checklist |
| 4 | + |
| 5 | +Items marked with (R) are required *prior to targeting to a milestone / release*. |
| 6 | + |
| 7 | +- [x] (R) Enhancement issue created, which links to VEP dir in [kubevirt/enhancements] (not the initial VEP PR) |
| 8 | + |
| 9 | +## Overview |
| 10 | + |
| 11 | +This proposal adds support for DRA (Dynamic Resource Allocation) provisioned SR-IOV network devices in KubeVirt. |
| 12 | +It extends the existing KubeVirt networks API with a new `ResourceClaimNetworkSource` type, allowing SR-IOV NICs to be allocated via DRA while maintaining compatibility with the existing Multus-based SR-IOV approach. |
| 13 | + |
| 14 | +This VEP builds upon the core DRA infrastructure defined in VEP #10 ([kubevirt/enhancements/pull/11](https://github.com/kubevirt/enhancements/pull/11)) to add support for network devices, specifically SR-IOV NICs. |
| 15 | + |
| 16 | +## Motivation |
| 17 | + |
| 18 | +DRA adoption for network devices is important for KubeVirt so that network device vendors can expect |
| 19 | +the same level of control when using Virtual Machines as they have with Containers. |
| 20 | +DRA allows network device vendors fine-grained control over device allocation and topology. |
| 21 | + |
| 22 | +## Goals |
| 23 | + |
| 24 | +- Introduce the API changes needed to consume DRA-enabled SR-IOV network devices in KubeVirt |
| 25 | +- Introduce how KubeVirt will consume SR-IOV devices via external DRA drivers |
| 26 | +- Seamlessly support DRA-based SR-IOV use cases available to containers in KubeVirt VMIs |
| 27 | +- Support custom MAC addresses for DRA-based SR-IOV networks |
| 28 | + |
| 29 | +## Non Goals |
| 30 | + |
| 31 | +- Replace existing Multus-based SR-IOV network integration (remains fully supported) |
| 32 | +- Deploy DRA SR-IOV driver (handled by sriov-network-operator) |
| 33 | +- Support coexistence of DRA SR-IOV and device-plugin SR-IOV |
| 34 | +- Live migration of VMs with DRA network devices |
| 35 | + |
| 36 | +## Definition of Users |
| 37 | + |
| 38 | +- **User**: A person who wants to attach SR-IOV network devices to a VM |
| 39 | +- **Admin**: A person who manages infrastructure and configures DRA device classes and drivers |
| 40 | +- **Developer**: A person familiar with the CNCF ecosystem who develops automation using these APIs |
| 41 | + |
| 42 | +## User Stories |
| 43 | + |
| 44 | +- As a user, I want to consume SR-IOV network devices via DRA in my VMs |
| 45 | +- As a user, I want to specify custom MAC addresses for DRA-provisioned SR-IOV interfaces |
| 46 | +- As an admin, I want to use DRA drivers to manage SR-IOV device allocation with fine-grained control |
| 47 | +- As a developer, I want extensible APIs to build automation for DRA-based networking |
| 48 | + |
| 49 | +## Use Cases |
| 50 | + |
| 51 | +### Supported Use Cases |
| 52 | + |
| 53 | +1. SR-IOV network devices where the DRA driver publishes required attributes in device metadata files: |
| 54 | + - `resources.kubernetes.io/pciBusID` for SR-IOV VF passthrough |
| 55 | + |
| 56 | +### Future Use Cases |
| 57 | +1. Scalable Functions network devices |
| 58 | +2. Live migration of VMIs using DRA network devices (will have a VEP amendment) |
| 59 | + |
| 60 | +## Repos |
| 61 | + |
| 62 | +kubevirt/kubevirt |
| 63 | + |
| 64 | +## Design |
| 65 | + |
| 66 | +This design introduces a new feature gate: `NetworkDevicesWithDRA`. |
| 67 | +All the API changes will be gated behind this feature gate so as not to break existing functionality. |
| 68 | + |
| 69 | +### API Changes |
| 70 | + |
| 71 | +A new network source type `ResourceClaimNetworkSource` is added to the existing `NetworkSource` type: |
| 72 | + |
| 73 | +```go |
| 74 | +// Represents the source resource that will be connected to the vm. |
| 75 | +// Only one of its members may be specified. |
| 76 | +type NetworkSource struct { |
| 77 | + Pod *PodNetwork `json:"pod,omitempty"` |
| 78 | + Multus *MultusNetwork `json:"multus,omitempty"` |
| 79 | + ResourceClaim *ResourceClaimNetworkSource `json:"resourceClaim,omitempty"` |
| 80 | +} |
| 81 | + |
| 82 | +// ResourceClaimNetworkSource represents a network resource requested |
| 83 | +// via a Kubernetes ResourceClaim. |
| 84 | +type ResourceClaimNetworkSource struct { |
| 85 | + // ClaimName references the name of an entry in the |
| 86 | + // VMI's spec.resourceClaims[] array. |
| 87 | + // +kubebuilder:validation:MinLength=1 |
| 88 | + ClaimName string `json:"claimName"` |
| 89 | + |
| 90 | + // RequestName specifies which request from the |
| 91 | + // ResourceClaim.spec.devices.requests array this network |
| 92 | + // source corresponds to. |
| 93 | + // +kubebuilder:validation:MinLength=1 |
| 94 | + RequestName string `json:"requestName"` |
| 95 | +} |
| 96 | +``` |
| 97 | + |
| 98 | +The VMI must also include the resource claim in `spec.resourceClaims[]` (consistent with GPU and HostDevice DRA usage). |
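The relationship between `claimName` and `spec.resourceClaims[]` can be illustrated with a minimal sketch. The types below are simplified stand-ins for the real API types, and `missingClaims` is a hypothetical helper; whether and where KubeVirt performs this check is an assumption, not something this proposal specifies.

```go
package main

import "fmt"

// Simplified stand-ins for the VMI spec fields discussed above.
type resourceClaim struct{ Name string }

type network struct {
	Name      string
	ClaimName string // empty when the network is not resourceClaim-backed
}

// missingClaims is a hypothetical helper, not KubeVirt code. It returns the
// names of networks whose claimName has no entry in spec.resourceClaims[].
func missingClaims(nets []network, claims []resourceClaim) []string {
	declared := make(map[string]bool, len(claims))
	for _, c := range claims {
		declared[c.Name] = true
	}
	var missing []string
	for _, n := range nets {
		if n.ClaimName != "" && !declared[n.ClaimName] {
			missing = append(missing, n.Name)
		}
	}
	return missing
}

func main() {
	claims := []resourceClaim{{Name: "sriov-network-claim"}}
	nets := []network{
		{Name: "sriov-net", ClaimName: "sriov-network-claim"},
		{Name: "sriov-net-2", ClaimName: "undeclared-claim"},
	}
	// Reports the networks that reference a claim the VMI does not declare.
	fmt.Println(missingClaims(nets, claims))
}
```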
| 99 | + |
| 100 | +### Status Reporting |
| 101 | + |
| 102 | +For consistency with GPUs and HostDevices, DRA-provisioned network devices populate the same `vmi.status.deviceStatus.hostDeviceStatuses[]` array. The DRA controller in virt-controller: |
| 103 | + |
| 104 | +1. Identifies networks with `resourceClaim` source type |
| 105 | +2. Extracts device information from the allocated ResourceClaim and ResourceSlice |
| 106 | +3. Populates `hostDeviceStatuses` with network name and allocated device attributes (PCI address) |
| 107 | + |
| 108 | +The status entry name matches the network name from `spec.networks[].name`, allowing virt-launcher to correlate the network configuration with its allocated DRA device. |
| 109 | + |
| 110 | +The detailed mechanism for extracting device information from Pod status, ResourceClaim, and ResourceSlice follows the same approach described in VEP #10. |
| 111 | + |
| 112 | +### SR-IOV Integration |
| 113 | + |
| 114 | +When a network interface has `sriov` binding and references a network with `resourceClaim` source: |
| 115 | + |
| 116 | +1. The network admitter validates that exactly one network source type (pod, multus, or resourceClaim) is specified |
| 117 | +2. Virt-controller adds the resource claim to the virt-launcher pod spec via `WithNetworksDRA()` render option |
| 118 | +3. The DRA controller populates `vmi.status.deviceStatus` with the PCI address from the ResourceSlice |
| 119 | +4. Virt-launcher reads the PCI address from device status and generates the appropriate libvirt hostdev XML (at [`generateConverterContext`](https://github.com/kubevirt/kubevirt/blob/ffa91c8156fecf1d91dd865c6197865a0a3e525b/pkg/virt-launcher/virtwrap/manager.go#L1163), alongside the existing `sriov.CreateHostDevices` call), identical to traditional Multus-based SR-IOV |
| 120 | + |
| 121 | +This approach provides clean separation: DRA handles device provisioning, KubeVirt networks API handles configuration. |
| 122 | + |
| 123 | +**Important:** Traditional Multus-based SR-IOV (using `multus` network source) and DRA-based SR-IOV (using `resourceClaim` network source) are **mutually exclusive per VM**. A single VMI must not mix both approaches. The existing Multus-based SR-IOV API remains fully supported and unchanged. |
| 124 | + |
| 125 | +### Custom MAC Address Support |
| 126 | + |
| 127 | +To support custom MAC addresses for DRA-based SR-IOV networks, KubeVirt will annotate the virt-launcher pod with requested MAC addresses. The MAC address will be taken from the existing `spec.domain.devices.interfaces[].macAddress` field: |
| 128 | + |
| 129 | +``` |
| 130 | +kubevirt.io/dra-networks: '[{"claimName":"sriov","requestName":"vf","mac":"de:ad:00:00:be:ef"}]' |
| 131 | +``` |
| 132 | + |
| 133 | +This mirrors the structure of the `k8s.v1.cni.cncf.io/networks` annotation, but identifies each entry by claimName/requestName instead of by NetworkAttachmentDefinition (NAD). |
| 134 | + |
| 135 | +The SR-IOV DRA driver reads this annotation and passes the claim/request identifier along with the MAC address to the SR-IOV CNI, ensuring the network interface is configured with the specified MAC address. |
| 136 | + |
| 137 | +**Design Rationale:** The annotation-based approach was chosen because it solves the case where ResourceClaim/ResourceClaimTemplate is created by the admin (not by KubeVirt). Since this approach handles the more complex admin-created claim scenario, it naturally also works for the general case where KubeVirt creates the claims ("auto" mode), providing a unified solution for both scenarios. |
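As a rough illustration of how the `kubevirt.io/dra-networks` value could be assembled, the sketch below joins interfaces and networks by name and serializes the result. The struct and function names are hypothetical, and omitting `mac` when no address is requested is an assumption; only the three JSON keys come from the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// draNetworkEntry mirrors one element of the annotation shown above.
type draNetworkEntry struct {
	ClaimName   string `json:"claimName"`
	RequestName string `json:"requestName"`
	MAC         string `json:"mac,omitempty"` // assumption: dropped when unset
}

// Simplified stand-ins for spec.domain.devices.interfaces[] and spec.networks[].
type iface struct{ Name, MAC string }
type network struct{ Name, ClaimName, RequestName string }

// buildDRANetworksAnnotation is a hypothetical helper, not KubeVirt code.
func buildDRANetworksAnnotation(ifaces []iface, nets []network) (string, error) {
	macByName := make(map[string]string, len(ifaces))
	for _, i := range ifaces {
		macByName[i.Name] = i.MAC
	}
	entries := []draNetworkEntry{}
	for _, n := range nets {
		if n.ClaimName == "" {
			continue // not a resourceClaim-backed network
		}
		entries = append(entries, draNetworkEntry{
			ClaimName:   n.ClaimName,
			RequestName: n.RequestName,
			MAC:         macByName[n.Name],
		})
	}
	b, err := json.Marshal(entries)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	value, err := buildDRANetworksAnnotation(
		[]iface{{Name: "sriov-net", MAC: "de:ad:00:00:be:ef"}},
		[]network{{Name: "sriov-net", ClaimName: "sriov", RequestName: "vf"}},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(value)
}
```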
| 138 | + |
| 139 | +### Validation |
| 140 | + |
| 141 | +Webhook validations ensure: |
| 142 | +1. Networks with `resourceClaim` source have corresponding `sriov` binding interfaces |
| 143 | +2. Each network must reference a unique `claimName` + `requestName` combination. No two DRA entities (networks, hostDevices, or GPUs) can share the same tuple, as each interface+network pair must map to exactly one device allocation |
| 144 | +3. No mixing of Multus-based and DRA-based SR-IOV in the same VMI. |
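The three rules above could be expressed roughly as follows. This is a sketch under stated assumptions: the types are simplified stand-ins, `validateDRANetworks` is a hypothetical name, and `deviceClaims` stands for the claimName/requestName tuples already used by hostDevices and GPUs.

```go
package main

import "fmt"

type claimRef struct{ ClaimName, RequestName string }

type iface struct {
	Name  string
	SRIOV bool // true when the interface uses the sriov binding
}

type network struct {
	Name   string
	Multus bool      // true for a multus network source
	Claim  *claimRef // non-nil for a resourceClaim network source
}

// validateDRANetworks is a hypothetical helper, not the actual webhook code.
func validateDRANetworks(ifaces []iface, nets []network, deviceClaims []claimRef) []string {
	sriov := make(map[string]bool, len(ifaces))
	for _, i := range ifaces {
		sriov[i.Name] = i.SRIOV
	}
	seen := make(map[claimRef]bool, len(deviceClaims))
	for _, c := range deviceClaims {
		seen[c] = true
	}
	var errs []string
	hasDRA, hasMultusSRIOV := false, false
	for _, n := range nets {
		switch {
		case n.Claim != nil:
			hasDRA = true
			// Rule 1: a resourceClaim network needs an sriov binding interface.
			if !sriov[n.Name] {
				errs = append(errs, fmt.Sprintf("network %q: resourceClaim source requires an sriov binding interface", n.Name))
			}
			// Rule 2: the claimName/requestName tuple must be unique.
			if seen[*n.Claim] {
				errs = append(errs, fmt.Sprintf("network %q: %s/%s is already in use", n.Name, n.Claim.ClaimName, n.Claim.RequestName))
			}
			seen[*n.Claim] = true
		case n.Multus && sriov[n.Name]:
			hasMultusSRIOV = true
		}
	}
	// Rule 3: Multus-based and DRA-based SR-IOV must not be mixed.
	if hasDRA && hasMultusSRIOV {
		errs = append(errs, "Multus-based and DRA-based SR-IOV cannot be mixed in one VMI")
	}
	return errs
}

func main() {
	ref := claimRef{ClaimName: "sriov-network-claim", RequestName: "sriov-nic-request"}
	ifaces := []iface{{Name: "sriov-net", SRIOV: true}, {Name: "sriov-net-2", SRIOV: true}}
	nets := []network{{Name: "sriov-net", Claim: &ref}, {Name: "sriov-net-2", Claim: &ref}}
	for _, e := range validateDRANetworks(ifaces, nets, nil) {
		fmt.Println(e)
	}
}
```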
| 145 | + |
| 146 | +### Component Changes |
| 147 | + |
| 148 | +**Virt-Controller:** |
| 149 | +- Renders virt-launcher pod spec with resource claims from `vmi.spec.resourceClaims[]` referenced by `vmi.spec.networks[].resourceClaim` |
| 150 | +- Annotates virt-launcher pod with `kubevirt.io/dra-networks` containing MAC addresses from `spec.domain.devices.interfaces[].macAddress` |
| 151 | + |
| 152 | +**Virt-Launcher:** |
| 153 | +- For SR-IOV networks with DRA, virt-launcher generates the domain XML from `vmi.status.deviceStatus` instead of the KubeVirt downwardAPI file used with device plugins |
| 154 | +- The `CreateDRAHostDevices()` function generates hostdev XML by: |
| 155 | + - Filtering VMI spec interfaces with SRIOV binding that reference networks with resourceClaim source |
| 156 | + - Looking up the corresponding VMI status device status entry by network name |
| 157 | + - Extracting the PCI address from VMI status device status attributes |
| 158 | + - Generating standard libvirt hostdev XML |
| 159 | + |
| 160 | +- **Note:** If the ResourceClaim/ResourceClaimTemplate allocates more than one device for the request, KubeVirt consumes only the first allocated device |
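The virt-launcher steps listed above (look up the status entry by network name, extract the PCI address, emit hostdev XML) might look roughly like the sketch below. It is not the `CreateDRAHostDevices()` implementation: the types are simplified stand-ins, the function names are hypothetical, and `managed="no"` is an assumed attribute value.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"regexp"
)

// hostDeviceStatus is a flattened stand-in for one hostDeviceStatuses[] entry.
type hostDeviceStatus struct {
	Name       string // matches spec.networks[].name
	PCIAddress string // e.g. "0000:05:00.1"
}

type pciAddress struct {
	Domain   string `xml:"domain,attr"`
	Bus      string `xml:"bus,attr"`
	Slot     string `xml:"slot,attr"`
	Function string `xml:"function,attr"`
}

type hostDevSource struct {
	Address pciAddress `xml:"address"`
}

type hostDev struct {
	XMLName xml.Name      `xml:"hostdev"`
	Mode    string        `xml:"mode,attr"`
	Type    string        `xml:"type,attr"`
	Managed string        `xml:"managed,attr"`
	Source  hostDevSource `xml:"source"`
}

var pciRe = regexp.MustCompile(`^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$`)

// parsePCIAddress splits "DDDD:BB:SS.F" into 0x-prefixed libvirt fields.
func parsePCIAddress(addr string) (pciAddress, error) {
	m := pciRe.FindStringSubmatch(addr)
	if m == nil {
		return pciAddress{}, fmt.Errorf("invalid PCI address %q", addr)
	}
	return pciAddress{Domain: "0x" + m[1], Bus: "0x" + m[2], Slot: "0x" + m[3], Function: "0x" + m[4]}, nil
}

// hostDevXMLForNetwork is a hypothetical helper that renders the hostdev
// element for the status entry whose name equals the network name.
func hostDevXMLForNetwork(statuses []hostDeviceStatus, networkName string) (string, error) {
	for _, s := range statuses {
		if s.Name != networkName {
			continue
		}
		addr, err := parsePCIAddress(s.PCIAddress)
		if err != nil {
			return "", err
		}
		dev := hostDev{Mode: "subsystem", Type: "pci", Managed: "no", Source: hostDevSource{Address: addr}}
		out, err := xml.MarshalIndent(dev, "", "  ")
		if err != nil {
			return "", err
		}
		return string(out), nil
	}
	return "", fmt.Errorf("no device status entry for network %q", networkName)
}

func main() {
	statuses := []hostDeviceStatus{{Name: "sriov-net", PCIAddress: "0000:05:00.1"}}
	out, err := hostDevXMLForNetwork(statuses, "sriov-net")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```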
| 161 | + |
| 162 | +## API Examples |
| 163 | + |
| 164 | +### VMI with DRA SR-IOV Network |
| 165 | + |
| 166 | +```yaml |
| 167 | +--- |
| 168 | +apiVersion: resource.k8s.io/v1 |
| 169 | +kind: DeviceClass |
| 170 | +metadata: |
| 171 | + name: sriov.network.example.com |
| 172 | +spec: |
| 173 | + selectors: |
| 174 | + - cel: |
| 175 | + expression: device.driver == 'sriov.network.example.com' |
| 176 | +--- |
| 177 | +apiVersion: resource.k8s.io/v1 |
| 178 | +kind: ResourceClaimTemplate |
| 179 | +metadata: |
| 180 | + name: sriov-network-claim-template |
| 181 | + namespace: default |
| 182 | +spec: |
| 183 | + spec: |
| 184 | + devices: |
| 185 | + requests: |
| 186 | + - name: sriov-nic-request |
| 187 | + exactly: |
| 188 | + deviceClassName: sriov.network.example.com |
| 189 | +--- |
| 190 | +apiVersion: kubevirt.io/v1 |
| 191 | +kind: VirtualMachineInstance |
| 192 | +metadata: |
| 193 | + name: vmi-sriov-dra |
| 194 | + namespace: default |
| 195 | +spec: |
| 196 | + domain: |
| 197 | + devices: |
| 198 | + interfaces: |
| 199 | + - name: sriov-net |
| 200 | + sriov: {} |
| 201 | + macAddress: "de:ad:00:00:be:ef" |
| 202 | + networks: |
| 203 | + - name: sriov-net |
| 204 | + resourceClaim: |
| 205 | + claimName: sriov-network-claim |
| 206 | + requestName: sriov-nic-request |
| 207 | + resourceClaims: |
| 208 | + - name: sriov-network-claim |
| 209 | + resourceClaimTemplateName: sriov-network-claim-template |
| 210 | +status: |
| 211 | + deviceStatus: |
| 212 | + hostDeviceStatuses: |
| 213 | + - name: sriov-net |
| 214 | + deviceResourceClaimStatus: |
| 215 | + name: 0000-05-00-1 |
| 216 | + resourceClaimName: virt-launcher-vmi-sriov-dra-sriov-network-claim-abc123 |
| 217 | + attributes: |
| 218 | + pciAddress: 0000:05:00.1 |
| 219 | +--- |
| 220 | +apiVersion: v1 |
| 221 | +kind: Pod |
| 222 | +metadata: |
| 223 | + name: virt-launcher-vmi-sriov-dra |
| 224 | + namespace: default |
| 225 | + annotations: |
| 226 | + kubevirt.io/dra-networks: '[{"claimName":"sriov-network-claim","requestName":"sriov-nic-request","mac":"de:ad:00:00:be:ef"}]' |
| 227 | +spec: |
| 228 | + containers: |
| 229 | + - name: compute |
| 230 | + image: virt-launcher |
| 231 | + resources: |
| 232 | + claims: |
| 233 | + - name: sriov-network-claim |
| 234 | + request: sriov-nic-request |
| 235 | + resourceClaims: |
| 236 | + - name: sriov-network-claim |
| 237 | + resourceClaimTemplateName: sriov-network-claim-template |
| 238 | +status: |
| 239 | + resourceClaimStatuses: |
| 240 | + - name: sriov-network-claim |
| 241 | + resourceClaimName: virt-launcher-vmi-sriov-dra-sriov-network-claim-abc123 |
| 242 | +``` |
| 243 | +
|
| 244 | +## Scalability |
| 245 | +
|
| 246 | +The DRA controller in virt-controller uses existing shared informers (no additional watch calls) and filters events to relevant status sections. See [VEP #10](../../sig-compute/10-dra-devices/vep.md#scalability) for detailed scalability analysis. |
| 247 | +
|
| 248 | +## Update/Rollback Compatibility |
| 249 | +
|
| 250 | +- Changes are upgrade compatible |
| 251 | +- Rollback works as long as the feature gate is disabled |
| 252 | +- If the feature is enabled, VMIs using DRA network devices must be deleted and the feature gate disabled before attempting rollback |
| 253 | +
|
| 254 | +## Functional Testing Approach |
| 255 | +
|
| 256 | +- Unit tests covering the new code paths |
| 257 | +- New e2e test lane with all current SR-IOV tests using the new API |
| 258 | +(excluding migration tests, which will be added when migration is supported) |
| 259 | +
|
| 260 | +## Implementation History |
| 261 | +
|
| 262 | +- 2026-01-20: Initial design/VEP proposal for SR-IOV Network DRA support |
| 263 | +
|
| 264 | +## Graduation Requirements |
| 265 | +
|
| 266 | +### Alpha |
| 267 | +
|
| 268 | +- Code changes behind `NetworkDevicesWithDRA` feature gate |
| 269 | +- Unit tests |
| 270 | +- E2E tests with SR-IOV DRA driver (excluding migration) |
| 271 | + |
| 272 | +### Beta |
| 273 | + |
| 274 | +- Evaluate user and driver author experience |
| 275 | +- Consider additional use cases if any |
| 276 | +- Work with Kubernetes community on standardizing device information injection |
| 277 | +- Live migration support for DRA network devices |
| 278 | + - Live migration will use CDI/NRI to inject device information as files into each pod (mappings of request/claim to PCI addresses) |
| 279 | + - Each virt-launcher reads its pod-specific device file, avoiding conflicts in VMI status |
| 280 | + - Might initially be implemented by the SR-IOV DRA driver; future Kubernetes support may generalize this (see [kubernetes/enhancements#5606](https://github.com/kubernetes/enhancements/pull/5606)) |
| 281 | + - Details: https://github.com/k8snetworkplumbingwg/dra-driver-sriov/pull/62 |
| 282 | + |
| 283 | +### GA |
| 284 | + |
| 285 | +- Upgrade/downgrade testing |
| 286 | + |
| 287 | +## References |
| 288 | + |
| 289 | +- DRA: https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/ |
| 290 | +- SR-IOV DRA driver: https://github.com/k8snetworkplumbingwg/dra-driver-sriov |
| 291 | +- VEP #10 (DRA devices): /veps/sig-compute/10-dra-devices/vep.md |
| 292 | +- Kubernetes DRA device information injection: https://github.com/kubernetes/enhancements/pull/5606 |