
VEP-183: NetworkDevicesWithDRA API design#185

Open
oshoval wants to merge 1 commit into kubevirt:main from oshoval:dranet

Conversation

Contributor

@oshoval oshoval commented Jan 20, 2026

VEP Metadata

Tracking issue: #183
Upstream issue kubevirt/kubevirt#15995

SIG label: /sig network

What this PR does

Document support for DRA-provisioned network devices, specifically SR-IOV NICs.

Key additions:

  • ResourceClaimNetworkSource API for specifying DRA networks in spec.networks
  • SR-IOV integration details: device allocation via DRA, configuration via
    existing KubeVirt networks API.
  • Custom MAC address support through kubevirt.io/dra-networks.
  • NetworkDevicesWithDRA feature gate (Alpha).
  • Example VMI YAML with DRA SR-IOV network configuration.

Network DRA maintains mutual exclusivity with traditional Multus-based SR-IOV
per VM. The existing Multus SR-IOV API remains fully supported and unchanged.

Thanks @SchSeba for the co-op, and to all who contributed

Special notes for your reviewer

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Jan 20, 2026
@kubevirt-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vladikr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@oshoval oshoval mentioned this pull request Jan 20, 2026
@oshoval oshoval force-pushed the dranet branch 2 times, most recently from 2bbda95 to 9dd312c Compare January 20, 2026 13:59
@oshoval oshoval changed the title (dont review yet please) vep-181: NetworkDevicesWithDRA API design vep-181: NetworkDevicesWithDRA API design Jan 20, 2026
@alaypatel07
Contributor

/cc @alaypatel07

@oshoval oshoval force-pushed the dranet branch 3 times, most recently from 3aa5a35 to ac099a5 Compare January 22, 2026 10:41
@oshoval oshoval marked this pull request as ready for review January 22, 2026 10:46
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 22, 2026
@oshoval
Contributor Author

oshoval commented Jan 22, 2026

cc @SchSeba @aojea @alaypatel07 @EdDev @orelmisan @LionelJouin
Hope I didn't miss anyone.
I might still change stuff, but this should be the essence, hopefully.

Thanks

To support custom MAC addresses for DRA-based SR-IOV networks, KubeVirt annotates the virt-launcher pod with requested MAC addresses. The MAC address is taken from the existing `spec.domain.devices.interfaces[].macAddress` field:

```
kubevirt.io/dra-networks: '[{"claimName":"sriov","requestName":"vf","mac":"de:ad:00:00:be:ef"}]'
```
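For illustration only: the annotation value is a JSON list with one entry per claimed device. A minimal sketch of how a consumer might parse it, using a hypothetical helper name and the simplified schema shown in the example above:

```python
import json

# Hypothetical sketch: parse the kubevirt.io/dra-networks annotation value
# into (claimName, requestName, mac) tuples. Field names follow the example
# annotation above; the authoritative schema is defined by the VEP.
def parse_dra_networks(annotation: str):
    entries = json.loads(annotation)
    return [(e["claimName"], e["requestName"], e.get("mac")) for e in entries]

value = '[{"claimName":"sriov","requestName":"vf","mac":"de:ad:00:00:be:ef"}]'
print(parse_dra_networks(value))
# → [('sriov', 'vf', 'de:ad:00:00:be:ef')]
```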
Contributor Author

@SchSeba @EdDev
ack this one please, so we can continue SR-IOV driver implementation of it

@aojea aojea Jan 23, 2026

an alternative is to use the ResourceClaim and ResourceClaimTemplate opaque data to pass this information, so you don't need to embed it into an annotation, this information is cascaded to the driver via the kubelet NodePrepareResources hook

apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: sriov-network-claim-template
  namespace: default
spec:
  spec:
    devices:
      requests:
      - name: sriov-nic-request
        exactly:
          deviceClassName: sriov.network.example.com
    config:
    - opaque:
        driver: sriov.network.example.com
        parameters:
          interface:
            name: "enp0s1"
            mtu: 4321
            hardwareAddr: "00:11:22:33:44:55"

this way you can also create an schema and add validation, see how I do in dranet

https://github.com/kubernetes-sigs/dranet/blob/main/pkg/apis/types.go

https://github.com/kubernetes-sigs/dranet/blob/bfe5826c1ddc63ee3fe7e561f71ba232072a9341/pkg/driver/dra_hooks.go#L175-L206

Contributor Author

@oshoval oshoval Jan 25, 2026

Thanks, we discussed this approach,
It is a resource claim template, which means k8s is the one that creates ResourceClaims based on that, and it is a singleton, where each virt-launcher pod needs a different MAC, hence it is not scalable.
The MAC approach allows minimum maintenance by Kubevirt, and per pod custom MAC.

It solves the specific case where ResourceClaim/ResourceClaimsTemplate aren't created by Kubevirt, so it is good to use MAC annotation approach also for the generic case where Kubevirt itself creates the ResourceClaim ("auto" mode), so there will be one solution for all the cases.


Agree here we can't have this one if we want to re-use the resourceClaimTemplate for multiple pod/vms.

then we can also continue to use the kubemacpool system to assign the mac address. (we may just need to ajust the annotation)


Another option is to have a mutating webhook on the resourceClaim that will inject this opaque mac address to have unique mac addresses, but again I don't know how much work that is.

Contributor Author

there will be e2e test that injects a MAC, and depends on the current value being handled
this is what we do already for the current multus + DP approach, same tests will be duplicated to use DRA as well


Multus is fine I think as it's a full implementation. DRA is a kubernetes feature that has multiple implementations (DRANET, SR-IOV DRA Driver, CNI DRA Driver...), so having conformance tests could help to determine which DRA Driver is Kubevirt networking compliant.

Contributor

@alaypatel07 alaypatel07 Feb 6, 2026

Until now, we thought that any driver which has a pciAddress attribute or mdevUUID attribute can be compatible with KubeVirt. But this use case shows that advanced features in KubeVirt might require more from a driver than just providing those attributes, in which case conformance tests will be helpful. This could potentially be tracked as part of graduating VEP-10 to GA.

Multus is fine I think as it's a full implementation. DRA is a kubernetes feature that has multiple implementations (DRANET, SR-IOV DRA Driver, CNI DRA Driver...), so having conformance tests could help to determine which DRA Driver is Kubevirt networking compliant.

Contributor

If the ResourceClaimTemplate object could be embedded in the pod spec, would this help avoid having this custom annotation?

I remember very early on the plan was to embed the template in pod spec, but because DRA was not stable and pod spec is V1 GA object, it was decided to move the template into its own API.

Contributor Author

@oshoval oshoval Feb 8, 2026

Assuming you mean that K8s will create a ResourceClaim if a ResourceClaimTemplate is embedded in the pod, right?
This is a precondition for this idea.

But I think that we need to consider all the cases:

  1. Existing ResourceClaim
  2. Existing ResourceClaimTemplate
  3. Kubevirt "auto" mode, where Kubevirt is the one that creates the ResourceClaim

And also consider that, let's say a ResourceClaimTemplate exists: we copy it to the pod and add the MAC there,
but now upon migration we need to copy the pod's ResourceClaimTemplate, because the original might have changed and we can't depend on it.
While this is a nice idea, given the above we should start simple imho, and later consider adjustments according to needs.
Moreover, I think it means embedding a whole CRD instead of an annotation, with the cons above.
We can make the annotation standard; it is pretty straightforward and simpler.

Note: I didn't think deeper about the other cases; I took one edge example to show why it makes things harder.

Another con is that once we don't have a MAC we don't need the annotation, but we can't have two mechanisms, so it bloats the non-MAC path.

@oshoval oshoval changed the title vep-181: NetworkDevicesWithDRA API design vep-183: NetworkDevicesWithDRA API design Jan 22, 2026
@oshoval oshoval changed the title vep-183: NetworkDevicesWithDRA API design VEP-183: NetworkDevicesWithDRA API design Jan 22, 2026
Member

@nirdothan nirdothan left a comment

Thank you @oshoval. Looks good. The manifest example is pure gold, and lays out the whole design in a clear manner.
I still need to do my homework before LGTM.
Please see my comments below.


## Goals

- Align on the API changes needed to consume DRA-enabled SR-IOV network devices in KubeVirt
Member

Please consider a different verb, here perhaps "Introduce"?


- Align on the API changes needed to consume DRA-enabled SR-IOV network devices in KubeVirt
- Align on how KubeVirt will consume SR-IOV devices via external DRA drivers
- Enable DRA SR-IOV use cases available to containers to work seamlessly with KubeVirt VMIs
Member

Suggested change
- Enable DRA SR-IOV use cases available to containers to work seamlessly with KubeVirt VMIs
- Seamlessly support container based SR-IOV use cases in KubeVirt VMIs

## Non Goals

- Replace existing Multus-based SR-IOV network integration (remains fully supported)
- Deploy DRA SR-IOV driver (handled by sriov-network-operator)
Member

Consider omitting "(handled by sriov-network-operator)", as it is not part of KubeVirt, nor is it a prerequisite. You can add a note that recommends it somewhere further down.

Contributor Author

@oshoval oshoval Jan 27, 2026

It is a non-goal that emphasizes the fact that deployment is not handled,
because deployment side by side with the DP is tricky, yet to be determined, and will be solved by the sriov-operator:

#158 (comment)
#158 (comment)

- Deploy DRA SR-IOV driver (handled by sriov-network-operator)
- Support coexistence of DRA SR-IOV and device-plugin SR-IOV
- Live migration of VMs with DRA network devices
- Align on what drivers KubeVirt will support in tree
Member

That one isn't clear to me.

Contributor Author

@oshoval oshoval Jan 27, 2026

I can drop it, as it is mentioned in different ways already.
It is related to the fact that we don't yet support the DP and DRA drivers side by side at the same time,
so we will just deploy one at a time until the sriov-operator solves it, and then we will be able to deploy them together.

The reason is that the drivers need to consider each other when allocating the same devices,
and this is what is not solved yet.

EDIT - dropped


The VMI must also include the resource claim in `spec.resourceClaims[]` (consistent with GPU and HostDevice DRA usage).
Member

Please consider referencing VEP 10 for this field.

Contributor Author

We already say in the beginning

This VEP builds upon the core DRA infrastructure defined in VEP #10 (https://github.com/kubevirt/enhancements/pull/11) to add support for network devices, specifically SR-IOV NICs.

I prefer the DRY (don't repeat yourself) rule.


### Custom MAC Address Support

To support custom MAC addresses for DRA-based SR-IOV networks, KubeVirt annotates the virt-launcher pod with requested MAC addresses. The MAC address is taken from the existing `spec.domain.devices.interfaces[].macAddress` field:
Member

  1. The present tense here is confusing.
  2. Will the annotation also be populated when custom mac address isn't specified?

Contributor Author

@oshoval oshoval Jan 27, 2026

  1. fixed
  2. no, there is no need to

**Virt-Launcher:**
- For SR-IOV networks with DRA, virt-launcher uses `vmi.status.deviceStatus` to generate the domain XML instead of Kubevirt's downwardAPI file as in the case of device-plugins
- The `CreateDRAHostDevices()` function generates hostdev XML by:
- Filtering interfaces with SRIOV binding that reference networks with resourceClaim source
Member

in VMI spec

- For SR-IOV networks with DRA, virt-launcher uses `vmi.status.deviceStatus` to generate the domain XML instead of Kubevirt's downwardAPI file as in the case of device-plugins
- The `CreateDRAHostDevices()` function generates hostdev XML by:
- Filtering interfaces with SRIOV binding that reference networks with resourceClaim source
- Looking up the corresponding device status entry by network name
Member

In VMI status

- Consider additional use cases if any
- Work with Kubernetes community on standardizing device information injection
- Live migration support for DRA network devices
- Live migration will use CDI/NRI to inject device information as files into each pod (mappings of request/claim to PCI addresses)
Member

Not clear to me - but maybe It's on me to learn.

Contributor Author

@oshoval oshoval Jan 27, 2026

It is not part of this VEP, just meant to give a highlight.

At a high level, in the long term k8s will create a downward API: each pod will have a file with the device info.
CDI/NRI is an implementation detail.
kubernetes/enhancements#5606

Until it happens we might create this mechanism in the SR-IOV driver
(a PoC that we will continue according to needs):
k8snetworkplumbingwg/dra-driver-sriov#59 (POC, closed for now, but it shows how it works: CDI prepares the pod manifest, NRI populates data, and then the container creates the file)
k8snetworkplumbingwg/dra-driver-sriov#62 (WIP)

There is a nice diagram wrt this:
kubernetes/enhancements#5606 (comment)

Also #155 (the VEP 10 amendment)
talks about using the downward API already.

- Looking up the corresponding device status entry by network name
- Extracting the PCI address from device status attributes
- Generating standard libvirt hostdev XML
- **Note:** If the ResourceClaim/ResourceClaimTemplate has `count != 1`, KubeVirt will consume the first device from the allocated devices

Just a detail: to allocate multiple devices via a ResourceClaim, count could be used, or the allocationMode set to all. Also, I am not sure it is guaranteed that the list of allocated devices will be returned in a deterministic order.

Suggested change
- **Note:** If the ResourceClaim/ResourceClaimTemplate has `count != 1`, KubeVirt will consume the first device from the allocated devices
- **Note:** If the ResourceClaim/ResourceClaimTemplate is allocating more than one device for the request, KubeVirt will consume the first device from the allocated devices

Contributor Author

@oshoval oshoval Jan 27, 2026

Thanks, will change.
Please note that this logic (take the first in the list) won't be added as part of this VEP,
since it depends on the downward API, which isn't part of this VEP (a new VEP will be created afterwards).

EDIT - done
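The hostdev-generation steps quoted in this thread (filter SR-IOV interfaces backed by a resourceClaim network, look up the device status entry by network name, extract the PCI address, generate hostdev XML) can be sketched roughly as follows. The dict shapes here are simplified stand-ins for the actual KubeVirt API types, not the real `CreateDRAHostDevices()` implementation:

```python
# Simplified sketch of the hostdev-generation flow described above.
# Structures are illustrative assumptions, not the real KubeVirt types.
def create_dra_hostdevs(interfaces, networks, device_status):
    # Networks whose source is a resourceClaim (vs. pod/multus).
    claim_networks = {n["name"] for n in networks if "resourceClaim" in n}
    hostdevs = []
    for iface in interfaces:
        # Only SR-IOV interfaces that reference a resourceClaim network.
        if "sriov" not in iface or iface["name"] not in claim_networks:
            continue
        # Device status entry is keyed by network name (vmi.status.deviceStatus).
        status = device_status[iface["name"]]
        pci = status["pciAddress"]  # e.g. "0000:3b:00.2"
        dom, rest = pci.split(":", 1)
        bus, dev_fn = rest.split(":")
        slot, func = dev_fn.split(".")
        hostdevs.append(
            f'<hostdev mode="subsystem" type="pci" managed="no">'
            f'<source><address domain="0x{dom}" bus="0x{bus}" '
            f'slot="0x{slot}" function="0x{func}"/></source></hostdev>'
        )
    return hostdevs

ifaces = [{"name": "default", "masquerade": {}},
          {"name": "red", "sriov": {}}]
nets = [{"name": "default", "pod": {}},
        {"name": "red", "resourceClaim": {"claimName": "sriov", "requestName": "vf"}}]
status = {"red": {"pciAddress": "0000:3b:00.2"}}
print(create_dra_hostdevs(ifaces, nets, status))
```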


// ResourceClaimNetworkSource represents a network resource requested
// via a Kubernetes ResourceClaim.
type ResourceClaimNetworkSource struct {
Contributor

Contributor Author

@oshoval oshoval Feb 8, 2026

Yes, because for Network DRA, requestName must exist, as was discussed in the meeting,
because we support only one per (network, interface) tuple atm.
And always having requestName makes sure that it will support a ResourceClaim[Template] with either one device type or more.

However, for GPU and HostDevices it isn't required.
Yet there is a bug there:
kubevirt/kubevirt#16319 (comment)
Once I reach it I will open an issue (with a better explanation) about it so we can discuss and fix it.

Contributor Author

Created an issue for this:
kubevirt/kubevirt#16771

Contributor

Commented on the issue.

Yet there is a bug there

I am not sure this is a bug; there is an intentional API decision in the GPU struct that each device is a single element in the list. Having an empty requestName gives flexibility, but I can't think of any use case for this flexibility.

Contributor Author

@oshoval oshoval Feb 26, 2026

If requestName isn't supplied, it fails to find it using the current logic, without failing earlier with a deterministic assertion, hence at least that should be fixed imo.
Will answer verbosely on kubevirt/kubevirt#16771 with the options, at least as I see them, thanks.

// ResourceClaimNetworkSource represents a network resource requested
// via a Kubernetes ResourceClaim.
type ResourceClaimNetworkSource struct {
// ClaimName references the name of a ResourceClaim in the
Contributor

If the ClaimName directly references the name of the ResourceClaim in the namespace, what is the point of having the resource claim mentioned in spec.resourceClaims[]?

Contributor Author

@oshoval oshoval Feb 8, 2026

Because spec.resourceClaims[] is the one that already exists for all DRA types,
and in this struct ResourceClaimNetworkSource we have claim + request,
so there can be multiple ResourceClaimNetworkSource entries with different requests that point to the same spec.resourceClaims[] entry.
This was the design selected based on @aojea's idea, after co-op with Sebastian and Edy and the people in the DRA meeting, given the current DRA API.

EDIT - maybe the comment is wrong and that's what you mean: it points to a spec.resourceClaims entry,
not directly to a CR. Fixed the comment, thanks.

Btw this also has a bug that I will open an issue about.
@SchSeba found that, generally speaking, we don't render the pod correctly in some more
advanced cases.
Here is more info; it is out of the scope of this VEP:
kubevirt/kubevirt@164297c
I will open an issue about it.

Contributor

EDIT - maybe the comment is wrong and that's what you mean: it points to a spec.resourceClaims entry,
not directly to a CR. Fixed the comment, thanks.

+1 this is what I meant.

Yes, please open a bug; I think we can fix it as part of the VEP-10 beta work.

@SchSeba found that, generally speaking, we don't render the pod correctly in some more
advanced cases.
Here is more info; it is out of the scope of this VEP:
kubevirt/kubevirt@164297c
I will open an issue about it.

Contributor Author

Updated the comment.

Created here; self-assigned because I already have a WIP direction:
kubevirt/kubevirt#16769

@bgchun-fs

subscribe

@oshoval
Contributor Author

oshoval commented Feb 8, 2026

Thanks all for the review.
Answered Nir's review (and some other comments); will update once I finish all.

@oshoval
Contributor Author

oshoval commented Feb 8, 2026

Addressed / answered all comments, thank you

Document support for DRA-provisioned network devices, specifically SR-IOV NICs.

Key additions:
- ResourceClaimNetworkSource API for specifying DRA networks in spec.networks
- SR-IOV integration details: device allocation via DRA, configuration via
  existing KubeVirt networks API.
- Custom MAC address support through kubevirt.io/dra-networks.
- NetworkDevicesWithDRA feature gate (Alpha).
- Example VMI YAML with DRA SR-IOV network configuration.

Network DRA maintains mutual exclusivity with traditional Multus-based SR-IOV
per VM. The existing Multus SR-IOV API remains fully supported and unchanged.

Assisted-by: claude-4.5-sonnet
Signed-off-by: Or Shoval <oshoval@redhat.com>
@oshoval
Contributor Author

oshoval commented Feb 8, 2026

Fixed the comment according to the #185 (comment) thread.


1. The network admitter validates that exactly one network source type (pod, multus, or resourceClaim) is specified
2. Virt-controller adds the resource claim to the virt-launcher pod spec via `WithNetworksDRA()` render option
3. The DRA controller populates `vmi.status.deviceStatus` with the PCI address from the ResourceSlice
Member

If I'm not mistaken, you must query the resourceClaim in order to find the resourceSlice element/s name/s that was/were allocated to this pod.
Field ref resourceclaim.status.allocation.devices.results.device

Contributor Author

@oshoval oshoval Feb 12, 2026

The DRA controller (#10) already populates the data from the ResourceClaim into vmi.status.deviceStatus,
so we just need to read it.
We show the flow here; it is not an addition by this VEP.
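Step 1 of the flow quoted above (the network admitter requiring exactly one source type per network) can be sketched as follows. The dict keys stand in for the `pod`, `multus`, and `resourceClaim` source fields of a network entry; this is an illustrative sketch, not the actual admitter code:

```python
# Sketch of the "exactly one network source" admission check described above.
# A network entry is modeled as a dict whose non-"name" keys identify sources.
VALID_SOURCES = {"pod", "multus", "resourceClaim"}

def validate_network(net: dict) -> None:
    sources = VALID_SOURCES & net.keys()
    if len(sources) != 1:
        raise ValueError(
            f"network {net.get('name')!r}: exactly one source required, "
            f"got {sorted(sources)}"
        )

# Valid: a single resourceClaim source.
validate_network({"name": "red",
                  "resourceClaim": {"claimName": "sriov", "requestName": "vf1"}})

# Invalid: two sources on the same network entry.
try:
    validate_network({"name": "bad", "pod": {},
                      "resourceClaim": {"claimName": "sriov"}})
except ValueError as e:
    print(e)  # prints the validation error
```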

devices:
  requests:
  - name: sriov-nic-request
    exactly:
Member

Does this select exactly 1 resource?

Contributor Author

yes

spec:
  domain:
    devices:
      interfaces:
Member

I recommend adding one more sriov interface in order to demonstrate what gets multiplied (ifaces, networks, resourceClaims?) and what remains singular (deviceClass, resourceClaimTemplate).

Contributor Author

@oshoval oshoval Feb 12, 2026

Well, we don't support it correctly yet because of kubevirt/kubevirt#16769,
so I prefer not to; once we do, I will amend it.

As far as I know we need this;
once we support the downward API, maybe we will be able to simplify this
(and to relax some of the webhook limitations),
right @SchSeba?

apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  namespace: vf-test2
  name: single-vf
spec:
  spec:
    devices:
      requests:
      - name: vf1
        exactly:
          deviceClassName: sriovnetwork.k8snetworkplumbingwg.io
      - name: vf2
        exactly:
          deviceClassName: sriovnetwork.k8snetworkplumbingwg.io
      config:
      - requests: ["vf1","vf2"]
        opaque:
          driver: sriovnetwork.k8snetworkplumbingwg.io
          parameters:
            apiVersion: sriovnetwork.k8snetworkplumbingwg.io/v1alpha1
            kind: VfConfig
            netAttachDefName: vf-test
            driver: vfio-pci
            addVhostMount: true
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: testvmi
  namespace: vf-test2
spec:
  runStrategy: Always
  template:
    metadata:
      labels:
        kubevirt.io/domain: testvmi
    spec:
      architecture: amd64
      domain:
        cpu:
          cores: 1
          maxSockets: 4
          model: host-model
          sockets: 1
          threads: 1
        devices:
          autoattachGraphicsDevice: false
          disks:
          - disk:
              bus: virtio
            name: disk0
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            name: default
          - sriov: {}
            name: red
            macAddress: "de:ad:00:00:be:ef"
          - sriov: {}
            name: blue
          rng: {}
        features:
          acpi:
            enabled: true
        firmware:
          uuid: d7873366-c2b0-4a56-bfc2-1e1bee0a88db
        machine:
          type: q35
        memory:
          guest: 1Gi
        resources:
          requests:
            memory: 1Gi
      evictionStrategy: None
      networks:
      - name: default
        pod: {}
      - name: red
        resourceClaim:
          claimName: sriov
          requestName: vf1
      - name: blue
        resourceClaim:
          claimName: sriov
          requestName: vf2
      terminationGracePeriodSeconds: 0
      resourceClaims:
      - name: sriov
        resourceClaimTemplateName: single-vf
      volumes:
      - containerDisk:
          image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel
          imagePullPolicy: IfNotPresent
        name: disk0
      - cloudInitNoCloud:
          networkData: |
            ethernets:
              eth0:
                addresses:
                - fd10:0:2::2/120
                dhcp4: true
                gateway6: fd10:0:2::1
                match: {}
                nameservers:
                  addresses:
                  - 10.96.0.10
                  search:
                  - default.svc.cluster.local
                  - svc.cluster.local
                  - cluster.local
            version: 2
        name: cloudinitdisk

@bgchun-fs

cc @aojea @alaypatel07 @oshoval
Hey folks! I'm Byonggon Chun from Fluidstack.
(Hi @aojea, we had an intersection at the last KubeCon for DRA; good to interact here again.)

I've taken the initiative to prototype VEP-183 and successfully tested the DRA SR-IOV integration with real hardware. I skipped a few points such as the admission webhook and assigning custom MAC addresses, since the purpose was fast prototyping and figuring out blockers. You can take a look at the implementation at kubevirt/kubevirt#16799.

I'm happy to help you guys finish this proposal and implement it, since we have a few people waiting for this feature. Let me know if you need any help from my side.

Test Environment

  • Hardware: Mellanox NIC (SR-IOV capable)
  • DRA Driver: dra-sriov-driver (k8snetworkplumbingwg.io)
  • Kubernetes: v1.34.0
  • Approach: Used ResourceClaimTemplate for VF allocation

Following is the resourceClaimTemplate I used

apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata: #trimmed
spec:
  metadata: {}
  spec:
    devices:
      config:
      - opaque:
          driver: sriovnetwork.k8snetworkplumbingwg.io
          parameters:
            apiVersion: sriovnetwork.k8snetworkplumbingwg.io/v1alpha1
            ifName: net1
            kind: VfConfig
            netAttachDefName: sriov-net
            netAttachDefNamespace: default
        requests:
        - vf
      requests:
      - exactly:
          allocationMode: ExactCount
          count: 1
          deviceClassName: sriovnetwork.k8snetworkplumbingwg.io
          selectors:
          - cel:
              expression: device.attributes["sriovnetwork.k8snetworkplumbingwg.io"].resourceName
                == "mellanox-ens1f0"
        name: vf

Following is the VMI spec I used for testing

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
#trimmed
spec:
  architecture: amd64
  domain:
    cpu:
      cores: 1
      maxSockets: 4
      model: host-model
      sockets: 1
      threads: 1
    devices:
      disks:
      - disk:
          bus: virtio
        name: disk
      interfaces:
      - masquerade: {}
        name: default
    features:
      acpi:
        enabled: true
    firmware:
      uuid: 27a78077-ef0f-48e3-afa2-cc482cbe4eb6
    machine:
      type: q35
    memory:
      guest: 1Gi
      maxGuest: 4Gi
    resources:
      requests:
        cpu: "1"
        memory: 1Gi
  evictionStrategy: None
  networks:
  - name: default
    pod: {}
  nodeSelector:
    kubernetes.io/hostname: wdl-200-50-r106-cpu-02
  resourceClaims:
  - name: sriov-vf
    resourceClaimTemplateName: vm-sriov-vf
  volumes:
  - containerDisk:
      image: quay.io/kubevirt/cirros-container-disk-demo
      imagePullPolicy: Always
    name: disk
```
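The test VMI above only attaches the pod network; under the proposed ResourceClaimNetworkSource API, the same claim would additionally be referenced from spec.networks and paired with an SR-IOV interface. The field names below are a hypothetical sketch based on the VEP description, not a final schema:

```yaml
# Hypothetical sketch of the proposed API. The resourceClaim network
# source and its field names are assumptions drawn from this VEP's
# description, not an implemented schema.
spec:
  domain:
    devices:
      interfaces:
      - name: sriov-dra
        sriov: {}                # SR-IOV binding for the DRA-allocated VF
  networks:
  - name: sriov-dra
    resourceClaim:
      claimName: sriov-vf        # matches spec.resourceClaims[].name
      requestName: vf            # matches the request in the ResourceClaimTemplate
  resourceClaims:
  - name: sriov-vf
    resourceClaimTemplateName: vm-sriov-vf
```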

The following is the pod that was created:

```yaml
apiVersion: v1
kind: Pod
spec:
...
  resourceClaims:
  - name: sriov-vf
    resourceClaimTemplateName: vm-sriov-vf
...
status:
...
  resourceClaimStatuses:
  - name: sriov-vf
    resourceClaimName: virt-launcher-working-test-vmi-f2mms-sriov-vf-72q9h
  startTime: "2026-02-12T09:56:04Z"
...
```

The following is the resulting ResourceClaim:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  annotations:
    resource.kubernetes.io/pod-claim-name: sriov-vf
  creationTimestamp: "2026-02-12T09:56:04Z"
  finalizers:
  - resource.kubernetes.io/delete-protection
  generateName: virt-launcher-working-test-vmi-f2mms-sriov-vf-
  name: virt-launcher-working-test-vmi-f2mms-sriov-vf-72q9h
  namespace: default
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Pod
    name: virt-launcher-working-test-vmi-f2mms
    uid: 8e8d3b03-a96b-4056-ae4c-fd9a5ecb5dfe
  resourceVersion: "5631075"
  uid: d0f55c59-917d-4822-a561-a848fd181dc9
spec:
  devices:
    config:
    - opaque:
        driver: sriovnetwork.k8snetworkplumbingwg.io
        parameters:
          apiVersion: sriovnetwork.k8snetworkplumbingwg.io/v1alpha1
          ifName: net1
          kind: VfConfig
          netAttachDefName: sriov-net
          netAttachDefNamespace: default
      requests:
      - vf
    requests:
    - exactly:
        allocationMode: ExactCount
        count: 1
        deviceClassName: sriovnetwork.k8snetworkplumbingwg.io
        selectors:
        - cel:
            expression: device.attributes["sriovnetwork.k8snetworkplumbingwg.io"].resourceName
              == "mellanox-ens1f0"
      name: vf
status:
  allocation:
    devices:
      config:
      - opaque:
          driver: sriovnetwork.k8snetworkplumbingwg.io
          parameters:
            apiVersion: sriovnetwork.k8snetworkplumbingwg.io/v1alpha1
            ifName: net1
            kind: VfConfig
            netAttachDefName: sriov-net
            netAttachDefNamespace: default
        requests:
        - vf
        source: FromClaim
      results:
      - device: 0000-5c-01-6
        driver: sriovnetwork.k8snetworkplumbingwg.io
        pool: wdl-200-50-r106-cpu-02
        request: vf
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - wdl-200-50-r106-cpu-02
  devices:
  - conditions: null
    data:
      cniConfig:
        cniVersion: 1.0.0
        deviceID: 0000:5c:01.6
        ipam:
          ranges:
          - - subnet: {redacted}/24
          type: host-local
        logLevel: info
        name: sriov-net
        spoofchk: "on"
        trust: "on"
        type: sriov
        vlan: 10
        vlanQoS: 0
      cniResult:
        cniVersion: 1.1.0
        interfaces:
        - mac: {redacted}
          mtu: 9214
          name: net1
          sandbox: /var/run/netns/cni-4a902ca3-aad2-543b-f63d-f9510a4eb0de
        ips:
        - address: {redacted}/24
          gateway: {redacted}
          interface: 0
      vfConfig:
        apiVersion: sriovnetwork.k8snetworkplumbingwg.io/v1alpha1
        ifName: net1
        kind: VfConfig
        netAttachDefName: sriov-net
    device: 0000-5c-01-6
    driver: sriovnetwork.k8snetworkplumbingwg.io
    networkData:
      hardwareAddress: {redacted}
      interfaceName: net1
      ips:
      - {redacted}/24
    pool: wdl-200-50-r106-cpu-02
  reservedFor:
  - name: virt-launcher-working-test-vmi-f2mms
    resource: pods
    uid: 8e8d3b03-a96b-4056-ae4c-fd9a5ecb5dfe
```

@oshoval
Contributor Author

oshoval commented Feb 13, 2026

Hi @bgchun-fs
Thank you. We already implemented it as well:
kubevirt/kubevirt#16381
and one that I started to polish further for this VEP scope:
kubevirt/kubevirt@main...oshoval:vep183
so I prefer to continue the effort and the research we did.
I will look at yours to see if I can learn from it and whether parts can be split off for parallel implementation;
for now, please allow us to continue with the original plan.
First we should have this VEP merged, and then work according to priorities.

Btw, in addition

@oshoval
Contributor Author

oshoval commented Feb 26, 2026

Please see
kubevirt/kubevirt#16556 (comment)


Webhook validations ensure:
1. Networks with `resourceClaim` source have corresponding `sriov` binding interfaces
2. Each network must reference a unique `claimName` + `requestName` combination. No two DRA entities (networks, hostDevices, or GPUs) can share the same tuple, as each interface+network pair must map to exactly one device allocation
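The uniqueness check described in point 2 can be sketched as follows. This is an illustrative sketch, not the actual KubeVirt webhook code; the `claimRef` type and `validateUniqueClaims` function are hypothetical names introduced here:

```go
package main

import "fmt"

// claimRef identifies one DRA device allocation: a claim name from
// spec.resourceClaims plus a request name inside that claim.
// (Hypothetical type for illustration; not KubeVirt's actual code.)
type claimRef struct {
	ClaimName   string
	RequestName string
}

// validateUniqueClaims returns an error message for every duplicate
// claimName+requestName tuple, since each interface+network pair must
// map to exactly one device allocation.
func validateUniqueClaims(refs []claimRef) []string {
	seen := map[claimRef]bool{}
	var errs []string
	for _, r := range refs {
		if seen[r] {
			errs = append(errs, fmt.Sprintf(
				"duplicate resource claim %q request %q", r.ClaimName, r.RequestName))
			continue
		}
		seen[r] = true
	}
	return errs
}

func main() {
	refs := []claimRef{
		{"sriov-vf", "vf"},
		{"sriov-vf", "vf"}, // duplicate: two networks sharing one allocation
	}
	for _, e := range validateUniqueClaims(refs) {
		fmt.Println(e)
	}
}
```

Note that two requests under the same claim (for example `vf1` and `vf2` in one ResourceClaimTemplate) remain valid, since the tuple, not the claim alone, must be unique.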
Contributor Author


For now we will enforce it only for network versus network, not across GPU / HostDevice.
If cross-type enforcement is also needed, it is better done in a separate PR.

