@zvonkok
Contributor

@zvonkok zvonkok commented Jan 15, 2026

When using DGX/HGX systems with virtualization,
we need to bind the NVSwitches to vfio-pci as well.

This enables us to use two virtualization modes:

  1. Full passthrough
  2. ServiceVM

We're explicitly not considering mode 3 with vGPUs.

Copilot AI review requested due to automatic review settings January 15, 2026 16:41
@copy-pr-bot

copy-pr-bot bot commented Jan 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copilot AI left a comment

Pull request overview

This pull request adds support for binding NVIDIA NVSwitches to the vfio-pci driver for DGX/HGX systems requiring virtualization. This enables full passthrough and ServiceVM virtualization modes.

Changes:

  • Added NVSwitch device discovery and binding in the bindAll() method
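
A rough sketch of what such discovery could look like, assuming devices are classified by PCI vendor ID and class code. The type `pciDevice`, the helper `selectForBinding`, and the sample class values are hypothetical and are not taken from the PR's diff; NVSwitches are assumed to enumerate with the "bridge, other" class (0x0680), and the addresses are borrowed from the test logs later in this conversation.

```go
package main

import "fmt"

const (
	nvidiaVendorID   uint16 = 0x10de   // NVIDIA PCI vendor ID
	classVGA         uint32 = 0x030000 // VGA-compatible controller
	class3D          uint32 = 0x030200 // 3D controller (data center GPUs)
	classBridgeOther uint32 = 0x068000 // assumed class for NVSwitch devices
)

// pciDevice is a hypothetical, minimal view of a PCI device.
type pciDevice struct {
	Address string
	Vendor  uint16
	Class   uint32 // 24-bit class code as exposed in sysfs
}

func (d pciDevice) isGPU() bool {
	return d.Vendor == nvidiaVendorID && (d.Class == classVGA || d.Class == class3D)
}

func (d pciDevice) isNVSwitch() bool {
	return d.Vendor == nvidiaVendorID && d.Class == classBridgeOther
}

// selectForBinding returns the addresses that should be bound to vfio-pci:
// always the GPUs, and the NVSwitches only when bindSwitches is true.
func selectForBinding(devs []pciDevice, bindSwitches bool) []string {
	var out []string
	for _, d := range devs {
		if d.isGPU() || (bindSwitches && d.isNVSwitch()) {
			out = append(out, d.Address)
		}
	}
	return out
}

func main() {
	// Sample inventory; class values are illustrative assumptions.
	devs := []pciDevice{
		{Address: "0000:1b:00.0", Vendor: nvidiaVendorID, Class: class3D},
		{Address: "0000:07:00.0", Vendor: nvidiaVendorID, Class: classBridgeOther},
		{Address: "0000:00:1f.0", Vendor: 0x8086, Class: 0x060100}, // non-NVIDIA, ignored
	}
	fmt.Println("GPUs only:        ", selectForBinding(devs, false))
	fmt.Println("GPUs + NVSwitches:", selectForBinding(devs, true))
}
```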

@cdesiniotis
Collaborator

/ok to test df87aec

@cdesiniotis
Collaborator

/ok to test e16d9b6

@zvonkok
Contributor Author

zvonkok commented Jan 16, 2026

/ok to test eef1c70

@zvonkok
Contributor Author

zvonkok commented Jan 19, 2026

Tested on a Viking: all four NVSwitches and all eight GPUs are bound to vfio-pci.

❯ k logs -n gpu-operator nvidia-vfio-manager-cxqhs 
Defaulted container "nvidia-vfio-manager" out of: nvidia-vfio-manager, k8s-driver-manager (init)
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:1b:00.0
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:1b:00.0 to driver: vfio-pci
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:43:00.0
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:43:00.0 to driver: vfio-pci
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:52:00.0
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:52:00.0 to driver: vfio-pci
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:61:00.0
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:61:00.0 to driver: vfio-pci
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:9d:00.0
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:9d:00.0 to driver: vfio-pci
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:c3:00.0
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:c3:00.0 to driver: vfio-pci
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:d1:00.0
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:d1:00.0 to driver: vfio-pci
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:df:00.0
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:df:00.0 to driver: vfio-pci
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:07:00.0
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:07:00.0 to driver: vfio-pci
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:08:00.0
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:08:00.0 to driver: vfio-pci
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:09:00.0
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:09:00.0 to driver: vfio-pci
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:0a:00.0
time=2026-01-19T18:07:49Z level=info msg=Binding device 0000:0a:00.0 to driver: vfio-pci

@zvonkok
Contributor Author

zvonkok commented Jan 19, 2026

Add boolean to enable/disable binding

On an NVLINK4 system, supporting virt-model 1 requires binding
the switches to vfio-pci. Supporting virt-model 3, with
fabric-manager running on the host, does not require binding
the switches to vfio-pci.

  vfioManager:
    env:
    - name: BIND_NVSWITCHES
      value: "false"

8 GPUs bound to vfio-pci

❯ k logs -n gpu-operator nvidia-vfio-manager-26s9k 
Defaulted container "nvidia-vfio-manager" out of: nvidia-vfio-manager, k8s-driver-manager (init)
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:1b:00.0
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:1b:00.0 to driver: vfio-pci
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:43:00.0
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:43:00.0 to driver: vfio-pci
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:52:00.0
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:52:00.0 to driver: vfio-pci
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:61:00.0
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:61:00.0 to driver: vfio-pci
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:9d:00.0
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:9d:00.0 to driver: vfio-pci
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:c3:00.0
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:c3:00.0 to driver: vfio-pci
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:d1:00.0
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:d1:00.0 to driver: vfio-pci
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:df:00.0
time=2026-01-19T19:25:59Z level=info msg=Binding device 0000:df:00.0 to driver: vfio-pci

  vfioManager:
    env:
    - name: BIND_NVSWITCHES
      value: "true"

4 NvSwitches and 8 GPUs bound to vfio-pci

❯ k logs -n gpu-operator nvidia-vfio-manager-cnzgz 
Defaulted container "nvidia-vfio-manager" out of: nvidia-vfio-manager, k8s-driver-manager (init)
time=2026-01-19T19:29:55Z level=info msg=Binding device 0000:1b:00.0
time=2026-01-19T19:29:55Z level=info msg=Binding device 0000:1b:00.0 to driver: vfio-pci
time=2026-01-19T19:29:55Z level=info msg=Binding device 0000:43:00.0
time=2026-01-19T19:29:55Z level=info msg=Binding device 0000:43:00.0 to driver: vfio-pci
time=2026-01-19T19:29:55Z level=info msg=Binding device 0000:52:00.0
time=2026-01-19T19:29:55Z level=info msg=Binding device 0000:52:00.0 to driver: vfio-pci
time=2026-01-19T19:29:55Z level=info msg=Binding device 0000:61:00.0
time=2026-01-19T19:29:55Z level=info msg=Binding device 0000:61:00.0 to driver: vfio-pci
time=2026-01-19T19:29:55Z level=info msg=Binding device 0000:9d:00.0
time=2026-01-19T19:29:55Z level=info msg=Binding device 0000:9d:00.0 to driver: vfio-pci
time=2026-01-19T19:29:55Z level=info msg=Binding device 0000:c3:00.0
time=2026-01-19T19:29:55Z level=info msg=Binding device 0000:c3:00.0 to driver: vfio-pci
time=2026-01-19T19:29:55Z level=info msg=Binding device 0000:d1:00.0
time=2026-01-19T19:29:56Z level=info msg=Binding device 0000:d1:00.0 to driver: vfio-pci
time=2026-01-19T19:29:56Z level=info msg=Binding device 0000:df:00.0
time=2026-01-19T19:29:56Z level=info msg=Binding device 0000:df:00.0 to driver: vfio-pci
time=2026-01-19T19:29:56Z level=info msg=Binding device 0000:07:00.0
time=2026-01-19T19:29:56Z level=info msg=Binding device 0000:07:00.0 to driver: vfio-pci
time=2026-01-19T19:29:56Z level=info msg=Binding device 0000:08:00.0
time=2026-01-19T19:29:56Z level=info msg=Binding device 0000:08:00.0 to driver: vfio-pci
time=2026-01-19T19:29:56Z level=info msg=Binding device 0000:09:00.0
time=2026-01-19T19:29:56Z level=info msg=Binding device 0000:09:00.0 to driver: vfio-pci
time=2026-01-19T19:29:56Z level=info msg=Binding device 0000:0a:00.0
time=2026-01-19T19:29:56Z level=info msg=Binding device 0000:0a:00.0 to driver: vfio-pci
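
For illustration, the `BIND_NVSWITCHES` toggle used in the two runs above could be read roughly as follows. This is a sketch rather than the code merged in this PR: the function name `parseBindNVSwitches` is hypothetical, and the fallback used when the variable is unset is an assumption, since the default is not stated in this conversation.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const bindNVSwitchesEnv = "BIND_NVSWITCHES"

// parseBindNVSwitches is a hypothetical helper. lookup has the same shape as
// os.LookupEnv so tests can inject values; fallback is returned when the
// variable is unset or empty (the real default is an assumption here).
func parseBindNVSwitches(lookup func(string) (string, bool), fallback bool) (bool, error) {
	raw, ok := lookup(bindNVSwitchesEnv)
	if !ok || raw == "" {
		return fallback, nil
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return fallback, fmt.Errorf("invalid %s value %q: %w", bindNVSwitchesEnv, raw, err)
	}
	return enabled, nil
}

func main() {
	enabled, err := parseBindNVSwitches(os.LookupEnv, false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("bind NVSwitches:", enabled)
}
```

`strconv.ParseBool` accepts the quoted `"true"` and `"false"` strings used in the Helm values, along with `1`, `0`, `t`, `f`, and their capitalized forms; anything else is reported as an error instead of being silently treated as false.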

Contributor

@jojimt jojimt left a comment

LGTM

When using DGX/HGX systems with virtualization,
we need to bind the NVSwitches to vfio-pci as well.

This enables us to use two virtualization modes:

1. Full passthrough
2. ServiceVM

We're explicitly not considering mode 3 with vGPUs.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>

Despite its name, GetGPUByPciBusID returns any NVIDIA PCI device
(GPU, NVSwitch, etc.) at the specified address, not just GPUs.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>

On an NVLINK4 system, supporting virt-model 1 requires binding
the switches to vfio-pci. Supporting virt-model 3, with
fabric-manager running on the host, does not require binding
the switches to vfio-pci.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>

@zvonkok zvonkok merged commit 32de8a3 into NVIDIA:main Jan 23, 2026
11 checks passed
3 participants