Description
I'm running a Kubernetes cluster with Kata Containers on HGX H100 servers. I want to run fabricmanager in Shared NVSwitch Virtualization Mode, in which a service VM runs fabricmanager to set up the NVSwitch partitions. This could be done with a Kata daemonset that only runs fabricmanager and has the NVSwitch devices passed through. However, this requires k8s to know about NVSwitch devices. Are there any plans to support this, and would you be open to contributions to get it supported?
This ties in with the CoCo work already released in GPU Operator (https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/confidential-containers-deploy.html) and with another PR recently opened in GPU Operator that allows further configuration of fabricmanager.