Support for NVSwitch in Shared NVSwitch Virtualization Model #24

@LandonTClipp

I'm running a Kubernetes cluster with Kata Containers on HGX H100 servers. I want to run fabricmanager in the Shared NVSwitch Virtualization Model, in which a service VM runs fabricmanager to set up the NVSwitch partitions. This could be done with a Kata DaemonSet that runs only fabricmanager and has the NVSwitch devices passed through. However, this requires k8s to know about NVSwitch devices. Are there any plans to support this, and would you be open to contributions to get it supported?
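To illustrate what "knowing about NVSwitch devices" could mean on the host side, here is a minimal discovery sketch. It is not how any existing plugin does it. It assumes that NVSwitch devices show up in sysfs with NVIDIA's PCI vendor ID (`0x10de`) and the PCI class `0x068000` ("bridge, other"), and the function name `find_nvswitch_devices` is made up for this example. A real device plugin would also have to check VFIO binding and IOMMU groups, which this skips.

```python
from pathlib import Path
from typing import List, Optional

NVIDIA_VENDOR_ID = 0x10DE
# Assumption: NVSwitch devices report PCI class 0x068000 ("bridge, other"),
# which distinguishes them from GPUs (0x030200 / 0x030000).
NVSWITCH_PCI_CLASS = 0x068000


def _read_hex(path: Path) -> Optional[int]:
    """Read a sysfs attribute such as '0x10de' and return it as an int."""
    try:
        return int(path.read_text().strip(), 16)
    except (OSError, ValueError):
        return None


def find_nvswitch_devices(sysfs_root: str = "/sys/bus/pci/devices") -> List[str]:
    """Return the PCI addresses of devices that look like NVSwitches.

    Hypothetical helper: it only matches on vendor and class and does not
    verify driver binding or IOMMU grouping.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    found = []
    for dev in sorted(root.iterdir()):
        vendor = _read_hex(dev / "vendor")
        pci_class = _read_hex(dev / "class")
        if vendor == NVIDIA_VENDOR_ID and pci_class == NVSWITCH_PCI_CLASS:
            found.append(dev.name)
    return found


if __name__ == "__main__":
    for addr in find_nvswitch_devices():
        print(addr)
```

Something along these lines would give a plugin the list of PCI addresses to advertise as a schedulable resource, so that only the fabricmanager service VM requests them.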

This ties in with the CoCo (Confidential Containers) work already released in GPU Operator: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/confidential-containers-deploy.html. It also ties in with another PR recently opened in GPU Operator that allows further configuration of fabricmanager.

CC @zvonkok @fidencio
