README.md: 8 additions & 78 deletions
@@ -1,6 +1,4 @@
-# NVIDIA K8s Device Plugin to assign GPUs and vGPUs to KubeVirt VMs
-
-> Starting from v1.1.0, we will only be supporting KubeVirt v0.36.0 or newer. Please use v1.0.1 for compatibility with older KubeVirt versions.
+# NVIDIA K8s Device Plugin to assign Passthrough GPUs to Kata VMs for Confidential Containers

## Table of Contents
- [About](#about)
@@ -10,26 +8,21 @@
- [Docs](#docs)

## About
-This is a kubernetes device plugin that can discover and expose GPUs and vGPUs on a kubernetes node. This device plugin will enable to launch GPU attached [KubeVirt](https://github.com/kubevirt/kubevirt/blob/master/README.md) VMs in your kubernetes cluster. Its specifically developed to serve KubeVirt workloads in a Kubernetes cluster.
+This is a Kubernetes device plugin that can discover and expose GPUs for passthrough on a Kubernetes node. It enables launching GPU-attached [Kata](https://katacontainers.io/) VM-based containers in your Kubernetes cluster and is specifically developed to serve Kata workloads.


## Features
- Discovers Nvidia GPUs which are bound to the VFIO-PCI driver and exposes them as devices available to be attached to a VM in pass through mode.
-- Discovers Nvidia vGPUs configured on a kubernetes node and exposes them to be attached to KubeVirt VMs
- Performs a basic health check on the GPUs on a Kubernetes node.

## Prerequisites
-- Need to have Nvidia GPU configured for GPU passthrough or vGPU. Quickstart section provides details about this
+- An Nvidia GPU configured for GPU passthrough; the Quick Start section below provides details.
- Kubernetes version >= v1.11
-- KubeVirt release >= v0.36.0
-- KubeVirt GPU feature gate should be enabled and permitted devices should be whitelisted. Feature gate is enabled by creating a ConfigMap. ConfigMap yaml can be found under `/examples`.
+- Kata release >= v3.23.0

## Quick Start

-Before starting the device plug, the GPUs on a kubernetes node need to configured to be in GPU pass through mode or vGPU mode
-
-### Whitelist GPU and vGPU in KubeVirt CR
-GPUs and vGPUs should be allowlisted in the KubeVirt CR following the instructions outlined [here](https://kubevirt.io/user-guide/virtual_machines/host-devices/#listing-permitted-devices). An example KubeVirt CR can be found under `/examples`.
+Before starting the device plugin, the GPUs on a Kubernetes node need to be configured for GPU passthrough mode.

### Preparing a GPU to be used in pass through mode
The GPU needs to be bound to the VFIO-PCI driver to be used in pass through mode.
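The detailed passthrough preparation steps sit in the part of the README that this diff leaves collapsed. As a rough, illustrative sketch only, binding a GPU to vfio-pci via sysfs usually looks like the following; the PCI address 0000:06:00.0 is an assumed example, and IOMMU/kernel setup is taken as already done.

```shell
# Illustrative sketch only: adjust the PCI address for your host.
$ sudo modprobe vfio-pci
# Unbind the GPU from its current driver (skip if nothing is bound).
$ echo 0000:06:00.0 | sudo tee /sys/bus/pci/devices/0000:06:00.0/driver/unbind
# Prefer vfio-pci for this device, then bind it.
$ echo vfio-pci | sudo tee /sys/bus/pci/devices/0000:06:00.0/driver_override
$ echo 0000:06:00.0 | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
```

`lspci -nnk -s 06:00.0` should then report `Kernel driver in use: vfio-pci` for the device.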
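Once the device plugin is running, one way to check that the discovered GPUs are actually advertised to the cluster (per the About and Features sections above) is to look at the node's allocatable resources. This is a sketch under assumptions: the node name is a placeholder, and the exact resource name depends on the GPU model and plugin configuration, so verify it on your own cluster.

```shell
# Illustrative sketch only: <node-name> is a placeholder; look for the GPU
# resource entry advertised by the plugin.
$ kubectl get node <node-name> -o jsonpath='{.status.allocatable}'
```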
@@ … @@
-Nvidia Virtual GPU manager needs to be installed on the host to configure GPUs in vGPU mode.
-
-##### 1. Change to the mdev_supported_types directory for the physical GPU.
-```shell
-$ cd /sys/class/mdev_bus/domain\:bus\:slot.function/mdev_supported_types/
-```
-This example changes to the mdev_supported_types directory for the GPU with the domain 0000 and PCI device BDF 06:00.0.
-```shell
-$ cd /sys/bus/pci/devices/0000\:06\:00.0/mdev_supported_types/
-```
-##### 2. Find out which subdirectory of mdev_supported_types contains registration information for the vGPU type that you want to create.
-```shell
-$ grep -l "vgpu-type" nvidia-*/name
-```
-**vgpu-type** -- The vGPU type, for example, M10-2Q.
-This example shows that the registration information for the M10-2Q vGPU type is contained in the nvidia-41 subdirectory of mdev_supported_types.
-```shell
-$ grep -l "M10-2Q" nvidia-*/name
-nvidia-41/name
-```
-##### 3. Confirm that you can create an instance of the vGPU type on the physical GPU.
-```shell
-$ cat subdirectory/available_instances
-```
-**subdirectory** -- The subdirectory that you found in the previous step, for example, nvidia-41.
-
-The number of available instances must be at least 1. If the number is 0, either an instance of another vGPU type already exists on the physical GPU, or the maximum number of allowed instances has already been created.
-
-This example shows that four more instances of the M10-2Q vGPU type can be created on the physical GPU.
-```shell
-$ cat nvidia-41/available_instances
-4
-```
-##### 4. Generate a correctly formatted universally unique identifier (UUID) for the vGPU.
-```shell
-$ uuidgen
-aa618089-8b16-4d01-a136-25a0f3c73123
-```
-##### 5. Write the UUID that you obtained in the previous step to the `create` file in the registration information directory for the vGPU type that you want to create.
-```shell
-$ echo "uuid" > subdirectory/create
-```
-**uuid** -- The UUID that you generated in the previous step, which will become the UUID of the vGPU that you want to create.
-
-**subdirectory** -- The registration information directory for the vGPU type that you want to create, for example, nvidia-41.
-
-This example creates an instance of the M10-2Q vGPU type with the UUID aa618089-8b16-4d01-a136-25a0f3c73123.
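As a small addition to the (removed) vGPU steps above, the result of writing to the create file can be checked on the mdev bus. This is an illustrative sketch only; the created vGPU appears as a directory named after the UUID used in step 5.

```shell
# Illustrative sketch only: the created vGPU shows up under /sys/bus/mdev/devices/
# as a directory named after the UUID written to the create file.
$ ls /sys/bus/mdev/devices/
```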
@@ -203,5 +133,5 @@ make push-image DOCKER_REPO=<docker-repo-url> DOCKER_TAG=<image-tag>
```
### To Do
- Improve the healthcheck mechanism for GPUs with VFIO-PCI drivers
-- Support GetPreferredAllocation API of DevicePluginServer. It returns a preferred set of devices to allocate from a list of available ones. The resulting preferred allocation is not guaranteed to be the allocation ultimately performed by the devicemanager. It is only designed to help the devicemanager make a more informed allocation decision when possible. It has not been implemented in kubevirt-gpu-device-plugin.
+- Support the GetPreferredAllocation API of DevicePluginServer. It returns a preferred set of devices to allocate from a list of available ones. The resulting preferred allocation is not guaranteed to be the allocation ultimately performed by the device manager; it is only designed to help the device manager make a more informed allocation decision when possible. It has not been implemented in sandbox-device-plugin.