Support both CDI and Legacy NVIDIA Container Runtime modes #459


Merged

Conversation


@sky1122 sky1122 commented Apr 8, 2025

Issue number:

Closes #468

Description of changes:

  • Set DriverRoot and DefaultContainerDriverRoot in nvidia-k8s-device-plugin
  • Select the correct mode for the default NVIDIA container runtime, based on the device-list-strategy configured for the k8s device plugin (see the sketch below)
  • Disable mode detection in the prestart hook, as it prevents the NVIDIA legacy runtime from working when the mode for the default NVIDIA runtime is set to cdi
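
For reference, a minimal sketch of how the CDI mode could be selected through the Bottlerocket API settings; the setting path is taken from the template in this PR, while the user-data framing and the exact value shape are assumptions:

```toml
# Hypothetical user-data fragment (sketch, not verified against the final
# settings model): a device-list-strategy whose first entry is "cdi-cri"
# selects mode="cdi" for the default NVIDIA container runtime.
[settings.kubelet-device-plugins.nvidia]
device-list-strategy = ["cdi-cri"]
```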

Testing done:
In combination with:

In variants: aws-k8s-1.29, aws-k8s-1.31, aws-k8s-1.32, aws-k8s-1.33 for x86_64 and aarch64:

  • Verified that all the containerd runtimes are functional
  • Verified that the correct mode is configured depending on the device-list-strategy values:
| device-list-strategy (API values) | Mode | Status |
|---|---|---|
| envvar | legacy | Pass |
| envvar, volume-mounts | legacy | Pass |
| envvar, volume-mounts, cdi-cri | legacy | Pass |
| envvar, cdi-cri | legacy | Pass |
| envvar, cdi-cri, volume-mounts | legacy | Pass |
| volume-mounts | legacy | Pass |
| volume-mounts, envvar | legacy | Pass |
| volume-mounts, envvar, cdi-cri | legacy | Pass |
| volume-mounts, cdi-cri | legacy | Pass |
| volume-mounts, cdi-cri, envvar | legacy | Pass |
| cdi-cri | cdi | Pass |
| cdi-cri, volume-mounts | cdi | Pass |
| cdi-cri, volume-mounts, envvar | cdi | Pass |
| cdi-cri, envvar | cdi | Pass |
| cdi-cri, envvar, volume-mounts | cdi | Pass |

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@sky1122 sky1122 force-pushed the enable-cdi-k8s-containerd branch from 6f2376d to e67fc04 on April 8, 2025 21:30

sky1122 commented Apr 8, 2025

Force-pushed to drop unrelated commits.

@sky1122 sky1122 requested a review from arnaldo2792 April 8, 2025 21:34
@sky1122 sky1122 force-pushed the enable-cdi-k8s-containerd branch 2 times, most recently from 2deed72 to f6daa5d on April 9, 2025 23:55

sky1122 commented Apr 9, 2025

Force-pushed to change the commit content and add a subject to the patch.

@sky1122 sky1122 force-pushed the enable-cdi-k8s-containerd branch from f6daa5d to 74d6c03 on April 11, 2025 18:25

sky1122 commented Apr 11, 2025

Force-pushed to add the sign-off in one commit.


sky1122 commented Apr 11, 2025

Force-pushed to add all the changes from #467 to this PR.

@sky1122 sky1122 changed the title from "Packages: enable cdi in k8s-device-plugin and containerd" to "Packages: add support for CDI in k8s" Apr 14, 2025
@sky1122 sky1122 force-pushed the enable-cdi-k8s-containerd branch from 3325142 to aab387b on April 16, 2025 16:59

sky1122 commented Apr 16, 2025

Force-pushed to fix the issue raised in the conversation.

@sky1122 sky1122 requested a review from arnaldo2792 April 16, 2025 18:45
@sky1122 sky1122 force-pushed the enable-cdi-k8s-containerd branch from aab387b to 6f6c507 on April 18, 2025 17:32

sky1122 commented Apr 18, 2025

Force-pushed to address the conversation above.

@sky1122 sky1122 requested a review from arnaldo2792 April 18, 2025 18:00
@sky1122 sky1122 force-pushed the enable-cdi-k8s-containerd branch 2 times, most recently from 25ea570 to af3be11 on April 21, 2025 19:48
@sky1122 sky1122 added and then removed the bug ("Something isn't working") label Apr 21, 2025
@sky1122 sky1122 marked this pull request as ready for review April 21, 2025 21:23
@sky1122 sky1122 requested review from bcressey, ytsssun and koooosh April 21, 2025 21:23
@sky1122 sky1122 force-pushed the enable-cdi-k8s-containerd branch from af3be11 to ee2f138 on April 24, 2025 00:01

sky1122 commented Apr 24, 2025

Force-pushed to add a new patch to nvidia-k8s-device-plugin.

@sky1122 sky1122 removed the request for review from koooosh April 24, 2025 00:06
@sky1122 sky1122 requested a review from KCSesh April 24, 2025 00:06
@sky1122 sky1122 force-pushed the enable-cdi-k8s-containerd branch from ee2f138 to 1145096 on April 24, 2025 00:07

sky1122 commented Apr 24, 2025

Force-pushed to change one commit message.


[nvidia-container-runtime]
{{#if settings.kubelet-device-plugins.nvidia.device-list-strategy}}
{{#if (eq settings.kubelet-device-plugins.nvidia.device-list-strategy "cdi-cri")}}
@elezar commented:

Setting mode="cdi" here is correct, but possibly for the wrong reasons.

Note that if the mode is cdi-cri, the nvidia-container-runtime will not perform the injection since it cannot read the CRI fields set by the plugin. This means that the nvidia-container-runtime performs no modification to the OCI runtime specification and forwards the command to runc.

In the case where cdi-cri is used, one does not strictly need the nvidia runtime at all. In the case of the GPU Operator we "solve" this by (see the sketch after this list):

  1. Not setting the nvidia runtime as the default in containerd.
  2. Adding a RuntimeClass for the nvidia runtime.
  3. Ensuring that "management" containers such as the k8s-device-plugin that themselves need access to GPUs use this runtime class.
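
For illustration, registering a non-default nvidia runtime in containerd typically looks like the sketch below; the plugin IDs and binary path are the common defaults, assumed here rather than taken from this PR:

```toml
# Sketch: register an "nvidia" runtime in containerd without making it the
# default. Pods opt in through a RuntimeClass whose handler is "nvidia".
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
    BinaryName = "/usr/bin/nvidia-container-runtime"
```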

@arnaldo2792 (Contributor) commented Apr 24, 2025

Hey @elezar, thanks again for the interest in this PR!

> Note that if the mode is cdi-cri, the nvidia-container-runtime will not perform the injection since it cannot read the CRI fields set by the plugin. This means that the nvidia-container-runtime performs no modification to the OCI runtime specification and forwards the command to runc.

We use cdi-cri to let the Kubernetes Device Plugin generate the CDI specifications, and we enabled the CDI support in containerd to perform the CDI updates. So nvidia-container-runtime effectively does nothing since all the updates were already applied by containerd.

The reason why we used nvidia-container-runtime at all is to allow users to use NVIDIA_VISIBLE_DEVICES since, even though we understand that this environment variable is meant to be used solely for "management" containers, we have seen Bottlerocket users rely on this "feature" to allow over-subscription of the GPUs. As I said elsewhere, we do advise against this, but we also don't want to break users that heavily rely on this type of oversubscription.

What we want to achieve with the setup that we have is a "hands-free" experience, where the users just create their pod specifications without having to worry about the RuntimeClass they have to use as we have it today with the legacy support. Additionally, we wanted to stop injecting the prestart hook even from management containers to fully rely on CDI.

Please let us know if you think there are problems with this approach. FWIW, we don't run the k8s device plugin as a daemonset, we run it as a service on the host, but users are responsible for running other "management" containers like dcgm-exporter.

@arnaldo2792 (Contributor) commented:
Just to circle back to this discussion, we decided to include the additional runtimes to be more compatible with the GPU Operator. Thanks for the feedback @elezar!

The NVIDIA Device Plugin looks for libraries and binaries at `/driver-root`
because it expects to be running as a container where the driver root is
mounted at that location.

In Bottlerocket, the driver root is the actual root filesystem, so use that instead of the default value.

Signed-off-by: Jingwei Wang <[email protected]>
@sky1122 sky1122 force-pushed the enable-cdi-k8s-containerd branch 2 times, most recently from e4af0a8 to 071452d on May 14, 2025 23:21
@arnaldo2792 arnaldo2792 marked this pull request as draft May 15, 2025 23:56
@sky1122 sky1122 force-pushed the enable-cdi-k8s-containerd branch 2 times, most recently from e06a3ce to 91230e2 on May 16, 2025 17:08
@sky1122
Copy link
Contributor Author

sky1122 commented May 16, 2025

Force-pushed to change the commit format and comments to read better.

Update the NVIDIA Container Toolkit configuration template to dynamically
select the mode based on the configuration used in the Kubernetes Device
Plugin.

Signed-off-by: Jingwei Wang <[email protected]>
@sky1122 sky1122 force-pushed the enable-cdi-k8s-containerd branch from 91230e2 to 98c5634 on May 16, 2025 20:24
@sky1122
Copy link
Contributor Author

sky1122 commented May 16, 2025

Force-pushed to fix the handlebars template.

The 'default' handlebars helper doesn't work with arrays. Adjust how the
device-list-strategy property is set depending on the type used in the
API settings. Default to 'volume-mounts' when the setting is missing.

Signed-off-by: Jingwei Wang <[email protected]>
Signed-off-by: Arnaldo Garcia Rincon <[email protected]>
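
A minimal sketch of the type-based branching this commit describes; the exact structure is assumed (the merged template appears in the review excerpts below), and it relies on schnauzer's handlebars resolving a missing index to a falsy value:

```handlebars
{{!-- Sketch: indexing into the setting distinguishes the array form from
      the scalar form; for a scalar or unset value, .[0] resolves to
      nothing and the else branch is taken. --}}
{{#if settings.kubelet-device-plugins.nvidia.device-list-strategy.[0]}}
{{!-- array form: the first element decides the mode --}}
{{else}}
{{!-- scalar string form (or unset): compare the value directly --}}
{{/if}}
```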
@arnaldo2792 arnaldo2792 force-pushed the enable-cdi-k8s-containerd branch from 98c5634 to 972c58c on May 19, 2025 14:58
The prestart hook attempts to detect the mode used by the NVIDIA
container runtime, and fails if the detected mode is different than
'legacy'. This breaks the NVIDIA legacy runtime when the default mode is
set to 'cdi'.

Prevent the prestart hook from detecting the runtime mode, as using
'cdi' as the default mode while using the NVIDIA legacy runtime is a
valid configuration.

Signed-off-by: Arnaldo Garcia Rincon <[email protected]>
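
For context, the upstream NVIDIA Container Toolkit exposes a related knob; a hedged sketch of the configuration fragment, assuming the `skip-mode-detection` option is the mechanism involved (the commit may instead carry a source patch):

```toml
# Sketch only: tell the prestart hook not to inspect the runtime mode, so
# the legacy hook keeps working even when the default mode is "cdi".
[nvidia-container-runtime-hook]
skip-mode-detection = true
```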
@arnaldo2792 (Contributor) commented:

Last push includes:

  • Add missing "header" to the k8s device plugin configuration
  • Disable mode detection for prestart hook

@arnaldo2792 arnaldo2792 marked this pull request as ready for review May 19, 2025 17:11
Comment on lines +19 to +20
{{~#if (eq settings.kubelet-device-plugins.nvidia.device-list-strategy.[0] "cdi-cri") ~}}
mode="cdi"

Is it expected that "cdi-cri" will always be the first item of the array? What is the behavior if we have multiple items in the array, and what if "cdi-cri" is the second item in the array?

May not be blocking. But it may be good to implement some helper like "has/contains" - similar to what they did in the device plugin - https://github.com/NVIDIA/k8s-device-plugin/blob/6f41f70c43f8da1357f51f64cf60431acc74141f/deployments/helm/nvidia-device-plugin/templates/_helpers.tpl#L178.

Also a note here: I was going to raise an index-out-of-bounds concern, but it looks like the `if` helper checks for an empty list and treats it as false, so using "[0]" is safe here.

A contributor replied:

Regarding the conversation about a custom helper: we had one, but we were advised not to add it, at least for this case:

#502

Comment on lines +21 to +33
{{~else~}}
mode="legacy"
{{~/if~}}
{{~else~}}
{{~#if (eq settings.kubelet-device-plugins.nvidia.device-list-strategy "cdi-cri") ~}}
mode="cdi"
{{~else~}}
mode="legacy"
{{~/if~}}
{{/if}}
{{else}}
mode="legacy"
{{/if}}

I don't love the nested if-else blocks here and the multiple repeated `mode="legacy"` lines.

It looks like the only case where we want to set "cdi" is when the device-list-strategy is set to "cdi-cri" (in string or list form). Could we clean up the if-else logic here?

On another note, along with my comment above, it might be worth considering a dedicated "device-list-strategy" helper? The logic would be simpler in Rust code.

Our custom helpers - https://github.com/bottlerocket-os/bottlerocket-core-kit/blob/develop/sources/api/schnauzer/src/v2/import/helpers.rs#L36-L88
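
For illustration only, the consolidation being suggested might look like the sketch below, assuming a hypothetical `contains`-style helper that matches both the string and array forms. No such helper exists in schnauzer (a custom helper was considered and declined in #502), and note the merged template keys off the first array element rather than membership:

```handlebars
{{!-- Hypothetical: `contains` is an assumed helper, shown only to
      illustrate the suggested simplification. --}}
{{#if (contains settings.kubelet-device-plugins.nvidia.device-list-strategy "cdi-cri")}}
mode="cdi"
{{else}}
mode="legacy"
{{/if}}
```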

@arnaldo2792 arnaldo2792 changed the title from "Packages: add support for CDI in k8s" to "Support both CDI and Legacy NVIDIA Container Runtime modes" May 19, 2025
@arnaldo2792 arnaldo2792 merged commit d0ea56c into bottlerocket-os:develop May 19, 2025
2 checks passed