Support both CDI and Legacy NVIDIA Container Runtime modes #459
Conversation
force pushed to drop unrelated commits

Forced push to change the commit content and add subject for patch

Forced push to add sign off in one commit

forced push to add all the changes from #467 to this PR
force-pushed to fix the issue in the conversation

force-pushed to address the conversation above.

force-pushed to add a new patch in nvidia-k8s-device-plugin

force-pushed to change one commit message
[nvidia-container-runtime]
{{#if settings.kubelet-device-plugins.nvidia.device-list-strategy}}
{{#if (eq settings.kubelet-device-plugins.nvidia.device-list-strategy "cdi-cri")}}
Setting `mode="cdi"` here is correct, but possibly for the wrong reasons.

Note that if the mode is `cdi-cri`, the `nvidia-container-runtime` will not perform the injection since it cannot read the CRI fields set by the plugin. This means that the `nvidia-container-runtime` performs no modification to the OCI runtime specification and forwards the command to `runc`.

In the case where `cdi-cri` is used, one does not strictly need the `nvidia` runtime at all. In the case of the GPU Operator we "solve" this by:

- Not setting the `nvidia` runtime as the default in containerd.
- Adding a `RuntimeClass` for the `nvidia` runtime.
- Ensuring that "management" containers such as the k8s-device-plugin, which themselves need access to GPUs, use this runtime class.
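The `RuntimeClass` approach described above can be sketched roughly as follows. This is a hypothetical example, not taken from this PR; the `handler` name is an assumption and must match whatever runtime name is configured in containerd:

```yaml
# Hypothetical RuntimeClass that routes selected pods to the nvidia runtime
# configured in containerd. The handler name "nvidia" is an assumption.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
```

With this in place, only pods that set `runtimeClassName: nvidia` in their spec go through the NVIDIA runtime; everything else uses the default runtime.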
Hey @elezar, thanks again for the interest in this PR!

> Note that if the mode is cdi-cri, the nvidia-container-runtime will not perform the injection since it cannot read the CRI fields set by the plugin. This means that the nvidia-container-runtime performs no modification to the OCI runtime specification and forwards the command to runc.

We use `cdi-cri` to let the Kubernetes Device Plugin generate the CDI specifications, and we enabled CDI support in `containerd` to perform the CDI updates. So `nvidia-container-runtime` effectively does nothing since all the updates were already applied by containerd.

The reason why we used `nvidia-container-runtime` at all is to allow users to use `NVIDIA_VISIBLE_DEVICES` since, even though we understand that this environment variable is meant to be used solely for "management" containers, we have seen Bottlerocket users rely on this "feature" to allow over-subscription of the GPUs. As I said elsewhere, we do advise against this, but we also don't want to break users that heavily rely on this type of oversubscription.

What we want to achieve with this setup is a "hands-free" experience, where users just create their pod specifications without having to worry about which `RuntimeClass` to use, as is the case today with the legacy support. Additionally, we wanted to stop injecting the prestart hook even for management containers, to fully rely on CDI.

Please let us know if you think there are problems with this approach. FWIW, we don't run the k8s device plugin as a daemonset; we run it as a service on the host, but users own running other "management" containers like `dcgm-exporter`.
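Enabling CDI support in containerd, as described above, typically amounts to a CRI plugin setting along these lines. This is a sketch based on upstream containerd documentation, not a fragment from this PR; the exact plugin section name depends on the containerd version:

```toml
# Sketch: enable CDI device injection in containerd's CRI plugin (1.7+).
# Section name and spec directories are assumptions from upstream defaults.
[plugins."io.containerd.grpc.v1.cri"]
  enable_cdi = true
  cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
```

With `enable_cdi` set, containerd itself applies the CDI spec edits to the OCI runtime specification, which is why the NVIDIA runtime ends up with nothing to do in `cdi-cri` mode.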
Just to circle back to this discussion, we decided to include the additional runtimes to be more compatible with the GPU Operator. Thanks for the feedback @elezar!
The NVIDIA Device Plugin looks for libraries and binaries at `/driver-root` because it expects to be running as a container where the driver root is mounted at that location. In Bottlerocket, the driver root is the actual root filesystem, so use that instead of the default value.

Signed-off-by: Jingwei Wang <[email protected]>
force pushed to change the commit format and comment to make it look better
Update the NVIDIA Container Toolkit configuration template to dynamically select the mode based off the configurations used in the Kubernetes Device Plugin.

Signed-off-by: Jingwei Wang <[email protected]>
force pushed to fix the handlebars template
The 'default' handlebars helper doesn't work with arrays. Adjust how the device-list-strategy property is set depending on the type used in the API settings. Default to 'volume-mounts' when the setting is missing.

Signed-off-by: Jingwei Wang <[email protected]>
Signed-off-by: Arnaldo Garcia Rincon <[email protected]>
The prestart hook attempts to detect the mode used by the NVIDIA container runtime, and fails if the detected mode is different from 'legacy'. This breaks the NVIDIA legacy runtime when the default mode is set to 'cdi'. Prevent the prestart hook from detecting the runtime mode, as using 'cdi' as the default mode while using the NVIDIA legacy runtime is a valid configuration.

Signed-off-by: Arnaldo Garcia Rincon <[email protected]>
Last push includes:
{{~#if (eq settings.kubelet-device-plugins.nvidia.device-list-strategy.[0] "cdi-cri") ~}}
mode="cdi"
Is it expected that "cdi-cri" will always be the first item of the array? What is the behavior if we have multiple items in the array, and what if "cdi-cri" is the second item in the array?

May not be blocking, but it may be good to implement some helper like "has/contains", similar to what they did in the device plugin: https://github.com/NVIDIA/k8s-device-plugin/blob/6f41f70c43f8da1357f51f64cf60431acc74141f/deployments/helm/nvidia-device-plugin/templates/_helpers.tpl#L178.

Also a note here: I was going to raise an index-out-of-bounds concern, but it looks like the `if` helper does check for an empty list and treats it as false, so using "0" is safe here.
Regarding the conversation about the custom helper: we had one, but we were advised not to add it, at least for this case.
{{~else~}}
mode="legacy"
{{~/if~}}
{{~else~}}
{{~#if (eq settings.kubelet-device-plugins.nvidia.device-list-strategy "cdi-cri") ~}}
mode="cdi"
{{~else~}}
mode="legacy"
{{~/if~}}
{{/if}}
{{else}}
mode="legacy"
{{/if}}
I don't love the nested if-else blocks here and the multiple repeated `mode="legacy"` lines.

It looks like the only case where we want to set "cdi" is when the device-list-strategy is set to "cdi-cri" (in string or list form). Could we clean up the if-else logic here?

On another note, together with the comment I have above, it might be worth considering a dedicated "device-list-strategy" helper. The logic would be simplified in Rust code.
Our custom helpers - https://github.com/bottlerocket-os/bottlerocket-core-kit/blob/develop/sources/api/schnauzer/src/v2/import/helpers.rs#L36-L88
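The decision logic such a helper would encapsulate can be sketched in plain Rust. This is a hypothetical illustration, not schnauzer's actual helper interface; the `Strategy` enum and `runtime_mode` function are assumptions modeling a setting that may be either a single string or a list:

```rust
// Sketch of the mode-selection logic a hypothetical "device-list-strategy"
// helper could encapsulate. `Strategy` models an API setting that may be
// either a single string or a list of strings.
enum Strategy {
    Single(String),
    Multiple(Vec<String>),
}

// Return the NVIDIA container runtime mode for an optional setting.
// "cdi" is selected only when "cdi-cri" appears anywhere in the setting;
// everything else, including a missing setting, falls back to "legacy".
fn runtime_mode(strategy: Option<&Strategy>) -> &'static str {
    let uses_cdi_cri = match strategy {
        Some(Strategy::Single(s)) => s == "cdi-cri",
        Some(Strategy::Multiple(list)) => list.iter().any(|s| s == "cdi-cri"),
        None => false,
    };
    if uses_cdi_cri { "cdi" } else { "legacy" }
}

fn main() {
    assert_eq!(runtime_mode(None), "legacy");
    assert_eq!(runtime_mode(Some(&Strategy::Single("cdi-cri".into()))), "cdi");
    let list = Strategy::Multiple(vec!["envvar".into(), "cdi-cri".into()]);
    assert_eq!(runtime_mode(Some(&list)), "cdi");
    println!("ok");
}
```

Handling both forms in one function, with membership rather than positional lookup, would also address the "cdi-cri is the second item" concern raised above.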
Issue number:

Closes #468

Description of changes:

- Select the `mode` for the default NVIDIA container runtime based off the configurations set for the `device-list-strategy` in the k8s device plugin
- Fix the prestart hook when the `mode` for the default NVIDIA runtime is set to `cdi`

Testing done:

In combination with:

In variants aws-k8s-1.29, aws-k8s-1.31, aws-k8s-1.32, aws-k8s-1.33, for x86_64 and aarch64, with these `device-list-strategy` values:

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.