nvidia-container-toolkit, nvidia-k8s-device-plugin: change template to support CDI and legacy stack #467


sky1122 (Contributor) commented Apr 11, 2025

Issue number:

Closes #

Description of changes:

  • Update nvidia-container-toolkit to support both CDI and legacy mode
  • Update nvidia-device-plugin to support DeviceListStrategy="cdi-cri"

Testing done:
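All templates render correctly, and the DeviceListStrategy can be changed to "cdi-cri". An invalid value ("volume-mount") is rejected, and each accepted value ("volume-mounts", "envvar", "cdi-cri") is reflected in the rendered settings.yaml: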

[root@admin]# apiclient set settings.kubelet-device-plugins.nvidia.device-list-strategy="volume-mount"
Failed to change settings: Failed PATCH request to '/settings/keypair?tx=apiclient-set-ftkRMWBGBNks2ZSk': Status 400 when PATCHing /settings/keypair?tx=apiclient-set-ftkRMWBGBNks2ZSk: Unable to match your input to the data model.  We may not have enough type information.  Please try the --json input form.  Cause: Error during deserialization: unknown variant `volume-mount`, expected one of `envvar`, `volume-mounts`, `cdi-cri` at line 1 column 74
[root@admin]# apiclient set settings.kubelet-device-plugins.nvidia.device-list-strategy="volume-mounts"
[root@admin]# apiclient get settings.kubelet-device-plugins.nvidia.device-list-strategy
{
  "settings": {
    "kubelet-device-plugins": {
      "nvidia": {
        "device-list-strategy": "volume-mounts"
      }
    }
  }
}
[root@admin]# sheltie /etc/nvidia-k8s-device-plugin/settings.yaml
nsenter: failed to execute /etc/nvidia-k8s-device-plugin/settings.yaml: Permission denied
[root@admin]# sheltie cat /etc/nvidia-k8s-device-plugin/settings.yaml
version: v1
flags:

  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: true
    deviceListStrategy: volume-mounts
    deviceIDStrategy: index
    containerDriverRoot: "/"

[root@admin]# apiclient set settings.kubelet-device-plugins.nvidia.device-list-strategy="envvar"
[root@admin]# sheltie cat /etc/nvidia-k8s-device-plugin/settings.yaml
version: v1
flags:

  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: true
    deviceListStrategy: envvar
    deviceIDStrategy: index
    containerDriverRoot: "/"
[root@admin]# apiclient set settings.kubelet-device-plugins.nvidia.device-list-strategy="cdi-cri"
[root@admin]# sheltie cat /etc/nvidia-k8s-device-plugin/settings.yaml
version: v1
flags:

  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: true
    deviceListStrategy: "cdi-cri"
    deviceIDStrategy: index
    containerDriverRoot: "/"
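
The checks in the transcript above can be approximated offline. The following is a minimal sketch, not the code from this PR: the function names (`validate_strategy`, `render_settings`, `rendered_strategy`) are hypothetical, the accepted values are copied from the error message above, and the rendered text only imitates the shape of the settings.yaml output shown. It quotes the strategy value uniformly, which YAML reads the same as the unquoted form.

```python
# Hypothetical sketch: these names are illustrative and do not come from the PR.
# The accepted values are the ones listed in the apiclient error message.
ALLOWED_STRATEGIES = ("envvar", "volume-mounts", "cdi-cri")


def validate_strategy(value: str) -> str:
    """Return the value unchanged if it is an accepted strategy, else raise."""
    if value not in ALLOWED_STRATEGIES:
        expected = ", ".join(f"`{s}`" for s in ALLOWED_STRATEGIES)
        raise ValueError(f"unknown variant `{value}`, expected one of {expected}")
    return value


def render_settings(strategy: str) -> str:
    """Render text shaped like the settings.yaml output in the transcript."""
    strategy = validate_strategy(strategy)
    return "\n".join([
        "version: v1",
        "flags:",
        '  migStrategy: "none"',
        "  failOnInitError: true",
        '  nvidiaDriverRoot: "/"',
        "  plugin:",
        "    passDeviceSpecs: true",
        f'    deviceListStrategy: "{strategy}"',
        "    deviceIDStrategy: index",
        '    containerDriverRoot: "/"',
        "",
    ])


def rendered_strategy(settings_text: str) -> str:
    """Extract the deviceListStrategy value from rendered settings text."""
    for line in settings_text.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "deviceListStrategy":
            # Accept both the quoted and unquoted forms seen in the transcript.
            return value.strip().strip('"')
    raise KeyError("deviceListStrategy not found")
```

For example, `rendered_strategy(render_settings("cdi-cri"))` returns `"cdi-cri"`, while `validate_strategy("volume-mount")` raises a `ValueError`, mirroring the rejected `apiclient set` call.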

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

sky1122 changed the title from "Update nvidia container tool and k8s device plugin" to "nvidia-container-toolkit, nvidia-k8s-device-plugin: change template to support CDI and legacy stack" on Apr 11, 2025
sky1122 (Contributor, Author) commented Apr 11, 2025

Force-pushed to get rid of an unnecessary git commit.

sky1122 (Contributor, Author) commented Apr 11, 2025

Closing this PR to add the changes in #459.
