Create Container Fails: Auto-detected mode as 'legacy' nvidia-container-cli: ldcache error: process /sbin/ldconfig terminated with signal 9 #1400

@reefland

Bare-metal K3S v1.33.3+k3s1 on kernel 6.15.11-2-MANJARO.

This is not a new install; it had been stable for many months. After rebooting the node with the GPU, the pod now crash-loops with this message:

Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: ldcache error: process /sbin/ldconfig terminated with signal 9

I'm confused by the references to OCI, 'legacy' mode, and ldcache.
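For anyone untangling the same message: it is a chain of wrapped failures, outermost first. containerd could not create the task because runc's container init failed, and runc failed because the first prestart hook (`#0`) exited with status 1. In the NVIDIA toolkit's legacy mode that hook ends up invoking `nvidia-container-cli`, whose stderr is the tail of the message: it tried to run `/sbin/ldconfig` to refresh the container's library cache (the "ldcache" step), and that process was killed by signal 9. The minimal Python sketch below assumes the message is copied verbatim from the pod events; it pulls out the hook's stderr and translates the signal number with the standard `signal` module. It only decodes the text and does not diagnose why the kill happened.

```python
import re
import signal
from typing import Optional, Tuple

# The error text from the pod events above, copied verbatim.
ERROR = (
    "Error: failed to create containerd task: failed to create shim task: "
    "OCI runtime create failed: runc create failed: unable to start container "
    "process: error during container init: error running prestart hook #0: "
    "exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' "
    "nvidia-container-cli: ldcache error: process /sbin/ldconfig terminated "
    "with signal 9"
)


def hook_failure(message: str) -> Tuple[str, Optional[str]]:
    """Return the prestart hook's stderr and the name of the fatal signal, if any."""
    # Everything after the first "stderr: " is what the hook itself printed.
    stderr = message.split("stderr: ", 1)[-1]
    match = re.search(r"terminated with signal (\d+)", stderr)
    signal_name = signal.Signals(int(match.group(1))).name if match else None
    return stderr, signal_name


stderr, signal_name = hook_failure(ERROR)
print(stderr)
print(signal_name)  # SIGKILL on Linux
```

SIGKILL cannot be caught or handled, so `ldconfig` had no chance to report anything itself; the sketch names the signal but cannot tell what sent it.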

Chart reference in ArgoCD:

- repoURL: https://nvidia.github.io/k8s-device-plugin
  chart: nvidia-device-plugin
  targetRevision: 0.17.3

Helm Values File:

---
# yaml-language-server: $schema=https://json.schemastore.org/helmfile

nodeSelector:
  nvidia.feature.node.kubernetes.io/gpu.3060: "true"

runtimeClassName: nvidia

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: nvidia.feature.node.kubernetes.io/gpu.3060
              operator: In
              values:
                - "true"

config:
  map:
    default: |-
      version: v1
      flags:
        migStrategy: none
      sharing:
        timeSlicing:
          renameByDefault: false
          failRequestsGreaterThanOne: false
          resources:
            - name: nvidia.com/gpu
              replicas: 6

# Subcharts
nfd: {}
gfd:
  enabled: false
# NFD is already installed via its own Helm Chart.
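The `sharing.timeSlicing` block in the config map above asks the plugin to advertise each physical GPU as several schedulable units. A small arithmetic sketch, assuming the behaviour described in the device plugin's README (each GPU is advertised `replicas` times, and the resource name only gains a `.shared` suffix when `renameByDefault` is true), shows what this node should expose once the plugin is healthy. `advertised_resource` is a hypothetical helper, not part of the plugin.

```python
def advertised_resource(physical_gpus: int, name: str, replicas: int,
                        rename_by_default: bool) -> tuple[str, int]:
    """Return (resource name, count) that a time-sliced node would advertise.

    Hypothetical helper; assumes the README semantics described above.
    """
    # renameByDefault: true would advertise "<name>.shared" instead of "<name>".
    resource = f"{name}.shared" if rename_by_default else name
    # Each physical GPU is offered `replicas` times.
    return resource, physical_gpus * replicas


# One RTX 3060 on this node, replicas: 6, renameByDefault: false.
print(advertised_resource(1, "nvidia.com/gpu", 6, False))
```

With these values the node should report six `nvidia.com/gpu` units once the plugin pod is running.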

Current versions on host:

$ pacman -Q libnvidia-container
libnvidia-container 1.17.8-1

$ pacman -Q nvidia-container-toolkit
nvidia-container-toolkit 1.17.8-1

$ pacman -Q nvidia-utils
nvidia-utils 575.64.05-1
$ nvidia-smi
Tue Sep  2 15:33:10 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.05              Driver Version: 575.64.05      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:09:00.0  On |                  N/A |
|  0%   54C    P3             30W /  170W |    1665MiB /  12288MiB |     37%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
$ nvidia-container-cli info
NVRM version:   575.64.05
CUDA version:   12.9

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce RTX 3060
Brand:          GeForce
GPU UUID:       GPU-ace6a26d-6a78-9562-4fbc-69984c397347
Bus Location:   00000000:09:00.0
Architecture:   8.6
$ nvidia-container-cli list
/dev/nvidiactl
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
/dev/nvidia-modeset
/dev/nvidia0
/usr/bin/nvidia-smi
/usr/bin/nvidia-debugdump
/usr/bin/nvidia-persistenced
/usr/bin/nvidia-cuda-mps-control
/usr/bin/nvidia-cuda-mps-server
/usr/lib/libnvidia-ml.so.575.64.05
/usr/lib/libnvidia-cfg.so.575.64.05
/usr/lib/libcuda.so.575.64.05
/usr/lib/libcudadebugger.so.575.64.05
/usr/lib/libnvidia-gpucomp.so.575.64.05
/usr/lib/libnvidia-ptxjitcompiler.so.575.64.05
/usr/lib/libnvidia-allocator.so.575.64.05
/usr/lib/libnvidia-pkcs11.so.575.64.05
/usr/lib/libnvidia-pkcs11-openssl3.so.575.64.05
/usr/lib/libnvidia-nvvm.so.575.64.05
/usr/lib/libnvidia-ngx.so.575.64.05
/usr/lib/libnvidia-encode.so.575.64.05
/usr/lib/libnvidia-opticalflow.so.575.64.05
/usr/lib/libnvcuvid.so.575.64.05
/usr/lib/libnvidia-eglcore.so.575.64.05
/usr/lib/libnvidia-glcore.so.575.64.05
/usr/lib/libnvidia-tls.so.575.64.05
/usr/lib/libnvidia-glsi.so.575.64.05
/usr/lib/libnvidia-fbc.so.575.64.05
/usr/lib/libnvidia-rtcore.so.575.64.05
/usr/lib/libnvoptix.so.575.64.05
/usr/lib/libGLX_nvidia.so.575.64.05
/usr/lib/libEGL_nvidia.so.575.64.05
/usr/lib/libGLESv2_nvidia.so.575.64.05
/usr/lib/libGLESv1_CM_nvidia.so.575.64.05
/usr/lib/libnvidia-glvkspirv.so.575.64.05
/usr/lib32/libnvidia-ml.so.575.64.05
/usr/lib32/libcuda.so.575.64.05
/usr/lib32/libnvidia-gpucomp.so.575.64.05
/usr/lib32/libnvidia-ptxjitcompiler.so.575.64.05
/usr/lib32/libnvidia-allocator.so.575.64.05
/usr/lib32/libnvidia-encode.so.575.64.05
/usr/lib32/libnvidia-opticalflow.so.575.64.05
/usr/lib32/libnvcuvid.so.575.64.05
/usr/lib32/libnvidia-eglcore.so.575.64.05
/usr/lib32/libnvidia-glcore.so.575.64.05
/usr/lib32/libnvidia-tls.so.575.64.05
/usr/lib32/libnvidia-glsi.so.575.64.05
/usr/lib32/libnvidia-fbc.so.575.64.05
/usr/lib32/libGLX_nvidia.so.575.64.05
/usr/lib32/libEGL_nvidia.so.575.64.05
/usr/lib32/libGLESv2_nvidia.so.575.64.05
/usr/lib32/libGLESv1_CM_nvidia.so.575.64.05
/usr/lib32/libnvidia-glvkspirv.so.575.64.05
/lib/firmware/nvidia/575.64.05/gsp_ga10x.bin
/lib/firmware/nvidia/575.64.05/gsp_tu10x.bin
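Because `nvidia-container-cli` injects these host files into the container, a driver/library version mismatch after a reboot is one possibility worth ruling out. A minimal Python sketch, assuming the listing is pasted in as text (only an excerpt is embedded here), checks that every versioned `.so` file carries the same version as the `NVRM version` reported by `nvidia-container-cli info`. With the output above no mismatches are expected, so this rules out one cause and says nothing about the `ldconfig` failure itself.

```python
import re

NVRM_VERSION = "575.64.05"  # from `nvidia-container-cli info` above

# Excerpt of `nvidia-container-cli list`; paste the full output in practice.
LISTING = """\
/dev/nvidiactl
/usr/bin/nvidia-smi
/usr/lib/libnvidia-ml.so.575.64.05
/usr/lib/libcuda.so.575.64.05
/usr/lib32/libcuda.so.575.64.05
/lib/firmware/nvidia/575.64.05/gsp_ga10x.bin
"""

# Matches the trailing version of a versioned shared object, e.g. ".so.575.64.05".
SO_VERSION = re.compile(r"\.so\.(\d+(?:\.\d+)*)$")


def version_mismatches(listing: str, expected: str) -> list[str]:
    """Return library paths whose .so version differs from `expected`."""
    mismatches = []
    for path in listing.split():
        match = SO_VERSION.search(path)
        # Device nodes, binaries and firmware have no .so suffix and are skipped.
        if match and match.group(1) != expected:
            mismatches.append(path)
    return mismatches


print(version_mismatches(LISTING, NVRM_VERSION))
```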

From the K3S containerd config:

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia']
  runtime_type = "io.containerd.runc.v2"

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia'.options]
  BinaryName = "/usr/bin/nvidia-container-runtime"
  SystemdCgroup = true

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia-cdi']
  runtime_type = "io.containerd.runc.v2"

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia-cdi'.options]
  BinaryName = "/usr/bin/nvidia-container-runtime.cdi"
  SystemdCgroup = true
$ k get all -n nvidia
NAME                             READY   STATUS                  RESTARTS       AGE
pod/nvidia-device-plugin-268bb   0/2     Init:CrashLoopBackOff   16 (50s ago)   60m

NAME                                                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                                 AGE
daemonset.apps/nvidia-device-plugin                      1         1         0       1            0           nvidia.feature.node.kubernetes.io/gpu.3060=true                               60m
daemonset.apps/nvidia-device-plugin-mps-control-daemon   0         0         0       0            0           nvidia.com/mps.capable=true,nvidia.feature.node.kubernetes.io/gpu.3060=true   60m

The NVIDIA packages on the host have not been updated recently:

$ ls -ltR /var/cache/pacman/pkg/nvidia*.zst
.rw-r--r-- root root  80 KB Mon Aug  4 11:36:53 2025  /var/cache/pacman/pkg/nvidia-driver-assistant-0.22.65.06-1-any.pkg.tar.zst
.rw-r--r-- root root 334 MB Tue Jul 22 13:48:56 2025  /var/cache/pacman/pkg/nvidia-utils-575.64.05-1-x86_64.pkg.tar.zst
.rw-r--r-- root root 334 MB Tue Jul  1 17:02:35 2025  /var/cache/pacman/pkg/nvidia-utils-575.64.03-1-x86_64.pkg.tar.zst
.rw-r--r-- root root 334 MB Tue Jun 17 14:26:14 2025  /var/cache/pacman/pkg/nvidia-utils-575.64-1-x86_64.pkg.tar.zst
.rw-r--r-- root root  79 KB Tue Jun  3 21:00:32 2025  /var/cache/pacman/pkg/nvidia-driver-assistant-0.21.57.08-1-any.pkg.tar.zst
.rw-r--r-- root root 4.3 MB Sun Jun  1 11:33:12 2025  /var/cache/pacman/pkg/nvidia-container-toolkit-1.17.8-1-x86_64.pkg.tar.zst
.rw-r--r-- root root  79 KB Thu May  1 23:38:55 2025  /var/cache/pacman/pkg/nvidia-driver-assistant-0.21.51.03-1-any.pkg.tar.zst
.rw-r--r-- root root 4.3 MB Sat Apr 26 11:27:21 2025  /var/cache/pacman/pkg/nvidia-container-toolkit-1.17.6-1-x86_64.pkg.tar.zst
.rw-r--r-- root root 4.2 MB Thu Mar 13 11:17:51 2025  /var/cache/pacman/pkg/nvidia-container-toolkit-1.17.5-1-x86_64.pkg.tar.zst
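To back up that observation, a small Python sketch can pick the newest cached file per package from the listing above. It assumes the listing format shown (date column, then path) and standard Arch package file names (`name-pkgver-pkgrel-arch.pkg.tar.zst`). These are cache file timestamps, so they need not match the actual install time.

```python
import re
from datetime import datetime

# The pacman cache listing above, copied verbatim.
LISTING = """\
.rw-r--r-- root root  80 KB Mon Aug  4 11:36:53 2025  /var/cache/pacman/pkg/nvidia-driver-assistant-0.22.65.06-1-any.pkg.tar.zst
.rw-r--r-- root root 334 MB Tue Jul 22 13:48:56 2025  /var/cache/pacman/pkg/nvidia-utils-575.64.05-1-x86_64.pkg.tar.zst
.rw-r--r-- root root 334 MB Tue Jul  1 17:02:35 2025  /var/cache/pacman/pkg/nvidia-utils-575.64.03-1-x86_64.pkg.tar.zst
.rw-r--r-- root root 334 MB Tue Jun 17 14:26:14 2025  /var/cache/pacman/pkg/nvidia-utils-575.64-1-x86_64.pkg.tar.zst
.rw-r--r-- root root  79 KB Tue Jun  3 21:00:32 2025  /var/cache/pacman/pkg/nvidia-driver-assistant-0.21.57.08-1-any.pkg.tar.zst
.rw-r--r-- root root 4.3 MB Sun Jun  1 11:33:12 2025  /var/cache/pacman/pkg/nvidia-container-toolkit-1.17.8-1-x86_64.pkg.tar.zst
.rw-r--r-- root root  79 KB Thu May  1 23:38:55 2025  /var/cache/pacman/pkg/nvidia-driver-assistant-0.21.51.03-1-any.pkg.tar.zst
.rw-r--r-- root root 4.3 MB Sat Apr 26 11:27:21 2025  /var/cache/pacman/pkg/nvidia-container-toolkit-1.17.6-1-x86_64.pkg.tar.zst
.rw-r--r-- root root 4.2 MB Thu Mar 13 11:17:51 2025  /var/cache/pacman/pkg/nvidia-container-toolkit-1.17.5-1-x86_64.pkg.tar.zst
"""

# Date column, then an Arch package file name: name-pkgver-pkgrel-arch.pkg.tar.zst
LINE = re.compile(
    r"(?P<date>\w{3} \w{3} +\d{1,2} [\d:]{8} \d{4})\s+"
    r"/var/cache/pacman/pkg/(?P<name>.+?)-(?P<ver>[^-]+-\d+)-(?:x86_64|any)\.pkg\.tar\.zst$"
)


def newest_per_package(listing: str) -> dict[str, tuple[str, datetime]]:
    """Return {package: (version, timestamp)} for the newest cached file of each package."""
    newest: dict[str, tuple[str, datetime]] = {}
    for line in listing.splitlines():
        match = LINE.search(line)
        if not match:
            continue
        when = datetime.strptime(match["date"], "%a %b %d %H:%M:%S %Y")
        name = match["name"]
        if name not in newest or when > newest[name][1]:
            newest[name] = (match["ver"], when)
    return newest


for name, (version, when) in sorted(newest_per_package(LISTING).items()):
    print(f"{name}: {version} ({when:%Y-%m-%d})")
```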
