Open
Bug Report
When following the docs for either the proprietary or the OSS drivers, the device plugin does not work on any version later than 0.14.5.
Description
With the default Helm install, the DaemonSet pods never get scheduled on any node.
helm install nvidia-device-plugin nvdp/nvidia-device-plugin --version=0.14.5 --set=runtimeClassName=nvidia --namespace kube-system
https://github.com/NVIDIA/k8s-device-plugin/releases/tag/v0.15.0
Logs
Here's the device plugin DaemonSet for 0.18.0, for which the controller manager never creates any pods.
Name:           nvidia-device-plugin
Namespace:      kube-system
Selector:       app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin
Node-Selector:  <none>
Labels:         app.kubernetes.io/instance=nvidia-device-plugin
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=nvidia-device-plugin
                app.kubernetes.io/version=0.18.0
                helm.sh/chart=nvidia-device-plugin-0.18.0
Annotations:    deprecated.daemonset.template.generation: 1
                meta.helm.sh/release-name: nvidia-device-plugin
                meta.helm.sh/release-namespace: kube-system
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app.kubernetes.io/instance=nvidia-device-plugin
           app.kubernetes.io/name=nvidia-device-plugin
  Containers:
   nvidia-device-plugin-ctr:
    Image:      nvcr.io/nvidia/k8s-device-plugin:v0.18.0
    Port:       <none>
    Host Port:  <none>
    Command:
      nvidia-device-plugin
    Environment:
      MPS_ROOT:                    /run/nvidia/mps
      NVIDIA_VISIBLE_DEVICES:      all
      NVIDIA_DRIVER_CAPABILITIES:  compute,utility
    Mounts:
      /dev/shm from mps-shm (rw)
      /mps from mps-root (rw)
      /var/lib/kubelet/device-plugins from kubelet-device-plugins-dir (rw)
      /var/run/cdi from cdi-root (rw)
  Volumes:
   kubelet-device-plugins-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/device-plugins
    HostPathType:  Directory
   mps-root:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia/mps
    HostPathType:  DirectoryOrCreate
   mps-shm:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia/mps/shm
    HostPathType:  
   cdi-root:
    Type:               HostPath (bare host directory volume)
    Path:               /var/run/cdi
    HostPathType:       DirectoryOrCreate
  Priority Class Name:  system-node-critical
  Node-Selectors:       <none>
  Tolerations:          CriticalAddonsOnly op=Exists
                        nvidia.com/gpu:NoSchedule op=Exists
Events:                 <none>
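The output above lists node selectors and tolerations but has no affinity section, so it cannot rule out a required node affinity on the pod template. A desired count of 0 with no events is what you would see if such a rule excluded every node. The sketch below is only an illustration of how required node affinity terms are evaluated against node labels (terms are ORed, expressions inside a term are ANDed). The label key `nvidia.com/gpu.present` and the node labels are assumptions for the example, not values taken from this cluster; the real rule, if any, would have to be read from the DaemonSet spec. `matchFields` and the `Gt`/`Lt` operators are left out for brevity.

```python
from typing import Dict, List


def expression_matches(labels: Dict[str, str], expr: dict) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # A node without the key also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {op}")


def node_matches(labels: Dict[str, str], terms: List[dict]) -> bool:
    """Terms are ORed; expressions within one term are ANDed.

    A term with no expressions is treated as matching nothing.
    """
    for term in terms:
        exprs = term.get("matchExpressions", [])
        if exprs and all(expression_matches(labels, e) for e in exprs):
            return True
    return False


# Hypothetical affinity terms and node labels, for illustration only.
terms = [
    {
        "matchExpressions": [
            {"key": "nvidia.com/gpu.present", "operator": "In", "values": ["true"]}
        ]
    },
]
unlabeled = {"kubernetes.io/hostname": "node-1"}
labeled = {**unlabeled, "nvidia.com/gpu.present": "true"}

print(node_matches(unlabeled, terms))  # False
print(node_matches(labeled, terms))    # True
```

If the DaemonSet does carry a rule like this and no node has a matching label, the controller would compute zero eligible nodes, which would be consistent with the counts shown above.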
Environment
- Talos version: 1.11.3
- Kubernetes version: 1.34.1
- Platform: metal
 