The mps-control-daemon is restarting when GPU drivers are installed through the GKE daemonset. Tested with the example Pod from https://github.com/NVIDIA/k8s-dra-driver-gpu/blob/main/demo/specs/quickstart/gpu-test-mps.yaml
Kubernetes version: 1.33.2-gke.1240000
Image type: UBUNTU_CONTAINERD
logs:
$ kubectl logs -n nvidia mps-control-daemon-d8478efa-5c09-461d-b7b9-f59f320396a8-5cv5nr8 -p
chroot: failed to run command 'sh': No such file or directory
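The `chroot` failure suggests the configured driver root on the node may not contain a chroot-able filesystem (no `sh` inside it). A hedged way to check this from outside the DRA driver is a node debug pod; `<gpu-node-name>` is a placeholder for one of the affected GPU nodes, and `/host` is where `kubectl debug node` mounts the host filesystem:

```shell
# Sketch: inspect the driver root the DRA plugin is configured with
# (nvidiaDriverRoot: /opt/nvidia) directly on an affected node.
NODE=<gpu-node-name>
kubectl debug node/"$NODE" -it --image=ubuntu -- \
  sh -c 'ls /host/opt/nvidia; chroot /host/opt/nvidia sh -c "echo ok"'
```

If the second command fails the same way as the daemon log, the directory holds driver files but not a root filesystem that `chroot` can execute `sh` in.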
GPU driver installation:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/refs/heads/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml
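Before installing the DRA chart, it may help to confirm the driver installer finished on all GPU nodes. A minimal check, assuming the daemonset name and namespace from the GKE manifest above:

```shell
# Sketch: wait for the GKE driver installer daemonset to complete its rollout.
kubectl -n kube-system rollout status ds/nvidia-driver-installer
kubectl -n kube-system get pods -l k8s-app=nvidia-driver-installer -o wide
```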
DRA helm chart installation:
cat <<EOF > dra_values.yaml
resources.gpus.enabled: "true"
gpuResourcesEnabledOverride: "true"
nvidiaDriverRoot: /opt/nvidia
controller:
  tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "nvidia.com/gpu.present"
            operator: "Exists"
kubeletPlugin:
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  nodeSelector:
    nvidia.com/gpu.present: "true"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "nvidia.com/gpu.present"
            operator: "Exists"
EOF
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
&& helm repo update
helm upgrade -i nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu --version="25.3.0-rc.4" \
--namespace nvidia \
-f dra_values.yaml
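After the install, the restarts can be observed directly; a hedged verification sketch (the label selector is an assumption based on common Helm chart conventions, not confirmed for this chart):

```shell
# Sketch: look for restarting mps-control-daemon containers and pull
# their previous-instance logs, as shown in the log excerpt above.
kubectl -n nvidia get pods -o wide
kubectl -n nvidia logs -l app.kubernetes.io/name=nvidia-dra-driver-gpu \
  --all-containers --tail=20 -p
```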