Commit c4c32e0

fix: nvidia devices permissions in container
1 parent b6c874f commit c4c32e0

3 files changed

Lines changed: 23 additions & 6 deletions

ansible/roles/nvidia-driver/defaults/main.yml

Lines changed: 10 additions & 6 deletions
@@ -26,10 +26,14 @@ nvidia_driver_skip_reboot: false
 
 # NVIDIA Container Toolkit version
 # See available versions: apt-cache madison nvidia-container-toolkit
-# Pinned to avoid drift between providers (CDI vs Legacy mode behavior differences)
-# IMPORTANT: Must stay on 1.17.x - versions 1.18+ default to CDI mode which keeps
-# /dev/dri/card1 at 0660 root:root inside the container, preventing the NVIDIA Vulkan
-# driver from accessing the DRM card node, causing vkGetPhysicalDeviceSurfacePresentModesKHR
-# to fail with VK_ERROR_UNKNOWN (-13) and a black screen.
-# Legacy mode (1.17.x) sets /dev/dri/card1 to 0666, allowing proper GPU access.
+# 1.18+ defaults to JIT CDI instead of legacy mode and no longer enables the CDI
+# chmod hook by default. CDI keeps host device permissions for /dev/dri/card*,
+# /dev/dri/renderD* and /dev/nvidia-modeset, which can leave them inaccessible
+# to non-root game processes in containers and cause Vulkan black screens such as
+# VK_ERROR_UNKNOWN (-13) from vkGetPhysicalDeviceSurfacePresentModesKHR.
+# We explicitly chmod those devices at startup instead of relying on legacy mode.
+# See:
+# - https://github.com/NVIDIA/nvidia-container-toolkit/issues/1218
+# - https://github.com/NVIDIA/nvidia-container-toolkit/issues/1456
+# - https://github.com/NVIDIA/nvidia-container-toolkit/issues/1477
 nvidia_container_toolkit_version: "1.19.0-1"
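
The comment above attributes the black screen to device nodes that keep their host permissions inside the container. One way to confirm that before and after the fix is to list the nodes that do not grant read/write access to "other". The following is a minimal sketch, not part of this commit: `list_restricted_devices` is a hypothetical helper name, and it assumes GNU `stat` (`stat -c '%a'`) is available in the container image.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print every given device node
# whose mode does not grant read/write to "other", followed by its octal mode.
# Assumes GNU coreutils stat.
list_restricted_devices() {
    for dev in "$@"; do
        # Skip unmatched glob patterns and nodes that are not present.
        [ -e "$dev" ] || continue
        mode=$(stat -c '%a' "$dev")
        # The last octal digit holds the "other" bits: 6 = rw-, 7 = rwx.
        case "$mode" in
            *6|*7) ;;
            *) echo "$dev $mode" ;;
        esac
    done
}

# The same device set the commit comment names.
list_restricted_devices /dev/dri/card* /dev/dri/renderD* /dev/nvidia-modeset
```

Run inside the container as any user: an empty result suggests the nodes are already world-accessible, while a line such as `/dev/dri/card1 660` would match the situation the comment describes.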

containers/sunshine/overlay/cloudy/bin/setup-all.sh

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ source setup-dirs.sh
 source setup-user.sh
 
 if [ "$NVIDIA_ENABLE" = true ]; then
+source setup-nvidia-permissions.sh
 source setup-nvidia-driver.sh
 fi
 

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+
+#
+# Setup NVIDIA device permissions.
+#
+
+# NVIDIA Container Toolkit 1.18+ defaults to JIT CDI mode and no longer applies
+# its CDI chmod hook by default. Keep GPU display devices usable by the
+# unprivileged cloudy user when devices are mounted with host permissions.
+echo "Setting NVIDIA device permissions for /dev/dri/card*, /dev/dri/renderD* and /dev/nvidia-modeset"
+
+chmod 0666 /dev/dri/card* /dev/dri/renderD* /dev/nvidia-modeset || true
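
The single `chmod ... || true` line above passes any unmatched glob to `chmod` as a literal pattern and hides every failure. A possible alternative, shown here only as a sketch and not as what the commit does, is to touch only the nodes that exist and log each change. `relax_device_permissions` is a hypothetical name, and the `NVIDIA_ENABLE` guard mirrors the check in `setup-all.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical variant (not part of the commit): chmod only the device nodes
# that actually exist, and report each node that was changed.
relax_device_permissions() {
    for dev in "$@"; do
        if [ -e "$dev" ]; then
            # chmod errors stay visible on stderr instead of being swallowed.
            chmod 0666 "$dev" && echo "chmod 0666 $dev"
        fi
    done
    # Never abort container startup because of a missing or read-only node.
    return 0
}

# Same guard as setup-all.sh; does nothing unless NVIDIA support is enabled.
if [ "${NVIDIA_ENABLE:-false}" = true ]; then
    relax_device_permissions /dev/dri/card* /dev/dri/renderD* /dev/nvidia-modeset
fi
```

The trade-off is a few more lines in exchange for a startup log that shows which nodes were present and changed, which may help when diagnosing hosts where `/dev/nvidia-modeset` or the render nodes are absent.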
