DX12 / Vulkan WSI broken in container: /dev/nvidia-modeset created with mode 0644, breaks vkcube and vkd3d-proton #381

@egrosner

Summary

Inside the cloudy-pad container, /dev/nvidia-modeset is created with mode 0644, so the unprivileged cloudy user can only open it O_RDONLY. NVIDIA's X11 Vulkan WSI requires O_RDWR, so any X11 Vulkan swapchain creation fails. This breaks every Vulkan-on-X11 game (DX12 via vkd3d-proton, native Vulkan, vkcube), while OpenGL/GLX is unaffected because it doesn't go through the modeset uAPI.

Symptoms

  • DX12 game window opens with audio but renders black; vkd3d-proton logs Presenter: Failed to query present modes: -13 (i.e. VK_ERROR_UNKNOWN from vkGetPhysicalDeviceSurfacePresentModesKHR).
  • vkcube aborts immediately: cube.c:1225: demo_prepare_buffers: Assertion '!err' failed.
  • vkcube --gpu_number 1 (llvmpipe) works.
  • vulkaninfo's Presentable Surfaces: section is empty for the NVIDIA GPU.
  • glxinfo / glxgears work fine on the T4.

Root cause

$ ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 /dev/nvidiactl
crw-r--r-- 1 root root 195, 254 /dev/nvidia-modeset   # <-- 0644
crw-rw-rw- 1 root root 234,   0 /dev/nvidia-uvm
crw-rw-rw- 1 root root 234,   1 /dev/nvidia-uvm-tools
$ python3 -c "import os; os.open('/dev/nvidia-modeset', os.O_RDWR)"
PermissionError: [Errno 13] Permission denied

The kernel module's own default is 0666 (/proc/driver/nvidia/params reports DeviceFileMode: 438, and decimal 438 is octal 0666), so the 0644 is set by libnvidia-container when it mknods the node into the container's tmpfs /dev. It treats nvidia-modeset as a "control-only" device and doesn't anticipate an unprivileged in-container user needing O_RDWR for Vulkan WSI.
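
For anyone who wants to double-check the decimal-to-octal reading of DeviceFileMode, here is a minimal Python sketch. It assumes the params file uses plain "Key: value" lines as quoted above; the helper names parse_device_file_mode and as_octal are made up for illustration and are not part of any NVIDIA tooling.

```python
import re
from typing import Optional


def parse_device_file_mode(params_text: str) -> Optional[int]:
    """Return the DeviceFileMode value as an int, or None if the key is absent.

    Assumes "Key: value" lines, as in the excerpt quoted in this issue.
    """
    match = re.search(r"^DeviceFileMode:\s*(\d+)\s*$", params_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def as_octal(mode: int) -> str:
    """Format a permission mode the way ls/chmod users expect, e.g. 438 -> '0666'."""
    return format(mode, "04o")


if __name__ == "__main__":
    try:
        with open("/proc/driver/nvidia/params") as f:
            mode = parse_device_file_mode(f.read())
    except OSError:
        # File is missing when the NVIDIA kernel module is not loaded.
        mode = None
    print("DeviceFileMode:", as_octal(mode) if mode is not None else "unavailable")
```

If the printed value is 0666 while ls -l shows crw-r--r-- on /dev/nvidia-modeset, the narrower mode was applied after the kernel module's default, which is consistent with the explanation above.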

Verification of the fix

$ sudo chmod 0666 /dev/nvidia-modeset
$ vkcube              # runs and renders
$ vulkaninfo | grep -A3 'Presentable Surfaces'
Presentable Surfaces:
GPU id : 0 (Tesla T4):
        VK_KHR_xcb_surface
        VK_KHR_xlib_surface
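
To catch a regression after the chmod, the same check can be scripted. The following is a rough sketch, not something cloudy-pad ships: it lists NVIDIA character devices whose "other" permission bits lack read or write. It relies on the nodes being owned by root:root, as in the listing above, so the "other" bits are what the unprivileged cloudy user gets. The function names are hypothetical.

```python
import glob
import os
import stat

# Read + write for "other", i.e. the low rw- bits of 0666.
WORLD_RW = stat.S_IROTH | stat.S_IWOTH


def lacks_world_rw(path):
    """True if the file's permission bits do not grant both read and write to others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & WORLD_RW) != WORLD_RW


def nodes_lacking_world_rw(paths):
    """Return the subset of paths (sorted) that are not world read/write."""
    return [p for p in sorted(paths) if lacks_world_rw(p)]


if __name__ == "__main__":
    # Only consider character devices; /dev/nvidia* may also match directories.
    candidates = [
        p for p in glob.glob("/dev/nvidia*")
        if stat.S_ISCHR(os.stat(p).st_mode)
    ]
    for path in nodes_lacking_world_rw(candidates):
        mode = stat.S_IMODE(os.stat(path).st_mode)
        print(f"{path}: mode {mode:04o}, not read/write for unprivileged users")
```

On the container described in this issue, the script would be expected to flag only /dev/nvidia-modeset before the chmod and print nothing afterwards.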

Suggested fix

Add a one-liner to cloudy/bin/setup-container-post-start.sh (which already runs as root via supervisord at container start):

chmod 0666 /dev/nvidia-modeset

Environment

  • Host: AWS EC2 g4dn (Tesla T4), Ubuntu cloud image, kernel 6.17.0-1012-aws
  • NVIDIA driver: 590.48.01 (open kernel module), userspace 590.48.01 inside container — versions match
  • Container runtime: NVIDIA Container Toolkit (libnvidia-container, evidenced by bind-mounts of /usr/lib/x86_64-linux-gnu/libnvidia-* and /usr/bin/nvidia-*)
  • X server: display :42, NVIDIA proprietary DDX, virtual 1920x1080 via Option "ConnectedMonitor" "DP-0" + MetaModes
  • Vulkan loader: 1.3.275, ICD /etc/vulkan/icd.d/nvidia_icd.json
