Description
Hi, I was reviewing the CVE-2025-23359 security bulletin and noticed that the vulnerability does not affect CDI mode. While this is reassuring, I'd like to ask for clarification on how CUDA Forward Compatibility is handled in CDI mode, particularly for containers built with a newer CUDA Toolkit running on nodes with older NVIDIA Linux GPU drivers.
After inspecting /etc/cdi/nvidia.yaml, I see that nvidia-cdi-hook writes the host directory containing libcuda (e.g., /usr/lib64), which is mounted into the container, to the container's /etc/ld.so.conf.d/00-nvcr-<RANDOM_STRING>.conf. However, I'm uncertain how this ensures compatibility for applications that require CUDA Forward Compatibility (e.g., by binding the /usr/local/cuda/compat libraries). For example, if a container built with CUDA 12.2 (requiring driver ≥ 535) runs on a host with driver 525, I don't see any mechanism in the generated CDI spec to automatically include the compatibility libraries.
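For reference, this is the kind of entry I mean in the generated spec (a trimmed sketch based on my reading of /etc/cdi/nvidia.yaml on one node; the exact hook path and library folder will differ per system):

```yaml
containerEdits:
  hooks:
    - hookName: createContainer
      path: /usr/bin/nvidia-cdi-hook
      args:
        - nvidia-cdi-hook
        - update-ldcache
        - --folder
        - /usr/lib64   # host directory holding the driver's libcuda.so.*
```

This only makes the host driver's libraries resolvable inside the container; I don't see anything here that would prefer a newer libcuda from the toolkit's compat package.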
I also came across PR #906, which introduced nvidia-cdi-hook compat-libs --driver-version 999.88.77 to address Forward Compatibility. This makes me wonder:
- Before #906: Was CDI mode inherently unable to support CUDA Forward Compatibility due to the missing library bindings?
- After #906: Does enabling compatibility now require manual configuration (e.g., specifying --driver-version), or is this handled automatically during CDI spec generation?
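To make the question concrete, this is how I have been checking which libcuda actually resolves inside a CDI-launched container (a rough sketch; I am assuming podman as the CDI-aware runtime, and the image tag is only an example):

```console
# Run a CUDA 12.2 image via CDI and inspect the driver version plus the loader cache.
$ podman run --rm --device nvidia.com/gpu=all \
    nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04 \
    sh -c 'nvidia-smi --query-gpu=driver_version --format=csv,noheader; ldconfig -p | grep libcuda.so'
```

My expectation is that, with Forward Compatibility in effect, a compat libcuda from /usr/local/cuda/compat would take precedence over the host driver's copy in that listing; so far I only see the host-injected library, which is what prompted the questions above.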