Does CDI mode support CUDA Forward Compatibility? #942

@wqlparallel

Hi, I was reviewing the CVE-2025-23359 security bulletin and noticed that the vulnerability does not affect CDI mode. While this is reassuring, I’d like some clarification on how CUDA Forward Compatibility is handled in CDI mode, particularly for containers built with a newer CUDA Toolkit running on nodes with an older NVIDIA Linux GPU driver.

After inspecting /etc/cdi/nvidia.yaml, I see that nvidia-cdi-hook writes the directory containing the host’s libcuda (e.g., /usr/lib64), which is mounted into the container, to the container’s /etc/ld.so.conf.d/00-nvcr-<RANDOM_STRING>.conf. However, I’m unsure how this helps applications that rely on CUDA Forward Compatibility (e.g., by using the libraries under /usr/local/cuda/compat). For example, if a container built with CUDA 12.2 (requiring driver ≥535) runs on a host with driver 525, I don’t see any mechanism in the CDI spec that automatically adds the compatibility libraries.
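To make the scenario concrete, here is a minimal sketch of the check I would expect something to perform: compare the host driver's major version with the major version of the libcuda shipped in the container's compat directory. The function names are hypothetical, and the "compat major greater than host major" rule is my assumption about a reasonable check, not a description of what nvidia-cdi-hook actually does.

```python
import re
from typing import Optional


def driver_major(version: str) -> int:
    """Return the major component of a driver version such as '525.60.13'."""
    return int(version.split(".")[0])


def compat_lib_major(lib_name: str) -> Optional[int]:
    """Extract the major version from a name like 'libcuda.so.535.104.05'.

    Returns None when the file name does not look like a versioned libcuda.
    """
    match = re.fullmatch(r"libcuda\.so\.(\d+)(?:\.\d+)*", lib_name)
    return int(match.group(1)) if match else None


def needs_forward_compat(host_driver_version: str, compat_lib_name: str) -> bool:
    """Assumed rule: use the compat libcuda only if it is newer than the host driver."""
    compat_major = compat_lib_major(compat_lib_name)
    if compat_major is None:
        return False
    return compat_major > driver_major(host_driver_version)


if __name__ == "__main__":
    # Host with driver 525, container compat directory shipping a 535 libcuda.
    print(needs_forward_compat("525.60.13", "libcuda.so.535.104.05"))
```

In the example from this issue (driver 525 on the host, a 535-series libcuda under /usr/local/cuda/compat), this check says the compat directory would need to take precedence over /usr/lib64 in the linker search path.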

I also came across PR #906, which introduced nvidia-cdi-hook compat-libs --driver-version 999.88.77 to address Forward Compatibility. This makes me wonder:

Before #906: Was CDI mode inherently unable to support CUDA Forward Compatibility due to missing library bindings?
After #906: Does enabling compatibility now require manual configuration (e.g., specifying --driver-version), or is this handled automatically in CDI spec generation?
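One way to check the second question on a given node is to look at which hooks the generated spec actually contains. Below is a minimal sketch that walks an already-parsed CDI spec (a plain dict, as obtained from a YAML or JSON loader) and collects the hook arguments. The `containerEdits`, `devices`, `hooks`, and `args` keys follow the CDI spec layout; the helper names and the sample data are hypothetical and are not copied from a real /etc/cdi/nvidia.yaml.

```python
from typing import Any, Dict, List


def list_hook_args(spec: Dict[str, Any]) -> List[List[str]]:
    """Collect the args of every hook in the spec-level and per-device edits."""
    edits = [spec.get("containerEdits") or {}]
    for device in spec.get("devices") or []:
        edits.append(device.get("containerEdits") or {})
    return [
        list(hook.get("args") or [])
        for edit in edits
        for hook in edit.get("hooks") or []
    ]


def has_hook_subcommand(spec: Dict[str, Any], subcommand: str) -> bool:
    """Return True if any hook's argument list contains the given subcommand."""
    return any(subcommand in args for args in list_hook_args(spec))


# Illustrative data only: a trimmed-down spec with a single ldcache hook.
SAMPLE_SPEC = {
    "cdiVersion": "0.5.0",
    "kind": "nvidia.com/gpu",
    "devices": [
        {"name": "all", "containerEdits": {"deviceNodes": [{"path": "/dev/nvidia0"}]}},
    ],
    "containerEdits": {
        "hooks": [
            {
                "hookName": "createContainer",
                "path": "/usr/bin/nvidia-cdi-hook",
                "args": ["nvidia-cdi-hook", "update-ldcache", "--folder", "/usr/lib64"],
            },
        ],
    },
}

if __name__ == "__main__":
    print(has_hook_subcommand(SAMPLE_SPEC, "update-ldcache"))
    print(has_hook_subcommand(SAMPLE_SPEC, "compat-libs"))
```

If a spec generated after #906 contains a compat-libs hook with a concrete --driver-version argument, that would suggest the value is filled in at spec generation time rather than configured by hand, but I have not confirmed this.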

Labels: lifecycle/stale (denotes an issue or PR that has remained open with no activity and has become stale)
