Conversation

@elezar
Member

@elezar elezar commented Oct 31, 2025

This change ensures that directories that are already included in the ld.conf are not added to the NVCR-specific config files. The intent is to avoid inadvertently promoting existing directories to a higher priority.

See #123 where this was already an issue in the libnvidia-container-based implementation.

@elezar
Member Author

elezar commented Oct 31, 2025

/cherry-pick release-1.18

@elezar elezar added this to the v1.18.1 milestone Oct 31, 2025
@elezar elezar self-assigned this Oct 31, 2025
@elezar elezar force-pushed the selective-ldcache-config branch from 0b4a83b to d0a1221 Compare October 31, 2025 15:25
Collaborator

@ArangoGutierrez ArangoGutierrez left a comment

LGTM
2 non-blocking comments

return SafeExec(ldconfigPath, args, nil)
}

func (l *Ldconfig) filterDirectories(configFilePath string, directories ...string) ([]string, error) {
Collaborator

Question: If the function doesn't need to access or modify the state of an Ldconfig object, why not make it a regular standalone function?

Member Author

I was originally accessing a member, but I can remove it.

@elezar elezar force-pushed the selective-ldcache-config branch from d0a1221 to 2585da9 Compare November 5, 2025 12:54
@elezar elezar force-pushed the selective-ldcache-config branch from 2585da9 to 5b1d918 Compare November 5, 2025 14:32
@ArangoGutierrez ArangoGutierrez self-requested a review November 5, 2025 15:27
// Explicitly specify using /etc/ld.so.conf since the host's ldconfig may
// be configured to use a different config file by default.
configFilePath := "/etc/ld.so.conf"
filteredDirectories, err := filterDirectories(configFilePath, directories...)

(Jason from Uber AI Infra here - we filed the ticket) What will be passed in as directories?

Member Author

directories contains the container paths of the parent directories of any CUDA user-mode driver libraries that are mounted from the host. Note that at present the container paths of the libraries are the same as the host paths.

On an Ubuntu-based system the list is typically:

/usr/lib/${ARCH_SPECIFIC_LIB_DIR}/
/usr/lib/${ARCH_SPECIFIC_LIB_DIR}/vdpau

which results in the following CDI hook:

        - hookName: createContainer
          path: /usr/bin/nvidia-cdi-hook
          args:
            - nvidia-cdi-hook
            - update-ldcache
            - --folder
            - /lib/x86_64-linux-gnu
            - --folder
            - /lib/x86_64-linux-gnu/vdpau
          env:
            - NVIDIA_CTK_DEBUG=false

To see what the list is on your system, you can run the nvidia-ctk cdi generate command and check the output for the generated update-ldcache hook.

@elezar
Copy link
Member Author

elezar commented Nov 6, 2025

@jasonzlai do you have a simple reproducer Dockerfile that we could add to our test suite to address this?
