Skip to content

Conversation

@elezar
Copy link
Member

@elezar elezar commented Nov 3, 2025

This change also allows the nvidia-cdi-refresh.service to execute when the nvidia-current kernel module is present instead of restricting this to the nvidia
module.

Fixes #1395

@elezar elezar added the bug Issue/PR to expose/discuss/fix a bug label Nov 3, 2025
@elezar
Copy link
Member Author

elezar commented Nov 3, 2025

/cherry-pick release-1.18

Environment=NVIDIA_CTK_CDI_OUTPUT_FILE_PATH=/var/run/cdi/nvidia.yaml
EnvironmentFile=-/etc/nvidia-container-toolkit/nvidia-cdi-refresh.env
ExecCondition=/usr/bin/grep -qE '/nvidia.ko' /lib/modules/%v/modules.dep
ExecCondition=/usr/bin/grep -qE '/nvidia.ko|/nvidia-current.ko' /lib/modules/%v/modules.dep
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on #1395 @tr4cks proposes

ExecCondition=/usr/bin/grep -qE '/nvidia(-current)?\.ko' /lib/modules/%v/modules.dep

What do you think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have a strong feeling on this -- although I though the explicit OR to be more easily readable instead of trying to parse a more complex regex. It should strictly speaking be:

Suggested change
ExecCondition=/usr/bin/grep -qE '/nvidia.ko|/nvidia-current.ko' /lib/modules/%v/modules.dep
ExecCondition=/usr/bin/grep -qE '/nvidia\.ko|/nvidia-current\.ko' /lib/modules/%v/modules.dep

Anyway.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it's more meaningful, especially if you need to add something else in the future.

However, I might have done it this way. Without forgetting the $ at the end if you don't want it to match nividia.ko.someting-else. But maybe that's what you want, I don't know, I don't have the context.

ExecCondition=/usr/bin/grep -qE '/(nvidia|nvidia-current)\.ko$' /lib/modules/%v/modules.dep

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On one system I checked, I don't think that the $ will do what we wanted:

$ cat modules.dep | grep -E '/nvidia\.ko\:'
updates/dkms/nvidia.ko:

Note the trailing .ko:.

I think what we may want is:

$ cat /lib/modules/6.11.0-1013-nvidia-64k/modules.dep | grep -E '/(nvidia|nvidia-current)\.ko[:]'
updates/dkms/nvidia.ko:

Copy link
Member Author

@elezar elezar Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually: looking at https://man7.org/linux/man-pages/man5/modules.dep.5.html it is stated that:

These files are not intended for editing or use by any additional
utilities as their format is subject to change in the future. You
should use the modinfo(8) command to obtain information about
modules in a future proof and compatible fashion rather than
touching these files.

Does modinfo nvidia --field version work as expected on your system where the module file is nvidia-current.ko?

I get:

$ modinfo nvidia --field version
580.82.07

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't work.

$ modinfo nvidia --field version
modinfo: ERROR: Module nvidia not found.

Here are the available modules:

$ modinfo nvidia-
nvidia-current           nvidia-current-modeset   nvidia-current-uvm
nvidia-current-drm       nvidia-current-peermem   nvidia-wmi-ec-backlight
$ modinfo nvidia-

So forced to do this with nvidia-current:

$ modinfo nvidia-current --field version
535.247.01

In addition, the executable is located in the /usr/sbin/modinfo path, and I am not sure if it is always in the same place on all systems—to be verified.

$ whereis modinfo
modinfo: /usr/sbin/modinfo /usr/share/man/man8/modinfo.8.gz

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After doing some research, I think your method is probably the best compromise. It would also be possible to search for modules in the file system, but that would require executing more complicated commands with pipes.

It would probably be necessary to iterate and track the changes.

@elezar elezar force-pushed the fix-exec-start-condition branch from b1a5376 to 9bc36a0 Compare November 4, 2025 15:26
This change also allows the nvidia-cdi-refresh.service
to execute when the nvidia-current kernel module is
present instead of restricting this to the nvidia
module.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the fix-exec-start-condition branch from 9bc36a0 to 1f5ce73 Compare November 5, 2025 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Issue/PR to expose/discuss/fix a bug cherry-pick/release-1.18

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stopped at an ExecCondition to run nvidia-cdi-refresh.service

3 participants