Rogue RPM CUDA package in Jupyter Notebook image #203

@jstourac

This is a copy of https://issues.redhat.com/browse/RHODS-11680.

Description of problem:

When I create Jupyter Notebook servers from the images that contain CUDA tooling, I see that, apart from the RPM packages of the expected CUDA version (currently 11.8 and 11.4, respectively), there is also one package, cuda-toolkit-config-common, with an unexpected version:

2023.1_image_version

(app-root) (app-root) rpm -qa | grep -i cuda
cuda-toolkit-config-common-12.2.128-1.noarch
cuda-toolkit-11-config-common-11.8.89-1.noarch
cuda-toolkit-11-8-config-common-11.8.89-1.noarch
cuda-cudart-11-8-11.8.89-1.x86_64
cuda-compat-11-8-520.61.05-1.x86_64
...

1.2_image_version

(app-root) (app-root) rpm -qa | grep cuda
cuda-toolkit-config-common-11.8.89-1.noarch
cuda-toolkit-11-config-common-11.8.89-1.noarch
cuda-toolkit-11-4-config-common-11.4.148-1.noarch
cuda-cudart-11-4-11.4.108-1.x86_64
cuda-compat-11-4-470.141.10-1.x86_64
...

Prerequisites (if any, like setup, operators/versions):

Tested with images provided with RHODS 1.32 RC6

Steps to Reproduce

  1. Go to RHODS -> applications -> enabled -> launch Jupyter application
  2. Choose any of the images with CUDA and click the Start server button
  3. Wait until the server is created, then open it and run the following command in a terminal:
rpm -qa | grep cuda
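
The check in step 3 can be narrowed so that only the mismatching packages are printed. The following is a minimal sketch, not part of the original report: the function name `find_unexpected_cuda_rpms` is hypothetical, and the rule it applies (a `cuda-*` package is unexpected if its version does not start with the given major.minor prefix, with `cuda-compat-*` skipped because it carries a driver version) is an assumption. It is a simple prefix heuristic and may flag more packages than the one discussed here.

```shell
# Hypothetical helper: reads "NAME VERSION" pairs on stdin and prints the
# CUDA packages whose version does not start with the expected prefix.
# Assumption: cuda-compat-* is skipped because its version is a driver version.
find_unexpected_cuda_rpms() {
    expected="$1"   # expected CUDA major.minor, e.g. "11.8"
    awk -v want="$expected" '
        $1 ~ /^cuda-/ && $1 !~ /^cuda-compat-/ {
            # Report the package unless its version begins with "<want>."
            if (index($2, want ".") != 1) print $1 "-" $2
        }'
}

# Intended usage inside a notebook terminal (requires rpm):
#   rpm -qa --qf '%{NAME} %{VERSION}\n' | find_unexpected_cuda_rpms 11.8
```

For the packages shown in the 2023.1 listing above, this would report only cuda-toolkit-config-common-12.2.128.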

Fortunately, the presence of this rogue RPM package doesn't break any functionality at the moment, so the main problem is potential user/customer confusion. Still, there is a risk that the content of the rogue package will change in the future and affect the behavior.

Actual results:

Extra cuda RPM packages are present in the Jupyter Notebook images.

Expected results:

Only the CUDA RPM packages relevant to the particular CUDA version are present in the Jupyter Notebook images.

Reproducibility (Always/Intermittent/Only Once):

Always

Additional info:

The presence of this rogue RPM package doesn't break any functionality at the moment, since it installs only an ld.so configuration file pointing to a directory that is ultimately just a symlink to the actual CUDA installation of the proper version, see:

2023.1_image_version

(app-root) (app-root) rpm -ql cuda-toolkit-config-common-12.2.128-1.noarch
/etc/ld.so.conf.d/000_cuda.conf
(app-root) (app-root) rpm -ql cuda-toolkit-11-config-common-11.8.89-1.noarch
/etc/ld.so.conf.d/989_cuda-11.conf
(app-root) (app-root) rpm -ql cuda-toolkit-11-8-config-common-11.8.89-1.noarch
(contains no files)

(app-root) (app-root) cat /etc/ld.so.conf.d/000_cuda.conf
/usr/local/cuda/targets/x86_64-linux/lib
(app-root) (app-root) cat /etc/ld.so.conf.d/989_cuda-11.conf
/usr/local/cuda-11/targets/x86_64-linux/lib

(app-root) (app-root) ls -dl /usr/local/cuda*
lrwxrwxrwx.  1 root root  22 Aug 17 17:39 /usr/local/cuda -> /etc/alternatives/cuda
lrwxrwxrwx.  1 root root  25 Aug 17 17:39 /usr/local/cuda-11 -> /etc/alternatives/cuda-11
drwxr-xr-x. 13 root root 183 Aug 17 17:43 /usr/local/cuda-11.8

(app-root) (app-root) ls -l /etc/alternatives/cuda 
lrwxrwxrwx. 1 root root 20 Aug 17 17:39 /etc/alternatives/cuda -> /usr/local/cuda-11.8
(app-root) (app-root) ls -l /etc/alternatives/cuda-11
lrwxrwxrwx. 1 root root 20 Aug 17 17:39 /etc/alternatives/cuda-11 -> /usr/local/cuda-11.8

1.2_image_version

(app-root) (app-root) rpm -ql cuda-toolkit-config-common-11.8.89-1.noarch
/etc/ld.so.conf.d/000_cuda.conf
(app-root) (app-root) rpm -ql cuda-toolkit-11-config-common-11.8.89-1.noarch
/etc/ld.so.conf.d/989_cuda-11.conf
(app-root) (app-root) rpm -ql cuda-toolkit-11-4-config-common-11.4.148-1.noarch
(contains no files)

(app-root) (app-root) cat /etc/ld.so.conf.d/000_cuda.conf
/usr/local/cuda/targets/x86_64-linux/lib
(app-root) (app-root) cat /etc/ld.so.conf.d/989_cuda-11.conf
/usr/local/cuda-11/targets/x86_64-linux/lib

(app-root) (app-root) ls -dl /usr/local/cuda*
lrwxrwxrwx.  1 root root  22 Nov 17  2022 /usr/local/cuda -> /etc/alternatives/cuda
lrwxrwxrwx.  1 root root  25 Nov 17  2022 /usr/local/cuda-11 -> /etc/alternatives/cuda-11
drwxr-xr-x. 13 root root 200 Nov 17  2022 /usr/local/cuda-11.4

(app-root) (app-root) ls -l /usr/local/cuda-11
lrwxrwxrwx. 1 root root 25 Nov 17  2022 /usr/local/cuda-11 -> /etc/alternatives/cuda-11
(app-root) (app-root) ls -l /etc/alternatives/cuda-11
lrwxrwxrwx. 1 root root 20 Nov 17  2022 /etc/alternatives/cuda-11 -> /usr/local/cuda-11.4
(app-root) (app-root) ls -l /etc/alternatives/cuda
lrwxrwxrwx. 1 root root 20 Nov 17  2022 /etc/alternatives/cuda -> /usr/local/cuda-11.4
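
The manual inspection above (reading each ld.so configuration file and following the symlinks) can be condensed into one helper. This is only a sketch under assumptions: the function name `resolve_ld_conf_paths` is hypothetical, it assumes one library path per line in each `*cuda*.conf` file, and it relies on `readlink -f` being available in the image.

```shell
# Hypothetical helper: for every *cuda*.conf file in the given directory,
# print each listed library path together with its fully resolved target.
resolve_ld_conf_paths() {
    conf_dir="${1:-/etc/ld.so.conf.d}"
    for conf in "$conf_dir"/*cuda*.conf; do
        [ -e "$conf" ] || continue          # no matching files at all
        while IFS= read -r path; do
            [ -n "$path" ] || continue      # skip empty lines
            printf '%s: %s -> %s\n' \
                "$(basename "$conf")" "$path" "$(readlink -f "$path")"
        done < "$conf"
    done
}

# Intended usage inside a notebook terminal:
#   resolve_ld_conf_paths /etc/ld.so.conf.d
```

If every line resolves into the directory of the expected CUDA version, the extra package is harmless in the sense described here.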

So as long as the content of the package stays as it is now, we are safe. But this may change in the future.

Metadata

Assignees: No one assigned
Labels: kind/bug
Status: Backlog
