Multiple CUDA Environments on Ubuntu 24.04

This guide is adapted from garg-aayush's tutorial and updated for Ubuntu 24.04. Note: Sections marked with Ö indicate content that has been adapted directly from the original tutorial with minimal modifications.

Before We Start

This article focuses on installing multiple versions of CUDA and cuDNN, and on managing the different CUDA versions with Environment Modules. Before diving into the main content, though, please consider whether your needs can be met through Docker or Conda. In my case, I needed a specific version of PyTorch, and using its provided Docker image was much more convenient than installing CUDA myself. I decided to keep using Environment Modules to manage CUDA simply because I don't like leaving things unfinished, and Docker and Environment Modules are not mutually exclusive anyway. If you really need Environment Modules to manage CUDA, please prepare a backup and restore solution, such as timeshift, before starting. Creating backups saves you from having to reinstall the system if it breaks.
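For example, timeshift can take a snapshot from the command line before you touch anything (this assumes timeshift is already installed; the flags are its standard CLI options):

sudo timeshift --create --comments "before installing CUDA"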

Install NVIDIA Driver Ö

Add PPA GPU Drivers Repository to the System

sudo add-apt-repository ppa:graphics-drivers/ppa

Check the GPU and available drivers

ubuntu-drivers devices

Install the compatible driver

It's best to allow Ubuntu to autodetect and install the compatible NVIDIA driver:

sudo ubuntu-drivers install

Note: Please restart your system after installing the NVIDIA driver. After the reboot, you should be able to get GPU state and stats using nvidia-smi.
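For example (the query flag below is a standard nvidia-smi option):

nvidia-smi  # full GPU status table
nvidia-smi --query-gpu=driver_version --format=csv,noheader  # driver version only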

Check the installed NVIDIA driver

nvidia-detector 

Install CUDA

This part mainly follows the installation method from the official website. Visit https://developer.nvidia.com/cuda-toolkit-archive to select your desired CUDA version; after clicking it, you'll see the official guide. Taking 11.8 as an example, after selecting your architecture and installer type, you'll see the installation instructions:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

However, if you are on Ubuntu 24.04, following this guide directly leads to two issues.

libtinfo5

The first issue is that CUDA 11.8 depends on the libtinfo5 library. If you try to install directly, you'll encounter the following error message:

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nsight-systems-2023.3.3 : Depends: libtinfo5 but it is not installable
E: Unable to correct problems, you have held broken packages.

The solution is to add an older Ubuntu release repository, which still provides libtinfo5, to the sources list:

sudo nano /etc/apt/sources.list.d/ubuntu.sources

Append to the end of the file:

Types: deb
URIs: http://old-releases.ubuntu.com/ubuntu/
Suites: lunar
Components: universe
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg
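After saving the file, refresh the package index and check that libtinfo5 now has an install candidate (apt-cache policy is the standard way to check this):

sudo apt-get update
apt-cache policy libtinfo5  # should now show a candidate from old-releases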

Kernel Mismatch

Even after libtinfo5 is available, installation issues may still occur. I am using Ubuntu 24.04 with kernel 6.14.0-27-generic. When installing the meta-package cuda from a CUDA 11.8 local repo (built for Ubuntu 22.04), apt will try to remove the previously installed 575 driver and install the corresponding 520 driver. However, the 520 driver does not support my current kernel version, resulting in the error Error! Bad return status for module build on kernel: 6.14.0-27-generic.

The solution is simple: since installing the cuda meta-package pulls in the driver, we can just install the toolkit directly. So for the last step in the official guide, we change sudo apt-get -y install cuda to sudo apt-get install cuda-toolkit-11-8.

For the same reason, when installing other CUDA versions, follow the main commands from the official documentation but replace cuda with cuda-toolkit-xx-x in the final step, as in the example below.
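For example, for CUDA 12.1 (a hypothetical second version; run the matching repo-setup commands from the archive page first), the final step would become:

sudo apt-get install cuda-toolkit-12-1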

After the installation, you should be able to see the corresponding CUDA version directory under /usr/local/.
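A quick sanity check, assuming the default install prefix:

ls -d /usr/local/cuda-*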

Install cuDNN

Download the cuDNN version corresponding to your CUDA from https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/. Note that newer versions are listed at the bottom. I recommend downloading the latest version.

After downloading the corresponding package with wget, decompress it with tar:

tar -xvf cudnn-<***>.tar.xz 

Then simply copy the corresponding folders into the CUDA toolkit directory:

cd cudnn-<***>-archive/
sudo cp include/cudnn*.h /usr/local/cuda-**.*/include
sudo cp lib/libcudnn* /usr/local/cuda-**.*/lib64

Note: In newer versions of cuDNN, the location of libcudnn has changed from lib64 to lib. The exact path may vary, so please check using the ls command.
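A quick way to check which layout your archive uses before copying (the archive name is a placeholder, as above):

ls cudnn-<***>-archive/  # look for lib/ vs lib64/ and adjust the cp source path accordingly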

Then make these files readable by all users (a+r grants read, not execute, permission):

sudo chmod a+r /usr/local/cuda-**.*/include/cudnn*.h /usr/local/cuda-**.*/lib64/libcudnn*
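To verify, the version macros can be read back from the copied headers (in cuDNN 8 and later they live in cudnn_version.h; older releases keep them in cudnn.h):

grep -A 2 CUDNN_MAJOR /usr/local/cuda-**.*/include/cudnn_version.h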

I find this manual installation method a bit odd, but I haven't found a better solution yet. Still waiting for guidance from experts.

Using Environment Modules

First, install environment-modules:

sudo apt-get update
sudo apt-get install environment-modules
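If the module command is not found in your current shell afterwards, open a new shell or source the init script shipped by the environment-modules package:

source /etc/profile.d/modules.sh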

Check it

module avail # check for available modules
# module list shows loaded modules; don't confuse the two

You should see something like:

 dot  module-git  module-info  modules  null  use.own

Create CUDA Module Files Ö

The module names shown above can actually be found in the /usr/share/modules/modulefiles/ directory. Here we'll create a cuda folder to store our modules.

Create module files corresponding to CUDA versions:

sudo vim /usr/share/modules/modulefiles/cuda/**.* # for example 11.8

Using 11.8 as an example:

#%Module1.0
##
## cuda 11.8 modulefile
##

proc ModulesHelp { } {
    global version
    
    puts stderr "\tSets up environment for CUDA $version\n"
}

module-whatis "sets up environment for CUDA 11.8"

if { [ is-loaded cuda/12.1 ] } {
    module unload cuda/12.1
}

set version 11.8
set root /usr/local/cuda-11.8
setenv CUDA_HOME	$root

prepend-path PATH $root/bin
prepend-path LD_LIBRARY_PATH $root/extras/CUPTI/lib64
prepend-path LD_LIBRARY_PATH $root/lib64
conflict cuda

Notice the if statement that automatically unloads the CUDA 12.1 environment if it's loaded. I think this part can be removed, since the conflict cuda declaration below already guards against conflicts: with it in place, loading a second CUDA module simply fails with an error until the first is unloaded (and recent versions of Environment Modules can even resolve the conflict automatically). As long as we remember to unload the current environment before loading another one, it should be fine. Otherwise, with three or more environments, would we need an if statement for every other version in each module file?
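As a minimal sketch, a module file for a hypothetical cuda/12.1 that relies only on conflict cuda (paths assume the default /usr/local/cuda-12.1 install location) could look like this:

#%Module1.0
##
## cuda 12.1 modulefile
##

module-whatis "sets up environment for CUDA 12.1"

set version 12.1
set root /usr/local/cuda-12.1
setenv CUDA_HOME $root

prepend-path PATH $root/bin
prepend-path LD_LIBRARY_PATH $root/extras/CUPTI/lib64
prepend-path LD_LIBRARY_PATH $root/lib64

# refuse to load while another cuda/* module is loaded
conflict cuda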

After you have defined module files for each CUDA version, you can use module avail to check which modules are available, module load cuda/**.* to load a specific version, and module unload cuda/**.* (or simply module unload cuda) to clear the environment.

After loading the corresponding environment, you can use nvcc --version to check the CUDA version.
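A typical switching session then looks like this (the cuda/12.1 module assumes a second module file like the sketch above):

module load cuda/11.8
nvcc --version   # should report release 11.8
module unload cuda
module load cuda/12.1
nvcc --version   # should now report release 12.1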

For me, though, success means verifying that the corresponding PyTorch version actually works, and I haven't reached that step yet. If I run into difficulties along the way, I will update this tutorial. That said, I might just give up and use Docker instead.
