Python: Conda Basics

In high-performance computing, Python is heavily used to analyze scientific data on the system. Doing so requires a variety of Python installations and scientific packages, which can become difficult to manage on an HPC system because the programming environment is complicated. Conda, a package and virtual environment manager that originated with the Anaconda distribution, helps alleviate these issues. Miniforge is a minimal, community-maintained conda installer (comparable to Miniconda) that uses the conda-forge channel by default; it is what the OLCF crash course uses to provide conda environments.

Conda allows users to easily install different versions of binary software packages and any required libraries appropriate for their computing platform. The versatility of conda allows a user to essentially build their own isolated Python environment, without having to worry about clashing dependencies and other system installations of Python.

This hands-on challenge introduces setting up conda on Frontier and the basic workflow of using conda environments, and provides an example of creating a conda environment that uses a different version of Python than the base environment on Frontier.

 

Inspecting and setting up the environment

First, we will reset the module environment to the system defaults, which undoes any modules you may have previously loaded on Frontier:

$ module reset

Next, we need to load the GNU programming environment module (most Python packages assume use of GCC) and the miniforge module (which allows us to create conda environments):

$ module load PrgEnv-gnu/8.5.0
$ module load miniforge3

This puts you in the "base" conda environment (your base-level install that came with a few packages). Typical best practice is to not install new things into the base environment, but to create new environments instead. So, next, we will create a new environment using the conda create command:

$ conda create -p /ccs/proj/<YOUR_PROJECT_ID>/<YOUR_USER_ID>/conda_envs/frontier/py39-frontier python=3.9

The "-p" flag specifies the desired path and name of your new virtual environment. The directory structure is case sensitive, so be sure to insert "<YOUR_PROJECT_ID>" as lowercase. Directories will be created if they do not exist already (provided you have write access in that location). Alternatively, you can use the --name <your_env_name> flag instead of "-p", which places the environment in the default location under your $HOME directory.


NOTE: It is highly recommended to create new environments in the "Project Home" directory (on Frontier, this is /ccs/proj/<YOUR_PROJECT_ID>/<YOUR_USER_ID>). This space avoids purges and allows for potential collaboration within your project. It is also recommended, for convenience, that you use environment names that indicate the hostname, as virtual environments created on one system will not necessarily work on others.
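
The path convention recommended in the note above can be captured in a small helper. This is only an illustrative sketch: the function name env_prefix and the example IDs are hypothetical, and the /ccs/proj/... layout is simply the one used in the commands above.

```python
from pathlib import PurePosixPath


def env_prefix(project_id, user_id, env_name, host="frontier"):
    """Build a Project Home path for a conda environment (hypothetical helper)."""
    # Project directories are lowercase, so normalize the project ID.
    base = PurePosixPath("/ccs/proj") / project_id.lower() / user_id
    # Group environments by host, since an environment built on one
    # system will not necessarily work on another.
    return base / "conda_envs" / host / env_name


if __name__ == "__main__":
    # "ABC123" and "jdoe" are made-up placeholders, not real IDs.
    print(env_prefix("ABC123", "jdoe", "py39-frontier"))
```

The printed path is the value you would pass to the "-p" flag of conda create.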


After executing the conda create command, you will be prompted to install "the following NEW packages" -- type "y" then hit Enter/Return. Downloads of the fresh packages will start and eventually you should see something similar to:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate /ccs/proj/<YOUR_PROJECT_ID>/<YOUR_USER_ID>/conda_envs/frontier/py39-frontier
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Due to the specific nature of conda on Frontier, we will be using source activate and source deactivate instead of conda activate and conda deactivate. Let's activate our new environment:

$ source activate /ccs/proj/<YOUR_PROJECT_ID>/<YOUR_USER_ID>/conda_envs/frontier/py39-frontier

The path to the environment should now be displayed in "( )" at the beginning of your terminal lines, which indicates that you are currently using that specific conda environment. If you check with conda env list, you should see the * marker next to your newly activated environment:

$ conda env list

# conda environments:
#
                      *  /ccs/proj/<YOUR_PROJECT_ID>/<YOUR_USER_ID>/conda_envs/frontier/py39-frontier
base                     /autofs/nccs-svm1_sw/odo/miniforge3/23.11.0
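
You can also confirm the active environment from inside Python itself. The sketch below assumes that activation sets the CONDA_PREFIX environment variable, as conda's activation scripts normally do; the function names are made up for this example.

```python
import os
import sys


def active_environment():
    """Return (interpreter_prefix, conda_prefix) for the running Python."""
    # sys.prefix is the root of the installation that owns this interpreter.
    interpreter_prefix = os.path.realpath(sys.prefix)
    # CONDA_PREFIX is normally set on activation; it is absent (None here)
    # when no environment is active in the current shell.
    conda_prefix = os.environ.get("CONDA_PREFIX")
    return interpreter_prefix, conda_prefix


def interpreter_matches_env(interpreter_prefix, conda_prefix):
    """True when the interpreter lives inside the activated environment."""
    if conda_prefix is None:
        return False
    return os.path.realpath(conda_prefix) == os.path.realpath(interpreter_prefix)


if __name__ == "__main__":
    prefix, env = active_environment()
    print(f"Interpreter prefix: {prefix}")
    print(f"CONDA_PREFIX:       {env}")
    print(f"Match:              {interpreter_matches_env(prefix, env)}")
```

If "Match" is False after activating, the python on your PATH is probably not the one from your new environment.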

 

Installing packages

Next, let's install a package (NumPy). There are a few different approaches.

One way to install packages into your conda environment is to use pip. Although pip can install pre-compiled binaries like conda, it can also be used to build packages from source. This approach is useful if a specific package or package version is not available in the conda repository, or if the pre-compiled binaries don't work on the HPC resources (which is common). Pip is available to use after installing Python into your conda environment, which we have already done.


NOTE: Because issues can arise when using conda and pip together (see link in Additional Resources Section), it is recommended to do this only if absolutely necessary. However, as long as you are careful about it, things will probably be fine.


Building from source means you need to take care of some of the dependencies yourself, especially for optimization. In Frontier's case, this means we need to load the openblas module. To build a package from source, use pip install --no-binary=<package_name> <package_name>:

$ module load openblas
$ CC=gcc CXX=g++ pip install --no-binary=numpy numpy --no-cache-dir

Setting the CC=gcc and CXX=g++ environment variables ensures that the build uses the GCC C and C++ compilers. Building from source results in a longer installation time for packages, so you may need to wait a few minutes for the install to finish. The --no-cache-dir flag disables pip's cache, so no previously downloaded or built packages are reused.

After it is finished building, you should see something similar to:

Successfully built numpy
Installing collected packages: numpy
Successfully installed numpy-1.26.1

Congratulations, you have built NumPy from source in your conda environment!

Now, let's install using a different method instead, but first we must uninstall the pip-installed NumPy:

$ pip uninstall numpy
$ module unload openblas

The traditional, and more basic, approach to installing/uninstalling packages into a conda environment is to use the commands conda install and conda remove. In "regular" Anaconda, installing packages with this method checks the Anaconda Distribution Repository for pre-built binary packages to install. However, because we are using Miniforge, installing packages checks the conda-forge repository ("channel") instead. Let's do this to install NumPy:

$ conda install numpy

Conda handles dependencies when installing pre-built binaries, so it will automatically install all of the packages NumPy needs for optimization.
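
Whichever installation method you use, one way to see which optimized libraries NumPy ended up with is to print its build configuration. This is a minimal sketch, not part of the challenge files; the function name report_numpy_build is made up for illustration, and the exact output of show_config() varies between NumPy versions.

```python
import importlib.util


def report_numpy_build():
    """Print NumPy's build configuration; return its version, or None if absent."""
    if importlib.util.find_spec("numpy") is None:
        print("NumPy is not installed in this environment")
        return None
    import numpy as np

    print(f"NumPy {np.__version__}")
    # show_config() prints the libraries NumPy was built against,
    # including the BLAS/LAPACK implementation.
    np.show_config()
    # A small matrix product exercises the linked BLAS routines.
    a = np.ones((4, 4))
    assert np.allclose(a @ a, 4.0 * a)
    return np.__version__


if __name__ == "__main__":
    report_numpy_build()
```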

Congratulations, you have just installed NumPy. Now let's test it!

 

Testing your new environment

Let's run a small script to test that things installed properly. Since we are running a small test, we can do this without having to run on a compute node.


NOTE: Remember, at larger scales both your performance and your fellow users' performance will suffer if you do not run on the compute nodes.


It is always highly recommended to run on the compute nodes (through the use of a batch job or interactive batch job).

Make sure you're in the correct directory and execute the example Python script:

$ cd ~/hands-on-with-frontier/challenges/Python_Conda_Basics/
$ python3 hello.py

Hello from Python 3.9.18!
You are using NumPy 1.26.0
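
The contents of hello.py are not reproduced in this document. As a rough idea of what such a test script does, a script along the following lines would print output of the same shape. This is an assumed sketch, not the actual file from the repository, and the function name greeting_lines is hypothetical.

```python
import platform


def greeting_lines():
    """Return the lines reporting the Python and NumPy versions in use."""
    lines = [f"Hello from Python {platform.python_version()}!"]
    try:
        import numpy
    except ImportError:
        # NumPy is missing, which suggests the wrong environment is active.
        lines.append("NumPy is not installed in this environment")
    else:
        lines.append(f"You are using NumPy {numpy.__version__}")
    return lines


if __name__ == "__main__":
    print("\n".join(greeting_lines()))
```

The versions reported depend on the environment that is active when the script runs.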

Congratulations, you have just created your own Python environment and run it on one of the fastest computers in the world!


Note: If you're doing this challenge for the certificate, you can submit your Python environment for completion. See the "Exporting (sharing) an environment" tip below for how to export your environment to a file.


 

Additional Tips

  • Cloning an environment:

    It is not recommended to try to install new packages into the base environment. Instead, you can clone the base environment for yourself and install packages into the clone. To clone an environment, you must use the --clone <env_to_clone> flag when creating a new conda environment. An example for cloning the base environment into your $HOME directory on Frontier is provided below:

    $ conda create -p /ccs/home/<YOUR_USER_ID>/.conda/envs/baseclone-frontier --clone base
    $ source activate /ccs/home/<YOUR_USER_ID>/.conda/envs/baseclone-frontier
  • Deleting an environment:

    If for some reason you need to delete an environment, you can execute the following:

    $ conda env remove -p /path/to/your/env
  • Exporting (sharing) an environment:

    You may want to share your environment with someone else. As mentioned previously, one way to do this is by creating your environment in a shared location where other users can access it. A different way (the method described below) is to export a list of all the packages and versions of your environment (an environment.yml file). If a different user provides conda the list you made, conda will install all the same package versions and recreate your environment for them -- essentially "sharing" your environment. To export your environment list:

    $ source activate my_env
    $ conda env export > environment.yml

    You can then email or otherwise provide the environment.yml file to the desired person. The person would then be able to create the environment like so:

    $ conda env create -f environment.yml
  • Adding known environment locations:

    For a conda environment to be callable by a "name", it must be installed in one of the envs_dirs directories. The list of known directories can be seen by executing:

    $ conda config --show envs_dirs

    On Frontier, the default location is your $HOME directory. If you plan to frequently create environments in a different location than the default (such as /ccs/proj/...), then there is an option to add directories to the envs_dirs list. To do so, you must execute:

    $ conda config --append envs_dirs /ccs/proj/<YOUR_PROJECT_ID>/<YOUR_USER_ID>/conda_envs/frontier

    This will create a .condarc file in your $HOME directory if you do not have one already, which will now contain this new envs_dirs location. You can then use the --name env_name flag with conda commands for environments stored in that directory, instead of using the -p /ccs/proj/<YOUR_PROJECT_ID>/<YOUR_USER_ID>/conda_envs/frontier/env_name flag and specifying the full path to the environment. For example, you can do source activate py39-frontier instead of source activate /ccs/proj/<YOUR_PROJECT_ID>/<YOUR_USER_ID>/conda_envs/frontier/py39-frontier.
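
To make the idea of name lookup in the last tip concrete, the sketch below models it in a simplified way: each directory in an envs_dirs-style list is searched in order for a subdirectory with the requested name. This is an illustration only, not conda's actual implementation; resolve_env_name is a hypothetical helper, and the temporary directories stand in for real envs_dirs entries.

```python
import os
import tempfile


def resolve_env_name(name, envs_dirs):
    """Return the first <dir>/<name> that exists, or None if no directory has it."""
    for directory in envs_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isdir(candidate):
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home_envs, \
            tempfile.TemporaryDirectory() as proj_envs:
        # Pretend an environment was created in the project directory.
        os.mkdir(os.path.join(proj_envs, "py39-frontier"))
        # With only the default location known, the name cannot be found.
        print(resolve_env_name("py39-frontier", [home_envs]))
        # After appending the project location, the name resolves to a path.
        print(resolve_env_name("py39-frontier", [home_envs, proj_envs]))
```

In this model, appending a directory to envs_dirs is what allows a short name to be used in place of a full path.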

 

Quick-Reference Commands

  • List environments:

    $ conda env list
  • List installed packages in current environment:

    $ conda list
  • Creating an environment with Python version X.Y:

    For a specific path:

    $ conda create -p /path/to/your/my_env python=X.Y

    For a specific name:

    $ conda create -n my_env python=X.Y
  • Deleting an environment:

    For a specific path:

    $ conda env remove -p /path/to/your/my_env

    For a specific name:

    $ conda env remove -n my_env
  • Copying an environment:

    For a specific path:

    $ conda create -p /path/to/new_env --clone old_env

    For a specific name:

    $ conda create -n new_env --clone old_env
  • Activating/Deactivating an environment:

    $ source activate my_env
    $ source deactivate # deactivates the current environment
  • Installing/Uninstalling packages:

    Using conda:

    $ conda install package_name
    $ conda remove package_name

    Using pip:

    $ pip install package_name
    $ pip uninstall package_name
    $ pip install --no-binary=package_name package_name # builds from source

 

Additional Resources