High Performance Computing Lore

Joseph D. Long edited this page Jan 11, 2023 · 5 revisions

Shortcuts, tips, tricks, etc. for using University of Arizona high performance computing clusters (and more).

conda on the cluster

Conda is the best way to install darn near everything (not just Python) when you don't have root access to a system (like HPC). To use conda, install miniconda (or Anaconda) with its command-line installer. Note that you should change the default install path, because our home directory quota is a pathetic 50 GB. Probably your PI has some share with more space.

There are a bunch of different flavors of conda, but mambaforge is a miniconda-like installer that incorporates the much faster mamba command, a drop-in replacement for conda install. Recommended.

Installing mambaforge

  1. Connect to Puma. See the next section for shortcuts, or ssh to hpc.arizona.edu and then type puma to get to Puma.
  2. Download the installer: curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
  3. Use uquota to identify a good place to install, with plenty of free space. Example output:
$ uquota
                                            used  soft limit  hard limit
/groups/jrmales                           252.1G      500.0G      500.0G
/home                                       8.6G       50.0G       50.0G
/rental/douglase                             0.0        5.0T        5.0T
/rental/jrmales                              0.0       50.0T       50.0T
  4. (optional) Make a personal directory to keep things tidy. E.g. mkdir -p /groups/jrmales/josephlong
  5. Run the installer. Use -b for "batch mode" to indicate you agree to the software license, and -p to indicate you want to install in your preferred location. Ex: bash Mambaforge-$(uname)-$(uname -m).sh -b -p /groups/jrmales/josephlong/mambaforge
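
The $(uname) and $(uname -m) substitutions in steps 2 and 5 pick the installer matching your OS and CPU architecture. You can preview the filename they produce before downloading anything:

```shell
# Preview the platform-specific installer filename built from uname output
# (e.g. Mambaforge-Linux-x86_64.sh on a typical HPC node)
installer="Mambaforge-$(uname)-$(uname -m).sh"
echo "$installer"
```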

Improve your SSH experience

Most OSes with ssh now support ssh-copy-id YOURUSERNAME@hpc.arizona.edu to install your public key for passwordless (and Duo-less) authentication to the HPC bastion host.

The HPC submit hosts / login nodes / whatever you want to call them share your home directory, so you can add your ~/.ssh/id_ecdsa.pub contents to the end of ~/.ssh/authorized_keys after logging in.
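
A sketch of that append, run here in a throwaway directory with a made-up key so nothing touches your real ~/.ssh (in practice the two files are ~/.ssh/id_ecdsa.pub and ~/.ssh/authorized_keys):

```shell
# Stand-in for ~/.ssh so the demo can't clobber real keys
sshdir=$(mktemp -d)
# Fake public key standing in for ~/.ssh/id_ecdsa.pub
echo "ecdsa-sha2-nistp256 AAAAfakekeymaterial user@laptop" > "$sshdir/id_ecdsa.pub"
# Append (>>, not >) so any existing authorized keys are preserved
cat "$sshdir/id_ecdsa.pub" >> "$sshdir/authorized_keys"
# sshd ignores authorized_keys files with loose permissions
chmod 600 "$sshdir/authorized_keys"
```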

You can use ProxyJump in ~/.ssh/config, meaning you can SSH "directly" to the Puma login node by editing the configuration file to look something like this:

Host hpc hpc.arizona.edu
    HostName hpc.arizona.edu
    User YOURUSERNAME
Host puma shell.hpc.arizona.edu
    HostName shell.hpc.arizona.edu
    User YOURUSERNAME
    ProxyJump hpc
Host ocelote login2.ocelote.hpc.arizona.edu
    HostName login2.ocelote.hpc.arizona.edu
    User YOURUSERNAME
    ProxyJump hpc

Once the keys and config are in place, ssh puma will go to the Puma submit node in one step.
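
To confirm a config like the one above resolves the way you expect, ssh -G prints the effective options for a host without actually connecting (this assumes OpenSSH 7.3+, which introduced ProxyJump). Here a copy of the config goes in a temp file via -F so your real ~/.ssh/config is untouched:

```shell
# Write a copy of the config to a temp file and resolve "puma" against it
cfg=$(mktemp)
cat > "$cfg" <<'SSHCFG'
Host hpc hpc.arizona.edu
    HostName hpc.arizona.edu
    User YOURUSERNAME
Host puma shell.hpc.arizona.edu
    HostName shell.hpc.arizona.edu
    User YOURUSERNAME
    ProxyJump hpc
SSHCFG
# -G dumps the resolved options (hostname, user, proxyjump, ...) and exits
ssh -G -F "$cfg" puma | grep -Ei '^(hostname|user|proxyjump) '
```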

Proxy quickly to a running job

This basically works like the other doc says, but with one extra detail: you need to proxy through the HPC login node with -J puma to get to the node where your job is running. So, if you're running a Jupyter Lab instance in an HPC job, get the hostname of the node running the job (e.g. with squeue -u $USER) and start an SSH tunnel with ssh -L 9000:localhost:9000 -J puma r1u2n1 (where r1u2n1 is the hostname of the node where your process is running).
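
The two-step dance (find the node, then tunnel) can be scripted. In this sketch, canned squeue-style output stands in for running squeue -u $USER on the cluster, and the node name r1u2n1 is made up:

```shell
# Canned output standing in for: squeue -u $USER
squeue_out='JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
123456  standard  jupyter  jdoe  R  1:23     1 r1u2n1'
# The last column of the job line (row 2) is the node running the job
node=$(printf '%s\n' "$squeue_out" | awk 'NR==2 {print $NF}')
# The tunnel command you would then run from your laptop
echo "ssh -L 9000:localhost:9000 -J puma $node"
```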

Singularity containers

Now that all the clusters are more or less in sync, OS-version-wise, containers are the best way to reproduce an environment you construct locally. Using Docker (or Docker for Mac), you can specify a list of steps (a Dockerfile) to construct a container image, build it, and push it to Docker Hub. Docker doesn't run on UA HPC, but this is the easiest way to get the image converted to Singularity (which does).
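
As a sketch of that workflow, here is a minimal hypothetical Dockerfile (the base image and packages are placeholders, not recommendations) that you would docker build, docker push to Docker Hub under your account, and then fetch on HPC with singularity pull docker://YOURDOCKERHUBUSER/YOURIMAGE:

```dockerfile
# Hypothetical environment recipe; every name here is a placeholder
FROM python:3.10-slim
# Bake dependencies into the image so the cluster needs no network installs
RUN pip install --no-cache-dir numpy astropy
# Default command when the container runs
CMD ["python"]
```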

Example: PyTorch container

To get NVIDIA's PyTorch container, use singularity pull docker://nvcr.io/nvidia/pytorch:21.06-py3, which produces pytorch_21.06-py3.sif. Use singularity shell --nv pytorch_21.06-py3.sif to start a shell with the contents of that container available. The --nv flag makes the GPUs in your job available within the container.

Shortcut to convert a container unattended (i.e. without waiting for an interactive session)

I put this in my .profile and get a d2s command I can use to enqueue a batch job to convert the Docker image to Singularity. Note that you need a ~/devel/simgs folder made beforehand, and it should ideally be symlinked somewhere you have a lot of storage.

Example: d2s xwcl/milk-carton converts xwcl/milk-carton into a SIF file at ~/devel/simgs/milk-carton_latest.sif.

function d2s() {
    # Run from the SIF storage directory so the .sif lands there
    pushd ~/devel/simgs/
    # Enqueue a batch job that converts docker://$1 to a .sif in this directory
    sbatch <<EOF
#!/usr/bin/bash
#SBATCH --job-name=docker_to_singularity
#SBATCH --mail-user=YOURUSERNAME@arizona.edu
#SBATCH --mail-type=END,FAIL
#SBATCH --time=4:00:00
#SBATCH --partition=standard
#SBATCH --account=YOURPIACCOUNT
singularity pull --disable-cache --force docker://${1}
EOF
    popd
}
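
The output filename follows singularity pull's convention: the part of the image name after the last slash, an underscore, then the tag (defaulting to latest), plus .sif. A sketch of that mapping in plain shell (it ignores corner cases like registry ports; this is an illustration, not singularity's actual code):

```shell
# Derive the .sif filename singularity pull would produce for an image ref
image="xwcl/milk-carton"               # no explicit tag, so "latest" applies
tag="${image##*:}"                     # text after the last ':' ...
[ "$tag" = "$image" ] && tag="latest"  # ... or "latest" if there was no ':'
name="${image##*/}"                    # drop the repository/user prefix
name="${name%%:*}"                     # drop the tag from the name, if present
echo "${name}_${tag}.sif"
```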
