High Performance Computing Lore
Shortcuts, tips, tricks, etc. for using University of Arizona high performance computing clusters (and more).
Conda is the best way to install darn near everything (not just Python) when you don't have root access to a system (like HPC). To use conda, install miniconda (or Anaconda) with its command-line installer. Note that you should change the default install path, because our home directory quota is a pathetic 50 GB. Probably your PI has some share with more space.
There are a bunch of different flavors of conda, but Mambaforge is a miniconda-like installer that incorporates `mamba`, a much faster drop-in replacement for the `conda install` command. Recommended.
- Connect to Puma. See the next section for shortcuts, or `ssh` to `hpc.arizona.edu` and then type `puma` to get to Puma.
- Download the installer: `curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"`
- Use `uquota` to identify a good place to install, with plenty of free space. Example output:

  ```
  $ uquota
                       used  soft limit  hard limit
  /groups/jrmales    252.1G      500.0G      500.0G
  /home                8.6G       50.0G       50.0G
  /rental/douglase      0.0        5.0T        5.0T
  /rental/jrmales       0.0       50.0T       50.0T
  ```

- (optional) Make a personal directory to keep things tidy. E.g. `mkdir -p /groups/jrmales/josephlong`
- Run the installer. Use `-b` for "batch mode" to indicate you agree with the software license, and `-p` to indicate you want to install in your preferred location. Ex: `bash Mambaforge-$(uname)-$(uname -m).sh -b -p /groups/jrmales/josephlong/mambaforge`
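The steps above can be condensed into a short sketch. `PREFIX` here is just the example path from the `uquota` output; substitute whichever share has room. The commands are echoed rather than executed so you can review them before running:

```shell
# Sketch of the install sequence. PREFIX is an example path --
# point it at whichever share uquota says has free space.
PREFIX=/groups/jrmales/josephlong/mambaforge
INSTALLER="Mambaforge-$(uname)-$(uname -m).sh"
URL="https://github.com/conda-forge/miniforge/releases/latest/download/$INSTALLER"
# Echoed rather than executed, so this is safe to paste and inspect:
echo "curl -L -O '$URL'"
echo "bash '$INSTALLER' -b -p '$PREFIX'"
```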
Most OSes with ssh now support `ssh-copy-id YOURUSERNAME@hpc.arizona.edu` to install your public key for passwordless (and Duo-less) authentication to the HPC bastion host.
The HPC submit hosts / login nodes / whatever you want to call them share your home directory, so you can add your `~/.ssh/id_ecdsa.pub` contents to the end of `~/.ssh/authorized_keys` after logging in.
You can use `ProxyJump` in `~/.ssh/config`, meaning you can SSH "directly" to the Puma login node by editing the configuration file to look something like this:

```
Host hpc hpc.arizona.edu
    HostName hpc.arizona.edu
    User YOURUSERNAME

Host puma shell.hpc.arizona.edu
    HostName shell.hpc.arizona.edu
    User YOURUSERNAME
    ProxyJump hpc

Host ocelote login2.ocelote.hpc.arizona.edu
    HostName login2.ocelote.hpc.arizona.edu
    User YOURUSERNAME
    ProxyJump hpc
```
Once the keys and config are in place, `ssh puma` will go to the Puma submit node in one step.
This basically works like the other doc says, but with one extra detail: you need to proxy through the HPC login node with `-J puma` to get to where your job is running. So, if you're running a Jupyter Lab instance in an HPC job, get the hostname of the node running the job (e.g. with `squeue -u $USER`) and start an SSH tunnel with `ssh -L 9000:localhost:9000 -J puma r1u2n1` (where `r1u2n1` is the hostname of the node where your process is running).
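Finding the node name by eye works fine, but it can also be scripted. A sketch, assuming the default `squeue` column layout (the node list is the last column); the `squeue` output here is simulated rather than fetched from a real cluster:

```shell
# Simulated `squeue -u $USER` output: one running job on node r1u2n1.
squeue_output='JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
1234567  standard  jupyter    jlong  R    10:23      1 r1u2n1'
# The node list is the last whitespace-separated field of the job line:
node=$(echo "$squeue_output" | awk 'NR==2 {print $NF}')
echo "$node"
# Then, from your local machine:
#   ssh -L 9000:localhost:9000 -J puma $node
```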
Now that all the clusters are more or less in sync OS-version-wise, containers are the best way to reproduce an environment you construct locally. Using Docker (or Docker for Mac), you can specify a list of steps (a `Dockerfile`) to construct a container image, build it, and push it to Docker Hub. Docker doesn't run on UA HPC, but this is the easiest way to get the image converted to Singularity (which does).
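The local half of that workflow looks roughly like this. The image name is a placeholder, and the commands are echoed so the sketch runs even without Docker installed:

```shell
# Placeholder image name on Docker Hub -- substitute your own:
IMAGE=yourdockerhubuser/yourenv:latest
# Build from a Dockerfile in the current directory, then push;
# echoed here rather than run (Docker isn't available on HPC anyway):
echo "docker build -t $IMAGE ."
echo "docker push $IMAGE"
# On HPC, Singularity can then pull and convert the pushed image:
echo "singularity pull docker://$IMAGE"
```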
To get NVIDIA's PyTorch container, use `singularity pull docker://nvcr.io/nvidia/pytorch:21.06-py3`, which produces `pytorch_21.06-py3.sif`. Use `singularity shell --nv pytorch_21.06-py3.sif` to start a shell with the contents of that container available. The `--nv` flag makes the GPUs in your job available within the container.
I put this in my `.profile` to get a `d2s` command I can use to enqueue a batch job that converts a Docker image to Singularity. Note that you need a `~/devel/simgs` folder pre-made, and it should ideally be linked somewhere you have a lot of storage.
Example: `d2s xwcl/milk-carton` converts `xwcl/milk-carton` into a SIF file at `~/devel/simgs/milk-carton_latest.sif`.
```bash
function d2s() {
    pushd ~/devel/simgs/
    sbatch <<EOF
#!/usr/bin/bash
#SBATCH --job-name=docker_to_singularity
#SBATCH --mail-user=YOUREMAIL@arizona.edu
#SBATCH --mail-type=END,FAIL
#SBATCH --time=4:00:00
#SBATCH --partition=standard
#SBATCH --account=YOURPIACCOUNT
singularity pull --disable-cache --force docker://${1}
EOF
    popd
}
```
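The `~/devel/simgs` prerequisite can be set up once like so. A sketch: `GROUP_DIR` is a placeholder for your group's share (it defaults to a temporary directory here so the commands are runnable anywhere):

```shell
# GROUP_DIR is a placeholder -- point it at a share with plenty of space,
# e.g. /groups/jrmales/josephlong (defaults to a temp dir for this sketch):
GROUP_DIR="${GROUP_DIR:-$(mktemp -d)}"
mkdir -p "$GROUP_DIR/simgs" "$HOME/devel"
# Symlink so d2s writes its .sif files onto the roomy share:
ln -sfn "$GROUP_DIR/simgs" "$HOME/devel/simgs"
ls -ld "$HOME/devel/simgs"
```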