
Commit 2d86f58

Merge pull request #27 from prehensilecode/update-2-3-1
Update for AlphaFold 2.3.1
2 parents 89e3cff + 8f52ba6 commit 2d86f58

5 files changed: +78 -35 lines

README.md

+31 -3

@@ -25,7 +25,7 @@ A prebuilt image is hosted on cloud.sylabs.io: [https://cloud.sylabs.io/library/
 N.B. The AlphaFold version and the alphafold_singularity versions must match.

 ```
-$ export ALPHAFOLD_VERSION=2.2.4
+$ export ALPHAFOLD_VERSION=2.3.1
 $ wget https://github.com/deepmind/alphafold/archive/refs/tags/v${ALPHAFOLD_VERSION}.tar.gz -O alphafold-${ALPHAFOLD_VERSION}.tar.gz
 ...
 2023-02-08 17:28:50 (1.24 MB/s) - ‘alphafold-x.x.x.tar.gz’ saved [5855095]
@@ -55,7 +55,18 @@ If your `/tmp` directory is small, you may need to set the [`SINGULARITY_TMPDIR`
 environment variable](https://sylabs.io/guides/3.3/user-guide/build_env.html#temporary-folders) to a directory on a filesystem with more free space.
 My builds have consumed up to 15 GiB of space. The resulting image file may be up to 10 GiB.

-### Install and run
+### Download genetic databases
+See [AlphaFold 2.3.1 README](https://github.com/deepmind/alphafold/tree/v2.3.1)
+for instructions on downloading genetic databases. These are necessary
+to run AlphaFold.
+
+This step requires [aria2c](https://aria2.github.io/).
+
+N.B. The difference between downloading the "reduced databases" as opposed
+to the "full databases" is that the reduced databases download "small BFD"
+instead of "BFD".
+
+### Modify run script, install, and run
 To run, modify the `$ALPHAFOLD_SRC/singularity/run_singularity.py` and change the
 section marked `USER CONFIGURATION`. At the least, you will need to modify the values
 of:
@@ -68,5 +79,22 @@ E.g.
 singularity_image = Client.load(os.path.join(os.environ['ALPHAFOLD_DIR'], 'alphafold.sif'))
 ```

+## Running on an HPC cluster
+Currently, this project only supports Slurm. Please open an issue to request
+support for other job schedulers/resource managers.
+
+
 ### Run as a Slurm job on a cluster
-See the example job script [`example_slurm_job.sh`](https://github.com/prehensilecode/alphafold_singularity/blob/main/example_slurm_job.sh)
+See the example job script [`example_slurm_job.sh`](https://github.com/prehensilecode/alphafold_singularity/blob/main/example_slurm_job.sh).
+N.B. this example must be modified to suit your specific HPC environment.
+
+The `run_singularity.py` script will use all GPUs available to the job. If
+Slurm has been set up with [`cgroups`](https://en.wikipedia.org/wiki/Cgroups),
+the job may request fewer than the total number of GPUs installed on a node.
+E.g. if the GPU nodes in the cluster have 4 GPU devices each, the job can
+do
+```bash
+#SBATCH --gpus=2
+```
+and AlphaFold Singularity will use only two of the four GPUs. This is
+because the `cgroup` for the job only shows 2 GPUs to the job.
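
As a usage sketch for the new "Download genetic databases" section (not part of this commit): the AlphaFold source tree ships download helpers under `scripts/`, and passing `reduced_dbs` fetches "small BFD" instead of the full BFD. The database directory below is a placeholder; substitute your own path.

```bash
# Sketch: fetch the reduced genetic databases with the helper bundled in the
# AlphaFold 2.3.1 source tree. Requires aria2c on PATH.
export ALPHAFOLD_VERSION=2.3.1
cd alphafold-${ALPHAFOLD_VERSION}

# /data/alphafold_databases is a placeholder for your database directory.
bash scripts/download_all_data.sh /data/alphafold_databases reduced_dbs
```

Point `ALPHAFOLD_DATADIR` (used by the example Slurm job below) at the same directory once the download finishes.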

Singularity.def

+9 -8

@@ -22,7 +22,8 @@ Stage: spython-base
 # FROM directive resets ARGS, so we specify again (the value is retained if
 # previously set).

-apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
+apt-get update \
+&& DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
 build-essential \
 cmake \
 cuda-command-line-tools-11-1 \
@@ -48,9 +49,9 @@ wget \

 # Install Miniconda package manager.
 wget -q -P /tmp \
-https://repo.anaconda.com/miniconda/Miniconda3-py37_4.12.0-Linux-x86_64.sh \
-&& bash /tmp/Miniconda3-py37_4.12.0-Linux-x86_64.sh -b -p /opt/conda \
-&& rm /tmp/Miniconda3-py37_4.12.0-Linux-x86_64.sh
+https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
+&& bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda \
+&& rm /tmp/Miniconda3-latest-Linux-x86_64.sh

 # Install conda packages.
 PATH="/opt/conda/bin:/usr/local/cuda-11.1/bin:$PATH"
@@ -60,7 +61,7 @@ openmm=7.5.1 \
 cudatoolkit==11.1.1 \
 pdbfixer \
 pip \
-python=3.7 \
+python=3.8 \
 && conda clean --all --force-pkgs-dirs --yes

 ### /bin/cp -r . /app/alphafold
@@ -73,12 +74,12 @@ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c49412
 pip3 install --upgrade pip --no-cache-dir \
 && pip3 install -r /app/alphafold/requirements.txt --no-cache-dir \
 && pip3 install --upgrade --no-cache-dir \
-jax==0.3.17 \
-jaxlib==0.3.15+cuda11.cudnn805 \
+jax==0.3.25 \
+jaxlib==0.3.25+cuda11.cudnn805 \
 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

 # Apply OpenMM patch.
-cd /opt/conda/lib/python3.7/site-packages
+cd /opt/conda/lib/python3.8/site-packages
 patch -p0 < /app/alphafold/docker/openmm.patch

 # Add SETUID bit to the ldconfig binary so that non-root users can run it.
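
For context (not part of this commit), a minimal build sketch for the updated `Singularity.def`, assuming Singularity/Apptainer with `--fakeroot` available and a scratch filesystem with roughly 15 GiB free; all paths are placeholders.

```bash
# Sketch: build the image and stage it where run_singularity.py expects it.
export SINGULARITY_TMPDIR=/scratch/$USER/tmp       # placeholder; needs ~15 GiB free
singularity build --fakeroot alphafold.sif Singularity.def

export ALPHAFOLD_DIR=$HOME/alphafold_singularity   # placeholder install location
mv alphafold.sif ${ALPHAFOLD_DIR}/alphafold.sif
```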

example_slurm_job.sh

+9 -12

@@ -1,44 +1,41 @@
 #!/bin/bash
-#SBATCH -p gpu
+#SBATCH --partition=gpu
 #SBATCH --time=18:00:00
 #SBATCH --gpus=4
 #SBATCH --cpus-per-gpu=12
-#SBATCH --mem=140G
+#SBATCH --mem=45G

 ### NOTE
 ### This job script cannot be used without modification for your specific environment.

-module load alphafold/2.2.4
-module load python/gcc/3.10
+module load python/gcc/3.11
+module load alphafold/2.3.1

 ### Check values of some environment variables
-echo SLURM_JOB_GPUS=$SLURM_JOB_GPUS
 echo ALPHAFOLD_DIR=$ALPHAFOLD_DIR
 echo ALPHAFOLD_DATADIR=$ALPHAFOLD_DATADIR

 ###
-### README This runs AlphaFold 2.2.2 on the T1050.fasta file
+### README This runs AlphaFold 2.3.1 on the T1050.fasta file
 ###

 # AlphaFold should use all GPU devices available to the job by default.
-# To explicitly specify use of GPUs, and the GPU devices to use, add
-# --use_gpu --gpu_devices=${SLURM_JOB_GPUS}
 #
 # To run the CASP14 evaluation, use:
 # --model_preset=monomer_casp14
+# --db_preset=full_dbs (or delete the line; default is "full_dbs")
 #
 # To benchmark, running multiple JAX model evaluations (NB this
 # significantly increases run time):
 # --benchmark

-# Run AlphaFold; default is to use GPUs, i.e. "--use_gpu" can be omitted.
+# Run AlphaFold; default is to use GPUs
 python3 ${ALPHAFOLD_DIR}/singularity/run_singularity.py \
-    --use_gpu --gpu_devices=${SLURM_JOB_GPUS} \
     --data_dir=${ALPHAFOLD_DATADIR} \
     --fasta_paths=T1050.fasta \
     --max_template_date=2020-05-14 \
-    --model_preset=monomer_casp14 \
-    --benchmark
+    --db_preset=reduced_dbs \
+    --model_preset=monomer

 echo INFO: AlphaFold returned $?

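
To illustrate the cgroups-limited GPU behaviour described in the README changes (not part of this commit): a sketch of submitting the example job and checking which GPUs Slurm actually exposes to an allocation; the partition name and GPU counts are site-specific assumptions.

```bash
# Sketch: submit the example job from the directory containing T1050.fasta.
sbatch example_slurm_job.sh

# With cgroups enforcement, an allocation requesting 2 of a node's 4 GPUs
# should see only those 2 devices.
srun --partition=gpu --gpus=2 --pty nvidia-smi -L
```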

requirements.txt

+3 -2

@@ -1,3 +1,4 @@
 # Dependencies necessary to execute run_singularity.py
-absl-py==0.13.0
-spython==0.1.16
+# absl-py version to match deepmind/alphafold
+absl-py==1.0.0
+spython==0.3.0
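
These two packages are host-side dependencies of `run_singularity.py` (absl-py for the flags, spython to drive Singularity); a minimal install sketch, with the virtual-environment path as a placeholder:

```bash
# Sketch: install the launcher's host-side dependencies.
python3 -m venv ~/venvs/alphafold-launcher     # optional; placeholder path
source ~/venvs/alphafold-launcher/bin/activate
python3 -m pip install -r requirements.txt     # absl-py==1.0.0, spython==0.3.0
```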

run_singularity.py

+26 -10

@@ -15,16 +15,19 @@
 """Singularity launch script for Alphafold Singularity image."""

 import os
+import sys
 import pathlib
 import signal
 from typing import Tuple

 from absl import app
 from absl import flags
 from absl import logging
+from spython.main import Client

 import tempfile
-from spython.main import Client
+import subprocess
+

 #### USER CONFIGURATION ####

@@ -34,11 +37,16 @@
 singularity_image = Client.load(os.path.join(os.environ['ALPHAFOLD_DIR'], 'alphafold.sif'))

 # Path to a directory that will store the results.
-if 'TMPDIR' in os.environ:
+if 'TMP' in os.environ:
+  output_dir = os.environ['TMP']
+elif 'TMPDIR' in os.environ:
   output_dir = os.environ['TMPDIR']
 else:
   output_dir = tempfile.mkdtemp(dir='/tmp', prefix='alphafold-')

+# set tmp dir the same as output dir
+tmp_dir = output_dir
+
 #### END USER CONFIGURATION ####


@@ -62,7 +70,7 @@
     'separated by commas. All FASTA paths must have a unique basename as the '
     'basename is used to name the output directories for each prediction.')
 flags.DEFINE_string(
-    'output_dir', '/tmp/alphafold',
+    'output_dir', output_dir,
     'Path to a directory that will store the results.')
 flags.DEFINE_string(
     'data_dir', None,
@@ -113,6 +121,7 @@


 def _create_bind(bind_name: str, path: str) -> Tuple[str, str]:
+  """Create a bind point for each file and directory used by the model."""
   path = os.path.abspath(path)
   source_path = os.path.dirname(path) if bind_name != 'data_dir' else path
   target_path = os.path.join(_ROOT_MOUNT_DIRECTORY, bind_name)
@@ -145,7 +154,7 @@ def main(argv):

   # Path to the MGnify database for use by JackHMMER.
   mgnify_database_path = os.path.join(
-      FLAGS.data_dir, 'mgnify', 'mgy_clusters_2018_12.fa')
+      FLAGS.data_dir, 'mgnify', 'mgy_clusters_2022_05.fa')

   # Path to the BFD database for use by HHblits.
   bfd_database_path = os.path.join(
@@ -156,9 +165,9 @@
   small_bfd_database_path = os.path.join(
       FLAGS.data_dir, 'small_bfd', 'bfd-first_non_consensus_sequences.fasta')

-  # Path to the Uniclust30 database for use by HHblits.
-  uniclust30_database_path = os.path.join(
-      FLAGS.data_dir, 'uniclust30', 'uniclust30_2018_08', 'uniclust30_2018_08')
+  # Path to the Uniref30 database for use by HHblits.
+  uniref30_database_path = os.path.join(
+      FLAGS.data_dir, 'uniref30', 'UniRef30_2021_03')

   # Path to the PDB70 database for use by HHsearch.
   pdb70_database_path = os.path.join(FLAGS.data_dir, 'pdb70', 'pdb70')
@@ -178,7 +187,7 @@
   if alphafold_path == data_dir_path or alphafold_path in data_dir_path.parents:
     raise app.UsageError(
         f'The download directory {FLAGS.data_dir} should not be a subdirectory '
-        f'in the AlphaFold repository directory. If it is, the Docker build is '
+        f'in the AlphaFold repository directory. If it is, the Singularity build is '
         f'slow since the large databases are copied during the image creation.')

   binds = []
@@ -211,7 +220,7 @@
     database_paths.append(('small_bfd_database_path', small_bfd_database_path))
   else:
     database_paths.extend([
-        ('uniclust30_database_path', uniclust30_database_path),
+        ('uniref30_database_path', uniref30_database_path),
         ('bfd_database_path', bfd_database_path),
     ])
   for name, path in database_paths:
@@ -222,6 +231,11 @@

   output_target_path = os.path.join(_ROOT_MOUNT_DIRECTORY, 'output')
   binds.append(f'{output_dir}:{output_target_path}')
+  logging.info('Binding %s -> %s', output_dir, output_target_path)
+
+  tmp_target_path = '/tmp'
+  binds.append(f'{tmp_dir}:{tmp_target_path}')
+  logging.info('Binding %s -> %s', tmp_dir, tmp_target_path)

   use_gpu_relax = FLAGS.enable_gpu_relax and FLAGS.use_gpu

@@ -240,9 +254,11 @@

   options = [
       '--bind', f'{",".join(binds)}',
+      '--env', f'NVIDIA_VISIBLE_DEVICES={FLAGS.gpu_devices}',
+      # The following flags allow us to make predictions on proteins that
+      # would typically be too long to fit into GPU memory.
       '--env', 'TF_FORCE_UNIFIED_MEMORY=1',
       '--env', 'XLA_PYTHON_CLIENT_MEM_FRACTION=4.0',
-      '--env', 'OPENMM_CPU_THREADS=12'
   ]

   # Run the container.
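
Putting the launcher changes together (not part of this commit), a sketch of a direct invocation that uses only flags visible in this diff and in the example job script; the exported paths are placeholders for your own setup.

```bash
# Sketch: run the launcher outside a scheduler, against the reduced databases.
export ALPHAFOLD_DIR=$HOME/alphafold_singularity    # contains alphafold.sif
export ALPHAFOLD_DATADIR=/data/alphafold_databases  # downloaded databases

OUTDIR=${TMPDIR:-/tmp}/alphafold-output             # overrides the TMP/TMPDIR default
mkdir -p "$OUTDIR"

python3 ${ALPHAFOLD_DIR}/singularity/run_singularity.py \
    --data_dir=${ALPHAFOLD_DATADIR} \
    --fasta_paths=T1050.fasta \
    --max_template_date=2020-05-14 \
    --db_preset=reduced_dbs \
    --model_preset=monomer \
    --output_dir="$OUTDIR"
```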
