
Commit 1ffd197

Merge pull request #533 from jnwei/pl_upgrades
Update pl_upgrades to use numpy 2 and update other dependencies
2 parents 23cf2f6 + 620a54f commit 1ffd197

23 files changed

Lines changed: 488 additions & 142 deletions

.github/workflows/docker-image.yml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -11,7 +11,7 @@ jobs:
1111
runs-on: ubuntu-latest
1212
steps:
1313
- uses: actions/checkout@v4
14-
- name: Cleanup
14+
- name: Cleanup # https://github.com/actions/virtual-environments/issues/2840
1515
run: sudo rm -rf /usr/share/dotnet && sudo rm -rf /opt/ghc && sudo rm -rf "/usr/local/share/boost" && sudo rm -rf "$AGENT_TOOLSDIRECTORY"
1616
- name: Build the Docker image
1717
run: docker build . --file Dockerfile --tag openfold:$(date +%s)

docs/source/Aux_seq_files.md

Lines changed: 3 additions & 3 deletions
@@ -68,9 +68,9 @@ All together, the file directory would look like:
6868
└── 6kwc.cif
6969
└── alignment_db
7070
├── alignment_db_0.db
71-
├── alignment_db_1.db
72-
...
73-
├── alignment_db_9.db
71+
├── alignment_db_1.db
72+
...
73+
├── alignment_db_9.db
7474
└── alignment_db.index
7575
```
7676

docs/source/Inference.md

Lines changed: 9 additions & 8 deletions
@@ -42,7 +42,7 @@ $ bash scripts/download_openfold_params.sh $PARAMS_DIR
4242

4343
We recommend selecting `openfold/resources` as the params directory as this is the default directory used by the `run_pretrained_openfold.py` to locate parameters.
4444

45-
If you choose to use a different directory, you may make a symlink to the `openfold/resources` directory, or specify an alternate parameter path with the command line argument `--jax_path` for AlphaFold parameters or `--openfold_checkpoint_path` for OpenFold parameters.
45+
If you choose to use a different directory, you may make a symlink to the `openfold/resources` directory, or specify an alternate parameter path with the command line argument `--jax_param_path` for AlphaFold parameters or `--openfold_checkpoint_path` for OpenFold parameters.
4646

4747

4848
### Model Inference
@@ -62,7 +62,7 @@ python3 run_pretrained_openfold.py \
6262
$TEMPLATE_MMCIF_DIR
6363
--output_dir $OUTPUT_DIR \
6464
--config_preset model_1_ptm \
65-
--uniref90_database_path $BASE_DATA_DIR/uniref90 \
65+
--uniref90_database_path $BASE_DATA_DIR/uniref90/uniref90.fasta \
6666
--mgnify_database_path $BASE_DATA_DIR/mgnify/mgy_clusters_2018_12.fa \
6767
--pdb70_database_path $BASE_DATA_DIR/pdb70 \
6868
--uniclust30_database_path $BASE_DATA_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
@@ -138,6 +138,7 @@ Some commonly used command line flags are here. A full list of flags can be view
138138
- `--data_random_seed`: Specifies a random seed to use.
139139
- `--save_outputs`: Saves a copy of all outputs from the model, e.g. the output of the msa track, ptm heads.
140140
- `--experiment_config_json`: Specify configuration settings using a json file. For example, passing a json with `{globals.relax.max_iterations = 10}` specifies 10 as the maximum number of relaxation iterations. See for [`openfold/config.py`](https://github.com/aqlaboratory/openfold/blob/main/openfold/config.py#L283) the full dictionary of configuration settings. Any parameters that are not manually set in these configuration settings will refer to the defaults specified by your `config_preset`.
141+
- `--use_custom_template`: Uses all .cif files in `template_mmcif_dir` as template input. Make sure the chains of interest have the identifier _A_ and have the same length as the input sequence. The same templates will be read for all sequences that are passed for inference.
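
Reviewer note on `--experiment_config_json`: the flag description addresses a nested setting with a dotted key (`globals.relax.max_iterations`). The sketch below shows one way such an override file could be written and how a dotted key maps onto a nested dictionary. Whether the script expects flattened dotted keys or nested JSON objects is an assumption to verify against `run_pretrained_openfold.py`; `set_by_dotted_key`, `toy_config`, and the file name are hypothetical and not part of OpenFold.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def set_by_dotted_key(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set config['a']['b']['c'] = value for the dotted key 'a.b.c'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate dictionaries when they do not exist yet.
        node = node.setdefault(name, {})
    node[leaf] = value


# Override taken from the flag description; the flattened layout is an assumption.
overrides = {"globals.relax.max_iterations": 10}

# Write the overrides to a JSON file that could be passed to the flag.
config_path = Path(tempfile.mkdtemp()) / "experiment_config.json"
config_path.write_text(json.dumps(overrides, indent=2))

# Apply the overrides to a toy nested config to show which setting each key targets.
toy_config: Dict[str, Any] = {"globals": {"relax": {"max_iterations": 0}}}
for key, value in json.loads(config_path.read_text()).items():
    set_by_dotted_key(toy_config, key, value)
```

Settings that are not listed in the file stay untouched in `toy_config`, which mirrors the documented behaviour that unspecified parameters fall back to the defaults of the chosen `config_preset`.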
141142

142143

143144
### Advanced Options for Increasing Efficiency
@@ -159,12 +160,12 @@ Note that chunking (as defined in section 1.11.8 of the AlphaFold 2 supplement)
159160
#### Long sequence inference
160161
To minimize memory usage during inference on long sequences, consider the following changes:
161162

162-
- As noted in the AlphaFold-Multimer paper, the AlphaFold/OpenFold template stack is a major memory bottleneck for inference on long sequences. OpenFold supports two mutually exclusive inference modes to address this issue. One, `average_templates` in the `template` section of the config, is similar to the solution offered by AlphaFold-Multimer, which is simply to average individual template representations. Our version is modified slightly to accommodate weights trained using the standard template algorithm. Using said weights, we notice no significant difference in performance between our averaged template embeddings and the standard ones. The second, `offload_templates`, temporarily offloads individual template embeddings into CPU memory. The former is an approximation while the latter is slightly slower; both are memory-efficient and allow the model to utilize arbitrarily many templates across sequence lengths. Both are disabled by default, and it is up to the user to determine which best suits their needs, if either.
163-
- Inference-time low-memory attention (LMA) can be enabled in the model config. This setting trades off speed for vastly improved memory usage. By default, LMA is run with query and key chunk sizes of 1024 and 4096, respectively. These represent a favorable tradeoff in most memory-constrained cases. Powerusers can choose to tweak these settings in `openfold/model/primitives.py`. For more information on the LMA algorithm, see the aforementioned Staats & Rabe preprint.
164-
- Disable `tune_chunk_size` for long sequences. Past a certain point, it only wastes time.
165-
- As a last resort, consider enabling `offload_inference`. This enables more extensive CPU offloading at various bottlenecks throughout the model.
163+
- As noted in the AlphaFold-Multimer paper, the AlphaFold/OpenFold template stack is a major memory bottleneck for inference on long sequences. OpenFold supports two mutually exclusive inference modes to address this issue. One, `average_templates` in the `template` section of the config, is similar to the solution offered by AlphaFold-Multimer, which is simply to average individual template representations. Our version is modified slightly to accommodate weights trained using the standard template algorithm. Using said weights, we notice no significant difference in performance between our averaged template embeddings and the standard ones. The second, `offload_templates`, temporarily offloads individual template embeddings into CPU memory. The former is an approximation while the latter is slightly slower; both are memory-efficient and allow the model to utilize arbitrarily many templates across sequence lengths. Both are disabled by default, and it is up to the user to determine which best suits their needs, if either.
164+
- Inference-time low-memory attention (LMA) can be enabled in the model config. This setting trades off speed for vastly improved memory usage. By default, LMA is run with query and key chunk sizes of 1024 and 4096, respectively. These represent a favorable tradeoff in most memory-constrained cases. Powerusers can choose to tweak these settings in `openfold/model/primitives.py`. For more information on the LMA algorithm, see the aforementioned Staats & Rabe preprint.
165+
- Disable `tune_chunk_size` for long sequences. Past a certain point, it only wastes time.
166+
- As a last resort, consider enabling `offload_inference`. This enables more extensive CPU offloading at various bottlenecks throughout the model.
166167
- Disable FlashAttention, which seems unstable on long sequences.
167168

168-
Using the most conservative settings, we were able to run inference on a 4600-residue complex with a single A100. Compared to AlphaFold's own memory offloading mode, ours is considerably faster; the same complex takes the more efficent AlphaFold-Multimer more than double the time. Use the `long_sequence_inference` config option to enable all of these interventions at once. The `run_pretrained_openfold.py` script can enable this config option with the `--long_sequence_inference` command line option
169+
Using the most conservative settings, we were able to run inference on a 4600-residue complex with a single A100. Compared to AlphaFold's own memory offloading mode, ours is considerably faster; the same complex takes the more efficent AlphaFold-Multimer more than double the time. Use the `long_sequence_inference` config option to enable all of these interventions at once. The `run_pretrained_openfold.py` script can enable this config option with the `--long_sequence_inference` command line option
169170

170-
Input FASTA files containing multiple sequences are treated as complexes. In this case, the inference script runs AlphaFold-Gap, a hack proposed [here](https://twitter.com/minkbaek/status/1417538291709071362?lang=en), using the specified stock AlphaFold/OpenFold parameters (NOT AlphaFold-Multimer).
171+
Input FASTA files containing multiple sequences are treated as complexes. In this case, the inference script runs AlphaFold-Gap, a hack proposed [here](https://twitter.com/minkbaek/status/1417538291709071362?lang=en), using the specified stock AlphaFold/OpenFold parameters (NOT AlphaFold-Multimer).
Lines changed: 16 additions & 10 deletions
@@ -4,7 +4,7 @@ In this guide, we will OpenFold and its dependencies.
44

55
**Pre-requisites**
66

7-
This package is currently supported for CUDA 11 and Pytorch 1.12. All dependencies are listed in the [`environment.yml`](https://github.com/aqlaboratory/openfold/blob/main/environment.yml)
7+
This package is currently supported for CUDA 12 and Pytorch 2. All dependencies are listed in the [`environment.yml`](https://github.com/aqlaboratory/openfold/blob/main/environment.yml).
88

99
At this time, only Linux systems are supported.
1010

@@ -19,10 +19,17 @@ At this time, only Linux systems are supported.
1919
Mamba is recommended as the dependencies required by OpenFold are quite large and mamba can speed up the process.
2020
- Activate the environment, e.g `conda activate openfold_env`
2121
1. Run the setup script to configure kernels and folding resources.
22-
> scripts/install_third_party_dependencies.sh`
23-
3. Prepend the conda environment to the $LD_LIBRARY_PATH., e.g.
24-
`export $LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH``. You may optionally set this as a conda environment variable according to the [conda docs](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#saving-environment-variables) to activate each time the environment is used.
25-
4. Download parameters. We recommend using a destination as `openfold/resources` as our unittests will look for the weights there.
22+
> scripts/install_third_party_dependencies.sh
23+
1. Prepend the conda environment to the `$LD_LIBRARY_PATH` and `$LIBRARY_PATH`., e.g.
24+
25+
```
26+
export LIBRARY_PATH=$CONDA_PREFIX/lib:$LIBRARY_PATH
27+
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
28+
```
29+
30+
You may optionally set this as a conda environment variable according to the [conda docs](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#saving-environment-variables) to activate each time the environment is used.
31+
32+
1. Download parameters. We recommend using a destination as `openfold/resources` as our unittests will look for the weights there.
2633
- For AlphaFold2 weights, use
2734
> ./scripts/download_alphafold_params.sh <dest>
2835
- For OpenFold weights, use :
@@ -46,10 +53,9 @@ Certain tests perform equivalence comparisons with the AlphaFold implementation.
4653

4754
## Environment specific modifications
4855

49-
### CUDA 12
50-
To use OpenFold on CUDA 12 environment rather than a CUDA 11 environment.
51-
In step 1, use the branch [`pl_upgrades`](https://github.com/aqlaboratory/openfold/tree/pl_upgrades) rather than the main branch, i.e. replace the URL in step 1 with https://github.com/aqlaboratory/openfold/tree/pl_upgrades
52-
Follow the rest of the steps of [Installation Guide](#Installation)
56+
### MPI
57+
To use OpenFold with MPI support, you will need to add the package [`mpi4py`](https://pypi.org/project/mpi4py/). This can be done with pip in your OpenFold environment, e.g. `$ pip install mpi4py`.
58+
5359

5460
### Install OpenFold parameters without aws
5561
If you don't have access to `aws` on your system, you can use a different download source:
@@ -59,4 +65,4 @@ If you don't have access to `aws` on your system, you can use a different downlo
5965

6066
### Docker setup
6167

62-
A [`Dockerfile`] is provided to build an OpenFold Docker image. Additional notes for setting up a docker container for OpenFold and running inference can be found [here](original_readme.md#building-and-using-the-docker-container).
68+
A [`Dockerfile`](https://github.com/aqlaboratory/openfold/blob/main/Dockerfile) is provided to build an OpenFold Docker image. Additional notes for setting up a docker container for OpenFold and running inference can be found [here](original_readme.md#building-and-using-the-docker-container).

docs/source/Multimer_Inference.md

Lines changed: 4 additions & 5 deletions
@@ -72,8 +72,7 @@ python3 run_pretrained_openfold.py \
7272
--output_dir ./
7373
```
7474

75-
Note that template searching in the multimer pipeline
76-
uses HMMSearch with the PDB SeqRes database, replacing HHSearch and PDB70 used in the monomer pipeline.
77-
78-
As with monomer inference, if you've already computed alignments for the query, you can use
79-
the `--use_precomputed_alignments` option.
75+
**Notes:**
76+
- Template searching in the multimer pipeline uses HMMSearch with the PDB SeqRes database, replacing HHSearch and PDB70 used in the monomer pipeline.
77+
- As with monomer inference, if you've already computed alignments for the query, you can use the `--use_precomputed_alignments` option.
78+
- At this time, only AlphaFold parameter weights are available for multimer mode.

docs/source/index.md

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@
55
:align: center
66
:alt: Comparison of OpenFold and AlphaFold2 predictions to the experimental structure of PDB 7KDX, chain B._
77
```
8-
Welcome to the Documentation for OpenFold, the fully open source, trainable, PyTorch-based reproduction of DeepMind's
8+
Welcome to the Documentation for [OpenFold](https://github.com/aqlaboratory/openfold), the fully open source, trainable, PyTorch-based reproduction of DeepMind's
99
[AlphaFold 2](https://github.com/deepmind/alphafold).
1010

1111
Here, you will find guides and documentation for:
@@ -115,4 +115,4 @@ Aux_seq_files.md
115115
OpenFold_Parameters.md
116116
FAQ.md
117117
original_readme.md
118-
```
118+
```

environment.yml

Lines changed: 8 additions & 9 deletions
@@ -8,34 +8,33 @@ dependencies:
88
- cuda
99
- gcc=12.4
1010
- python=3.10
11-
- libgcc=7.2
1211
- setuptools=59.5.0
1312
- pip
14-
- openmm=7.7
13+
- openmm
1514
- pdbfixer
1615
- pytorch-lightning
1716
- biopython
18-
- numpy<2.0.0
17+
- numpy
1918
- pandas
20-
- PyYAML==5.4.1
19+
- PyYAML
2120
- requests
2221
- scipy
23-
- tqdm==4.62.2
22+
- tqdm
2423
- typing-extensions
2524
- wandb
2625
- modelcif==0.7
2726
- awscli
2827
- ml-collections
29-
- mkl=2022.1
3028
- aria2
29+
- mkl
3130
- git
3231
- bioconda::hmmer
3332
- bioconda::hhsuite
3433
- bioconda::kalign2
35-
- pytorch::pytorch=2.1
36-
- pytorch::pytorch-cuda=12.1
34+
- pytorch::pytorch=2.5
35+
- pytorch::pytorch-cuda=12.4
3736
- pip:
38-
- deepspeed==0.12.4
37+
- deepspeed==0.14.5
3938
- dm-tree==0.1.6
4039
- git+https://github.com/NVIDIA/dllogger.git
4140
- flash-attn
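
Reviewer note: because this change drops several version pins (`numpy`, `openmm`, `PyYAML`, `tqdm`, `mkl`), the versions that the solver actually resolves may differ between machines. A minimal sketch for checking the resolved versions after rebuilding the environment, using only the standard library. The distribution names (for example `torch` for the `pytorch` package) and the expected major versions are assumptions drawn from this diff, and `report` is a hypothetical helper.

```python
import re
from importlib import metadata
from typing import Dict, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_version(version: str) -> int:
    """Extract the leading integer of a version string such as '2.5.1+cu124'."""
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group())


def report(expected_major: Dict[str, int]) -> Dict[str, str]:
    """Compare installed major versions against the expected ones."""
    status = {}
    for name, expected in expected_major.items():
        version = installed_version(name)
        if version is None:
            status[name] = "missing"
        elif major_version(version) == expected:
            status[name] = f"ok ({version})"
        else:
            status[name] = f"unexpected ({version})"
    return status


if __name__ == "__main__":
    # Assumed expectations: numpy 2.x and PyTorch 2.x, per the commit title and the new pins.
    for name, state in report({"numpy": 2, "torch": 2}).items():
        print(f"{name}: {state}")
```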

notebooks/OpenFold.ipynb

Lines changed: 3 additions & 3 deletions
@@ -111,7 +111,7 @@
111111
"os.system(\"wget -qnc https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh\")\n",
112112
"os.system(\"bash Mambaforge-Linux-x86_64.sh -bfp /usr/local\")\n",
113113
"os.system(\"mamba config --set auto_update_conda false\")\n",
114-
"os.system(f\"mamba install -y -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 openmm=7.7.0 python={python_version} pdbfixer biopython=1.79\")\n",
114+
"os.system(f\"mamba install -y -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 openmm=7.7.0 python={python_version} pdbfixer biopython=1.83\")\n",
115115
"os.system(\"pip install -q torch ml_collections py3Dmol modelcif\")\n",
116116
"\n",
117117
"try:\n",
@@ -127,7 +127,7 @@
127127
"\n",
128128
" %shell mkdir -p /content/openfold/openfold/resources\n",
129129
"\n",
130-
" commit = \"a96ffd67f8c96f8c4decc3abdd2cffbb57fc5764\"\n",
130+
" commit = \"3bec3e9b2d1e8bdb83887899102eff7d42dc2ba9\"\n",
131131
" os.system(f\"pip install -q git+https://github.com/aqlaboratory/openfold.git@{commit}\")\n",
132132
"\n",
133133
" os.system(f\"cp -f -p /content/stereo_chemical_props.txt /usr/local/lib/python{python_version}/site-packages/openfold/resources/\")\n",
@@ -907,4 +907,4 @@
907907
},
908908
"nbformat": 4,
909909
"nbformat_minor": 0
910-
}
910+
}

openfold/config.py

Lines changed: 1 addition & 1 deletion
@@ -660,7 +660,7 @@ def model_config(
660660
},
661661
"relax": {
662662
"max_iterations": 0, # no max
663-
"tolerance": 2.39,
663+
"tolerance": 10.0,
664664
"stiffness": 10.0,
665665
"max_outer_iterations": 20,
666666
"exclude_residues": [],

openfold/data/data_pipeline.py

Lines changed: 16 additions & 3 deletions
@@ -23,8 +23,19 @@
2323
from typing import Mapping, Optional, Sequence, Any, MutableMapping, Union
2424
import numpy as np
2525
import torch
26-
from openfold.data import templates, parsers, mmcif_parsing, msa_identifiers, msa_pairing, feature_processing_multimer
27-
from openfold.data.templates import get_custom_template_features, empty_template_feats
26+
from openfold.data import (
27+
templates,
28+
parsers,
29+
mmcif_parsing,
30+
msa_identifiers,
31+
msa_pairing,
32+
feature_processing_multimer,
33+
)
34+
from openfold.data.templates import (
35+
get_custom_template_features,
36+
empty_template_feats,
37+
CustomHitFeaturizer,
38+
)
2839
from openfold.data.tools import jackhmmer, hhblits, hhsearch, hmmsearch
2940
from openfold.np import residue_constants, protein
3041

@@ -38,7 +49,9 @@ def make_template_features(
3849
template_featurizer: Any,
3950
) -> FeatureDict:
4051
hits_cat = sum(hits.values(), [])
41-
if(len(hits_cat) == 0 or template_featurizer is None):
52+
if template_featurizer is None or (
53+
len(hits_cat) == 0 and not isinstance(template_featurizer, CustomHitFeaturizer)
54+
):
4255
template_features = empty_template_feats(len(input_sequence))
4356
else:
4457
templates_result = template_featurizer.get_templates(
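
Reviewer note: the revised guard changes behaviour in exactly one case. With no search hits and a `CustomHitFeaturizer`, the old condition returned empty template features, while the new one proceeds to `get_templates`. A minimal sketch of both guards for comparison; the two classes are hypothetical stand-ins rather than the real OpenFold classes, and `uses_empty_template_features` and `old_guard` are illustrative names.

```python
from typing import Any, Optional


class CustomHitFeaturizer:
    """Hypothetical stand-in for openfold.data.templates.CustomHitFeaturizer."""


class SearchHitFeaturizer:
    """Hypothetical stand-in for a featurizer that depends on search hits."""


def old_guard(num_hits: int, template_featurizer: Optional[Any]) -> bool:
    """Condition before this commit: any empty hit list yields empty features."""
    return num_hits == 0 or template_featurizer is None


def uses_empty_template_features(num_hits: int, template_featurizer: Optional[Any]) -> bool:
    """Condition after this commit: custom featurizers run even without hits."""
    return template_featurizer is None or (
        num_hits == 0 and not isinstance(template_featurizer, CustomHitFeaturizer)
    )


if __name__ == "__main__":
    # Print both guards for each combination of hit count and featurizer type.
    for hits in (0, 3):
        for featurizer in (None, SearchHitFeaturizer(), CustomHitFeaturizer()):
            name = type(featurizer).__name__
            print(hits, name, old_guard(hits, featurizer),
                  uses_empty_template_features(hits, featurizer))
```

This matches the documented purpose of `--use_custom_template`, which reads templates from `template_mmcif_dir` instead of relying on alignment hits.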
