+- [Building and Using the Docker Container](#building-and-using-the-docker-container)
+- [Copyright Notice](#copyright-notice)
+- [Contributing](#contributing)
+- [Citing this Work](#citing-this-work)
+
 ## Features

 OpenFold carefully reproduces (almost) all of the features of the original open
-source inference code (v2.0.1). The sole exception is model ensembling, which
-fared poorly in DeepMind's own ablation testing and is being phased out in future
-DeepMind experiments. It is omitted here for the sake of reducing clutter. In
-cases where the *Nature* paper differs from the source, we always defer to the
+source monomer (v2.0.1) and multimer (v2.3.2) inference code. The sole exception is
+model ensembling, which fared poorly in DeepMind's own ablation testing and is being
+phased out in future DeepMind experiments. It is omitted here for the sake of reducing
+clutter. In cases where the *Nature* paper differs from the source, we always defer to the
 latter.

 OpenFold is trainable in full precision, half precision, or `bfloat16` with or without DeepSpeed,
@@ -63,7 +81,7 @@ To install:
 For some systems, it may help to append the Conda environment library path to `$LD_LIBRARY_PATH`. The `install_third_party_dependencies.sh` script does this once, but you may need this for each bash instance.


-## Usage
+## Download Alignment Databases

 If you intend to generate your own alignments, e.g. for inference, you have two
 choices for downloading protein databases, depending on whether you want to use
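
Note on the `$LD_LIBRARY_PATH` context line in the hunk above: a minimal sketch of how the per-shell step might look, assuming the Conda environment is already activated so that `CONDA_PREFIX` is set. The function name `append_conda_lib` is hypothetical and is not part of the repository.

```bash
# Hypothetical helper (not part of OpenFold): append the active Conda
# environment's lib directory to LD_LIBRARY_PATH, once per shell.
append_conda_lib() {
    # CONDA_PREFIX is exported by `conda activate`; bail out if it is missing.
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "CONDA_PREFIX is not set; activate the Conda environment first" >&2
        return 1
    fi
    local conda_lib="${CONDA_PREFIX}/lib"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${conda_lib}:"*)
            ;;  # already on the path, so leave it unchanged
        *)
            # Append, inserting a ':' separator only when the variable is non-empty.
            export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${conda_lib}"
            ;;
    esac
}
```

Calling `append_conda_lib` from each new bash instance (for example from a shell startup file) would cover the "for each bash instance" case; the duplicate check keeps repeated calls from growing the variable.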
@@ -112,7 +130,16 @@ DeepMind's pretrained parameters, you will only be able to make changes that
 do not affect the shapes of model parameters. For an example of initializing
 the model, consult `run_pretrained_openfold.py`.

-### Inference
+## Inference
+
+OpenFold now supports three inference modes:
+- [Monomer Inference](#monomer-inference): OpenFold reproduction of AlphaFold2. Inference available with either DeepMind's pretrained parameters or OpenFold trained parameters.
+- [Multimer Inference](#multimer-inference): OpenFold reproduction of AlphaFold-Multimer. Inference available with DeepMind's pre-trained parameters.
+- [Single Sequence Inference (SoloSeq)](#soloseq-inference): Language Model based structure prediction, using [ESM-1b](https://github.com/facebookresearch/esm) embeddings.
+
+More instructions for each inference mode are provided below:
+
+### Monomer inference

 To run inference on a sequence or multiple sequences using a set of DeepMind's
 pretrained parameters, first download the OpenFold weights e.g.:
+2. Download the [UniProt](https://www.uniprot.org/uniprotkb/)
+and [PDB SeqRes](https://www.rcsb.org/) databases:
+
+```bash
+bash scripts/download_uniprot.sh data/
+```
+
+The PDB SeqRes and PDB databases must be from the same date to avoid potential
+errors during template searching. Remove the existing `data/pdb_mmcif` directory
+and download both databases:
+
+```bash
+bash scripts/download_pdb_mmcif.sh data/
+bash scripts/download_pdb_seqres.sh data/
+```
+
+3. Additionally, AlphaFold-Multimer uses upgraded versions of the [MGnify](https://www.ebi.ac.uk/metagenomics)
+and [UniRef30](https://uniclust.mmseqs.com/) (previously UniClust30) databases. To download the upgraded databases, run:
+
+```bash
+bash scripts/download_uniref30.sh data/
+bash scripts/download_mgnify.sh data/
+```
+Multimer inference can also run with the older database versions if desired.
+
+
+### Soloseq Inference
+
 To run inference for a sequence using the SoloSeq single-sequence model, you can either precompute ESM-1b embeddings in bulk, or you can generate them during inference.

 For generating ESM-1b embeddings in bulk, use the provided script: `scripts/precompute_embeddings.py`. The script takes a directory of FASTA files (one sequence per file) and generates ESM-1b embeddings in the same format and directory structure as required by SoloSeq. Following is an example command to use the script:
@@ -274,7 +365,7 @@ SoloSeq allows you to use the same flags and optimizations as the MSA-based Open

 **NOTE:** Due to the nature of the ESM-1b embeddings, the sequence length for inference using the SoloSeq model is limited to 1022 residues. Sequences longer than that will be truncated.

-###Training
+## Training

 To train the model, you will first need to precompute protein alignments.

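
Note on the 1022-residue SoloSeq limit in the hunk above: since longer sequences are truncated, it may help to flag them before running inference. The following is a rough sketch only; `report_long_sequences` is a hypothetical helper, not a script shipped with OpenFold, and it assumes plain FASTA input with `>` header lines.

```bash
# Hypothetical helper (not part of OpenFold): list FASTA records whose
# sequence is longer than a residue limit.
# Usage: report_long_sequences FILE [LIMIT]   (LIMIT defaults to 1022)
report_long_sequences() {
    local fasta="$1" limit="${2:-1022}"
    awk -v limit="$limit" '
        # Print "<id><TAB><length>" for the record just finished, if too long.
        function report() {
            if (id != "" && len > limit) printf "%s\t%d\n", id, len
        }
        # Header line: close the previous record and start a new one.
        /^>/ { report(); id = substr($1, 2); len = 0; next }
        # Sequence line: strip whitespace and accumulate the residue count.
        { gsub(/[[:space:]]/, ""); len += length($0) }
        END { report() }
    ' "$fasta"
}
```

Running `report_long_sequences my_protein.fasta` (the file name is a placeholder) prints one line per record that exceeds the limit and prints nothing when every sequence fits.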
@@ -412,17 +503,17 @@ environment. These run components of AlphaFold and OpenFold side by side and
 ensure that output activations are adequately similar. For most modules, we
 target a maximum pointwise difference of `1e-4`.

-## Building and using the docker container
+## Building and Using the Docker Container

-### Building the docker image
+**Building the Docker Image**

 Openfold can be built as a docker container using the included dockerfile. To build it, run the following command from the root of this repository:

 ```bash
 docker build -t openfold .
 ```

-### Running the docker container
+**Running the Docker Container**

 The built container contains both `run_pretrained_openfold.py` and `train_openfold.py` as well as all necessary software dependencies. It does not contain the model parameters, sequence, or structural databases. These should be downloaded to the host machine following the instructions in the Usage section above.
 While AlphaFold's and, by extension, OpenFold's source code is licensed under
 the permissive Apache Licence, Version 2.0, DeepMind's pretrained parameters
@@ -475,7 +566,7 @@ replaces the original, more restrictive CC BY-NC 4.0 license as of January 2022.
 If you encounter problems using OpenFold, feel free to create an issue! We also
 welcome pull requests from the community.

-## Citing this work
+## Citing this Work

 Please cite our paper:

@@ -504,4 +595,4 @@ If you use OpenProteinSet, please also cite:
 primaryClass={q-bio.BM}
 }
 ```
-Any work that cites OpenFold should also cite AlphaFold.
+Any work that cites OpenFold should also cite [AlphaFold](https://www.nature.com/articles/s41586-021-03819-2) and [AlphaFold-Multimer](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1) if applicable.