Description: This dataset contains MSAs and predicted structures for 13 million long (sequence length >= 200 amino acids) monomers from the MGNIFY database. These MSAs were generated using the AF3 protocol, and were used to predict structures with AlphaFold2. This data serves as the long monomer distillation set for Openfold3, an open-source, all-atom ligand, RNA and protein structure prediction software.
This dataset contains MSAs and predicted structures used to train OpenFold3 preview, an open-source, all-atom ligand, RNA and protein structure prediction software. This includes -
- PDB - 245k structures and alignments from the RCSB Protein Data Bank - https://www.rcsb.org/
5
+
- Long monomer distillation set - ~13 million long (sequence length >= 200 amino acids) monomers from the MGNIFY database - https://www.ebi.ac.uk/metagenomics/.
- Short monomer distillation set - 400k short (sequence length < 200 amino acids) monomers from the MGNIFY database - https://www.ebi.ac.uk/metagenomics/
- Disordered set - AF2-predicted structures for unresolved segments missing from the PDB
- RNA - OF3p2-predicted RNA monomer structures generated from a clustered version of RFAM (current version)
For the distillation sets, MSAs were generated using the AF3 protocol and were used to predict structures with AlphaFold2. More details can be found in our whitepaper - https://portal.openfold.omsf.io/reports/of3p2_technical_report.pdf
For a full description and an interactive data explorer, please visit https://portal.openfold.omsf.io/datasets