
Commit 312a64b

committed
init
0 parents  commit 312a64b


55 files changed

+32779
-0
lines changed

.gitignore

+11
@@ -0,0 +1,11 @@
.conda
.vscode
.ipynb*
__pycache__

results/

core.*

.DS_Store
checkpoint.pt

README.md

+152
@@ -0,0 +1,152 @@
1+
# CryoFold: One-shot Prediction For Cryo-EM Structure Determination
2+
3+
CryoFold is a deep learning framework for automating the determination of three-dimensional atomic structures from high-resolution cryo-electron microscopy (Cryo-EM) density maps. It addresses the limitations of existing AI-based methods by providing an end-to-end solution that integrates training and inference into a single streamlined pipeline. CryoFold combines 3D and sequence Transformers for feature extraction and employs an equivariant graph neural network to build accurate atomic structures from density maps.
4+
5+
## Table of Contents
6+
- [Background](#background)
7+
- [Features](#features)
8+
- [Installation](#installation)
9+
- [Quick Start](#quick-start)
10+
- [Usage](#usage)
11+
- [Command-Line Arguments](#command-line-arguments)
12+
- [Running the Example](#running-the-example)
13+
- [Using Custom Data](#using-custom-data)
14+
- [Tutorial](#tutorial)
15+
- [References](#references)
16+
- [Contact](#contact)
17+
- [License](#license)
18+
19+
## Background
20+
21+
Cryo-electron microscopy (Cryo-EM) has revolutionized structural biology by enabling the visualization of complex biological molecules at near-atomic resolution. The technique generates **high-resolution density maps** that offer insights into the molecular structures of proteins, viruses, and other biomolecular assemblies. However, **interpreting these density maps to derive accurate atomic models** remains a challenging and labor-intensive task, often requiring expert knowledge and manual interventions.
22+
23+
Existing AI-based methods for automating Cryo-EM structure determination face several limitations:
24+
1. **Multi-stage processing**: Current approaches often involve separate stages for feature extraction, sequence alignment, and structure prediction, leading to inefficiencies and discontinuities.
25+
2. **Alignment bias**: Techniques such as **Hidden Markov Models (HMMs)** or **Traveling Salesman Problem (TSP) solvers** introduce bias when aligning predicted atomic coordinates with the protein sequence.
26+
3. **Poor generalization**: Due to the limited size of available datasets, many methods struggle to generalize well to complex or previously unseen test cases.
27+
28+
CryoFold addresses these challenges by providing a **fully integrated, end-to-end solution** that performs **one-shot inference** with minimal manual intervention, enabling faster and more accurate structure determination.
29+
30+
## Features
31+
32+
- **🚀 End-to-End Training and Inference**: Simplifies the process by seamlessly integrating training and inference into a single, unified framework, eliminating the need for multi-stage processing.
33+
- **⚡ Fast and Accurate**: Achieves a **400% improvement in TM-score** over Cryo2Struct while reducing inference time by a factor of **1,000**.
34+
35+
For more details on the performance and benchmarking, please refer to our paper.
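For readers unfamiliar with the TM-score used in the comparison above, here is a minimal sketch of the standard formula. It assumes the predicted and reference structures are already superimposed and residue-aligned; the full metric takes the maximum over superpositions, which this sketch does not attempt. The function names and the short-chain fallback are illustrative assumptions, not part of CryoFold.

```python
def d0_scale(target_length):
    """Length-dependent distance scale d0 (in angstroms) used by the TM-score."""
    if target_length <= 15:
        # Assumption: the cube-root formula is undefined for very short chains,
        # so this sketch falls back to a fixed floor.
        return 0.5
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances in angstroms between aligned residue pairs
    target_length: number of residues in the reference structure
    """
    d0 = d0_scale(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalizing by the reference length penalizes residues that were not modeled.
    return total / target_length
```

A perfect match on every residue gives 1.0, and a model that covers only half of the reference residues cannot score above 0.5.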
36+
37+
## Installation
38+
39+
To get started with CryoFold, follow these steps:
40+
41+
1. **Clone the repository**:
42+
43+
```bash
44+
git clone https://github.com/A4Bio/CryoFold.git
45+
cd CryoFold
46+
```
47+
48+
2. **Create and activate the conda environment**:
49+
50+
```bash
51+
conda env create -f environment.yml
52+
conda activate cryofold
53+
```
54+
55+
3. **Download the Pretrained Model**:
56+
57+
We provide a pretrained model for CryoFold. [Download it here]() and place it in the pretrained_models directory.
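After completing the steps above, it can help to confirm that the environment contains the packages listed in `environment.yml`. The following is a minimal sketch and not a script shipped with the repository; `REQUIRED_MODULES` and `find_missing` are hypothetical names. Note that some import names differ from the install names (`fair-esm` imports as `esm`, `biopython` as `Bio`).

```python
import importlib.util

# Import names for packages listed in environment.yml (hypothetical check list).
REQUIRED_MODULES = [
    "numpy", "scipy", "torch", "transformers", "einops",
    "torch_scatter", "torch_geometric", "torch_cluster",
    "esm", "mrcfile", "Bio",
]


def find_missing(modules):
    """Return the top-level modules that cannot be located in this environment."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```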
58+
59+
60+
## Quick Start
61+
62+
To quickly try out CryoFold using an example dataset, run the following command:
63+
64+
```bash
65+
bash run_example.sh
66+
```
67+
68+
This script runs `inference.py` on the sample data in the `examples` folder, using a sample density map and a ground-truth PDB file for evaluation.
69+
70+
We also provide an example tutorial in `quick_start.ipynb`.
71+
72+
## Usage
73+
74+
### Command-Line Arguments
75+
76+
The `inference.py` script supports several command-line arguments:
77+
78+
| Argument | Description | Default |
79+
|--------------------------|---------------------------------------------------------|-------------------------------------|
80+
| `--density_map_path` | Path to the input density map directory (required). | None |
81+
| `--pdb_path` | Path to the ground truth PDB file (optional). | None |
82+
| `--model_path` | Path to the pretrained model checkpoint. | `pretrained_model/checkpoint.pt` |
83+
| `--output_dir` | Directory to save the output PDB file. | `results` |
84+
| `--device` | Device to run the model on (`cpu` or `cuda`). | `cuda` |
85+
| `--verbose` | Enable verbose output for debugging. | Disabled |
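The table above could be expressed with `argparse` roughly as follows. This is a sketch that mirrors the documented arguments and defaults only; it is not the parser from `inference.py`, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="CryoFold inference (sketch)")
    parser.add_argument("--density_map_path", required=True,
                        help="Path to the input density map directory.")
    parser.add_argument("--pdb_path", default=None,
                        help="Optional ground truth PDB file for evaluation.")
    parser.add_argument("--model_path", default="pretrained_model/checkpoint.pt",
                        help="Path to the pretrained model checkpoint.")
    parser.add_argument("--output_dir", default="results",
                        help="Directory to save the output PDB file.")
    parser.add_argument("--device", default="cuda", choices=["cpu", "cuda"],
                        help="Device to run the model on.")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose output for debugging.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--density_map_path", "examples/density_map"])
    print(args)
```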
86+
87+
### Running the Example
88+
89+
You can run the example directly from the command line:
90+
91+
```bash
92+
python inference.py --density_map_path examples/density_map --pdb_path examples/5uz7.pdb
93+
```
94+
95+
### Using Custom Data
96+
97+
To use CryoFold with your own data, you need to provide a Cryo-EM density map and, optionally, a PDB file for evaluating the predicted structure. For example:
98+
99+
```bash
100+
python inference.py --density_map_path /path/to/your/density_map --pdb_path /path/to/your/ground_truth.pdb --output_dir /path/to/save/results --device cuda
101+
```
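To process several density maps in one go, the command above can be wrapped in a small driver. The sketch below uses only the documented flags; `build_command` and `run_batch` are hypothetical helpers. It assumes each map needs its own output directory so that results do not overwrite each other, and it creates those directories up front because it is not stated whether `inference.py` does so.

```python
import subprocess
import sys
from pathlib import Path


def build_command(map_dir, output_dir, pdb_path=None, device="cuda"):
    """Assemble one inference.py invocation from the documented flags."""
    cmd = [
        sys.executable, "inference.py",
        "--density_map_path", str(map_dir),
        "--output_dir", str(output_dir),
        "--device", device,
    ]
    if pdb_path is not None:
        # The ground truth PDB is optional and only used for evaluation.
        cmd += ["--pdb_path", str(pdb_path)]
    return cmd


def run_batch(map_dirs, results_root, device="cuda"):
    """Run inference once per density map directory, one output folder each."""
    for map_dir in map_dirs:
        out_dir = Path(results_root) / Path(map_dir).name
        out_dir.mkdir(parents=True, exist_ok=True)
        # check=True stops the batch at the first failing run.
        subprocess.run(build_command(map_dir, out_dir, device=device), check=True)
```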
102+
103+
## Tutorial
104+
105+
### 1. Preprocessing Density Maps
106+
107+
To normalize your density maps, run:
108+
109+
# Normalize your density maps
110+
$ bash run_data_preparation.bash examples/
111+
112+
After preprocessing, the directory structure should look like:
113+
114+
115+
```text
116+
CryoFold
117+
├── examples
118+
│ ├── density_map
119+
│ │ ├── map.map
120+
│ │ ├── seq_chain_info.json
121+
│ │ └── normed_map.mrc
122+
```
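Before running inference, the layout shown above can be checked programmatically. This is a minimal sketch that assumes the three file names in the tree are the ones required; `EXPECTED_FILES` and `missing_inputs` are hypothetical names, not part of the repository.

```python
from pathlib import Path

# File names taken from the directory tree above.
EXPECTED_FILES = ("map.map", "seq_chain_info.json", "normed_map.mrc")


def missing_inputs(map_dir):
    """Return the expected files that are absent from a density map directory."""
    map_dir = Path(map_dir)
    return [name for name in EXPECTED_FILES if not (map_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs("examples/density_map")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Density map directory looks complete.")
```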
123+
124+
### 2. Running Inference
125+
126+
python inference.py --density_map_path examples/density_map --pdb_path examples/5uz7.pdb
127+
128+
After inference, the output will be saved in the specified output directory:
129+
130+
```text
131+
CryoFold
132+
├── results
133+
│ └── output.pdb
134+
```
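A quick sanity check of the resulting file is to count the residues per chain. The sketch below reads the fixed-column `ATOM` records of the PDB format using only the standard library; `residues_per_chain` is a hypothetical helper, and it makes no assumption about which atoms CryoFold writes for each residue.

```python
def residues_per_chain(pdb_lines):
    """Count distinct residues per chain from PDB ATOM records."""
    seen = {}
    for line in pdb_lines:
        # Skip non-ATOM records and lines too short to hold a residue number.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain_id = line[21]        # column 22: chain identifier
        residue_key = line[22:27]  # columns 23-27: residue number + insertion code
        seen.setdefault(chain_id, set()).add(residue_key)
    return {chain: len(residues) for chain, residues in seen.items()}


if __name__ == "__main__":
    with open("results/output.pdb") as handle:
        for chain, count in residues_per_chain(handle).items():
            print(f"chain {chain}: {count} residues")
```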
135+
136+
## References
137+
138+
For a complete description of the method, see:
139+
140+
141+
142+
## Contact
143+
144+
Please submit bug reports, feature requests, and general usage feedback as a GitHub issue or discussion.
145+
146+
- Jue Wang ([email protected])
147+
- Cheng Tan ([email protected])
148+
- Zhangyang Gao ([email protected])
149+
150+
## License
151+
152+
This project is licensed under the MIT License. See the LICENSE file for details.

environment.yml

+29
@@ -0,0 +1,29 @@
name: cryofold
channels:
  - pytorch
  - nvidia
  - defaults
dependencies:
  - python=3.9
  - numpy
  - pandas
  - scipy
  - matplotlib
  - pytorch=2.0.1
  - torchvision
  - torchaudio
  - cudatoolkit=11.7
  - scikit-learn
  - pip
  - pip:
      - transformers==4.24.0
      - tqdm
      - jupyterlab
      - einops
      - torch_scatter
      - torch_geometric
      - torch_cluster
      - fair-esm
      - mrcfile
      - biopython
prefix: /root/anaconda3/envs/cryofold
