This repository contains the code for the paper "Protein codes promote selective subcellular compartmentalization".
- Install mamba (recommended) or conda
- Download the installer from https://github.com/conda-forge/miniforge
- Install and initialize
bash Miniforge-pypy3-Linux-x86_64.sh
- Create environment
mamba env create -f environment.yml
- Activate
mamba activate protgps
- Download the model checkpoints from Zenodo and extract them to checkpoints/protgps.
import torch
torch.hub.set_dir("checkpoints/esm2")  # cache torch.hub downloads under checkpoints/esm2
model, alphabet = torch.hub.load("facebookresearch/esm:main", "esm2_t6_8M_UR50D")

from transformers import AutoModelForTokenClassification, AutoTokenizer
checkpoint = "checkpoints/drbert"  # local directory holding the tokenizer and model files
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint)
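Once the setup steps above are done, a quick check that the expected checkpoint directories are in place can save a failed run later. The sketch below is a minimal illustration and not part of the repository: the helper name `missing_checkpoint_dirs` is hypothetical, and it assumes only the three paths mentioned above (checkpoints/protgps, checkpoints/esm2, checkpoints/drbert), treating a directory as present when it exists and is non-empty.

```python
from pathlib import Path

# Directory names taken from the setup steps above.
EXPECTED_DIRS = ("protgps", "esm2", "drbert")


def missing_checkpoint_dirs(root="checkpoints"):
    """Return expected checkpoint subdirectories that are absent or empty.

    Hypothetical helper for illustration; it does not validate file contents.
    """
    root = Path(root)
    missing = []
    for name in EXPECTED_DIRS:
        path = root / name
        # A directory counts as present only if it holds at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    problems = missing_checkpoint_dirs()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All checkpoint directories found.")
```

Run it from the repository root so that the relative checkpoints path resolves.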
Training
python scripts/dispatcher.py --config configs/protein_localization/full_prot_comp_pred.json --log_dir /path/to/logdir
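If training is launched from a Python script rather than the shell, it can help to confirm that the config file parses as JSON before dispatching. The following is a sketch under assumptions: `build_train_command` and `run_training` are hypothetical helpers, and only the `--config` and `--log_dir` flags shown above are used. Nothing is assumed about what the dispatcher does with them.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_train_command(config_path, log_dir):
    """Build the dispatcher command after checking the config is valid JSON.

    Hypothetical helper; it uses only the two flags shown in the command above.
    """
    config_path = Path(config_path)
    # Fail early with a clear error if the config is missing or malformed.
    with config_path.open() as handle:
        json.load(handle)
    return [
        sys.executable,  # same interpreter as the active environment
        "scripts/dispatcher.py",
        "--config", str(config_path),
        "--log_dir", str(log_dir),
    ]


def run_training(config_path, log_dir):
    """Launch training; call this from the repository root."""
    cmd = build_train_command(config_path, log_dir)
    # check=True raises CalledProcessError if the dispatcher exits non-zero.
    return subprocess.run(cmd, check=True)
```

For example, `run_training("configs/protein_localization/full_prot_comp_pred.json", "/path/to/logdir")` mirrors the shell command, with the log directory left as a placeholder.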
Inference
To make predictions, edit and run the Predict.ipynb notebook.
Generation
To generate proteins:
cd esm/examples/lm-design
./generate_nucleolus.sh
./generate_nuclear_speckle.sh
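The two generation runs can also be driven from Python, which makes it easier to stop at the first failure. This is only a sketch: `run_generation` is a hypothetical wrapper, the directory and script names are the ones listed above, and it assumes the scripts are executable and take no arguments.

```python
import subprocess
from pathlib import Path

# Directory and script names taken from the commands above.
LM_DESIGN_DIR = Path("esm/examples/lm-design")
GENERATION_SCRIPTS = ("generate_nucleolus.sh", "generate_nuclear_speckle.sh")


def run_generation(base_dir=LM_DESIGN_DIR, runner=subprocess.run):
    """Run each generation script in turn from inside base_dir.

    Hypothetical wrapper; `runner` is injectable so the calls can be dry-run.
    Returns the list of script names that were launched.
    """
    base_dir = Path(base_dir)
    launched = []
    for script in GENERATION_SCRIPTS:
        if not (base_dir / script).is_file():
            raise FileNotFoundError(f"{script} not found in {base_dir}")
        # cwd mirrors the `cd` step; check=True stops at the first failure.
        runner(["./" + script], cwd=base_dir, check=True)
        launched.append(script)
    return launched
```

Passing a stub as `runner` records the commands without executing them, which is a cheap way to confirm the paths before starting a long run.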
The analysis script is located under notebook. The data it uses and generates are available in the Zenodo repository.
@article{
doi:10.1126/science.adq2634,
author = {Henry R. Kilgore and Itamar Chinn and Peter G. Mikhael and Ilan Mitnikov and Catherine Van Dongen and Guy Zylberberg and Lena Afeyan and Salman F. Banani and Susana Wilson-Hawken and Tong Ihn Lee and Regina Barzilay and Richard A. Young },
title = {Protein codes promote selective subcellular compartmentalization},
journal = {Science},
volume = {0},
number = {0},
pages = {eadq2634},
year = {},
doi = {10.1126/science.adq2634},
URL = {https://www.science.org/doi/abs/10.1126/science.adq2634},
eprint = {https://www.science.org/doi/pdf/10.1126/science.adq2634},
}