# ProSST

Code for **ProSST: Protein Language Modeling with Quantized Structure and Disentangled Attention** (NeurIPS 2024).

- Our MSA-enhanced model, VenusREM, achieves a Spearman's ρ of 0.518 on the ProteinGym benchmark.
## Installation

```shell
git clone https://github.com/ai4protein/ProSST.git
cd ProSST
pip install -r requirements.txt
export PYTHONPATH=$PYTHONPATH:$(pwd)
```

## Structure quantization

```python
from prosst.structure.get_sst_seq import SSTPredictor

predictor = SSTPredictor(structure_vocab_size=2048)  # structure_vocab_size can be 20, 128, 512, 1024, 2048, or 4096
result = predictor.predict_from_pdb('example_data/p1.pdb')
```

Output:

```python
[407, 998, 1841, 1421, 653, 450, 117, 822, ...]
```
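To quantize many structures at once, the predictor can simply be looped over a directory of PDB files. A minimal sketch, assuming `predict_from_pdb` returns a flat list of token ids as shown above; the directory layout and the comma-separated output format here are illustrative, not something the repository prescribes:

```python
from pathlib import Path

from prosst.structure.get_sst_seq import SSTPredictor

predictor = SSTPredictor(structure_vocab_size=2048)

pdb_dir = Path("example_data")      # hypothetical input directory
out_dir = Path("structure_tokens")  # hypothetical output directory
out_dir.mkdir(exist_ok=True)

for pdb_path in sorted(pdb_dir.glob("*.pdb")):
    tokens = predictor.predict_from_pdb(str(pdb_path))
    # One file per structure; tokens stored as a single comma-separated line.
    (out_dir / f"{pdb_path.stem}.txt").write_text(",".join(map(str, tokens)))
```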
## Loading the pre-trained models

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("AI4Protein/ProSST-2048", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("AI4Protein/ProSST-2048", trust_remote_code=True)
```

See AI4Protein/ProSST-* on Hugging Face for more models.
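The model consumes the amino-acid sequence and the quantized structure sequence as two parallel tracks. Below is a minimal sketch of a forward pass; it assumes the remote code accepts the structure track through an `ss_input_ids` keyword and that structure tokens are offset to leave room for special tokens, so check `zero_shot/proteingym_benchmark.py` in the repository for the authoritative usage:

```python
import torch

sequence = "MKTAYIAKQR"  # toy sequence
structure_tokens = [407, 998, 1841, 1421, 653, 450, 117, 822, 33, 5]  # illustrative SSTPredictor output

inputs = tokenizer(sequence, return_tensors="pt")

# ASSUMPTION: the structure track mirrors the amino-acid track, with raw token
# ids offset by +3 and wrapped in special-token positions. Verify this against
# zero_shot/proteingym_benchmark.py before relying on it.
ss_input_ids = torch.tensor([[1] + [t + 3 for t in structure_tokens] + [2]])

with torch.no_grad():
    logits = model(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        ss_input_ids=ss_input_ids,
    ).logits  # per-residue amino-acid logits
```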
## Zero-shot mutant effect prediction

Download the dataset from Google Drive (this archive contains the quantized structures within ProteinGym).
The original PDB dataset is the same as ProtSSN's and can be downloaded from Hugging Face.

```shell
cd example_data
unzip proteingym_benchmark.zip
```

Then run the benchmark script:

```shell
python zero_shot/proteingym_benchmark.py --model_path AI4Protein/ProSST-2048 \
--structure_dir example_data/structure_sequence/2048
```
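The benchmark script handles batching and scoring across the ProteinGym assays. For orientation, the usual wild-type-marginal recipe for scoring a single substitution looks roughly like the sketch below; the helper name and indexing convention are illustrative, and `ss_input_ids` is built as in the loading example above:

```python
import torch

def score_mutant(model, tokenizer, sequence, ss_input_ids, mutant):
    """Log-odds score for a substitution such as 'A24G' (1-based position)."""
    wt, pos, mt = mutant[0], int(mutant[1:-1]), mutant[-1]
    assert sequence[pos - 1] == wt, "mutant string disagrees with the sequence"

    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        logits = model(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            ss_input_ids=ss_input_ids,
        ).logits
    log_probs = torch.log_softmax(logits[0], dim=-1)

    # With a leading special token, the 1-based residue position maps to
    # index `pos` in the token sequence (assuming ESM-style tokenization).
    wt_id = tokenizer.convert_tokens_to_ids(wt)
    mt_id = tokenizer.convert_tokens_to_ids(mt)
    return (log_probs[pos, mt_id] - log_probs[pos, wt_id]).item()
```

Higher scores mean the model prefers the mutant residue at that site; for multi-mutants, single-site log-odds are commonly summed.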
## Citation

If you use ProSST in your research, please cite the following paper:

```bibtex
@inproceedings{li2024prosst,
  title={{ProSST}: Protein Language Modeling with Quantized Structure and Disentangled Attention},
  author={Mingchen Li and Yang Tan and Xinzhu Ma and Bozitao Zhong and Huiqun Yu and Ziyi Zhou and Wanli Ouyang and Bingxin Zhou and Pan Tan and Liang Hong},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024}
}
```
## License

This project is licensed under the terms of the CC-BY-NC-ND-4.0 license.
