This is the source code of "ProCeSa: Contrast-Enhanced Structure-Aware Network for Thermostability Prediction with Protein Language Models" (https://doi.org/10.1021/acs.jcim.4c01752).
The environment is specified in /procesa/environment.yml.
Download the pretrained esm1b, esm1v, and esm2 models from the original ESM GitHub repository and put them in /dwnl_ckpts.
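Before running anything else, it can help to confirm that all three checkpoints are in place. The following is a minimal sketch, not part of the repository: the helper name find_missing_checkpoints is hypothetical, and it assumes the checkpoints are .pt files whose names begin with the model family name (esm1b, esm1v, esm2).

```python
from pathlib import Path

# Model families named in this README. Matching them by filename prefix
# is an assumption about how the downloaded checkpoints are named.
EXPECTED_MODELS = ("esm1b", "esm1v", "esm2")


def find_missing_checkpoints(ckpt_dir, expected=EXPECTED_MODELS):
    """Return the model families that have no matching *.pt file in ckpt_dir."""
    ckpt_dir = Path(ckpt_dir)
    if not ckpt_dir.is_dir():
        # Nothing has been downloaded yet, so every family is missing.
        return list(expected)
    names = [p.name.lower() for p in ckpt_dir.glob("*.pt")]
    return [m for m in expected if not any(n.startswith(m) for n in names)]


if __name__ == "__main__":
    missing = find_missing_checkpoints("/dwnl_ckpts")
    print("missing checkpoints:", ", ".join(missing) if missing else "none")
```

If the checkpoints are stored under different names or in subfolders, adjust the glob pattern and the prefix test accordingly.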
Use the scripts in /procesa/FLIP/baselines/scripts/ to generate the DGL graph pkl files. The generated data will be saved in /datasets.
(Due to the file size limit, the hotprotein-S dataset is hosted at https://drive.google.com/file/d/1VvAXKw01hMrKBMsOMDB5OKcQlvSzN0TN/view?usp=drive_link.)
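After graph generation (or after unpacking the downloaded hotprotein-S data), a quick listing of the pkl files can confirm that the step produced output. This is a minimal sketch using only the standard library. The helper name summarize_pkl_files is hypothetical, and the sketch assumes only that the generated files carry a .pkl extension somewhere under /datasets. It does not open the files, since reading the graphs would require the project's own environment.

```python
from pathlib import Path


def summarize_pkl_files(data_dir):
    """Return sorted (relative path, size in bytes) pairs for every *.pkl under data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        # No data directory yet, so there is nothing to report.
        return []
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*.pkl")
    )


if __name__ == "__main__":
    for rel_path, size in summarize_pkl_files("/datasets"):
        print(f"{rel_path}\t{size} bytes")
```

An empty listing, or files of zero bytes, suggests the generation scripts did not finish.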
Run the scripts in /procesa/scripts to train and evaluate the models. Results will be saved in /procesa/results/. The correspondence between the results reported in the paper and the scripts that produce them is shown in the figure below.
For hotprotein-s2c2 and hotprotein-s2c5, you can change DATANAME and EXPNAME to run other experiments, for example s2c2_1 and model31.
For hotprotein-S, you can change EXPNAME to run other experiments, for example model116.
Results are saved in the /procesa/results folder.
If you make use of this code or the ProCeSa algorithm in your work, please cite the following paper:
@article{zhou2025procesa,
title={ProCeSa: Contrast-Enhanced Structure-Aware Network for Thermostability Prediction with Protein Language Models},
author={Zhou, Feixiang and Zhang, Shuo and Zhang, Huifeng and Liu, Jian K.},
journal={Journal of Chemical Information and Modeling},
year={2025},
publisher={American Chemical Society (ACS)}
}