This project provides a pipeline for protein function prediction using deep learning and protein language model embeddings.
- Prepare Data:
  - Use `prepare_data.py` to process your raw FASTA and TSV files into training-ready data.
- Extract Embeddings:
  - Generate embeddings for your protein sequences (see `plm.py` or `cluster_embed/`).
- Train Model:
  - Run the main training pipeline: `python train_script.py`
  - The script uses configs in `configs/` and saves results in `outputs/` or `runs/`.
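
To make the data preparation step more concrete, here is a minimal sketch of pairing FASTA sequences with TSV annotations. It is not the code in `prepare_data.py`: the function names (`read_fasta`, `read_labels`, `build_examples`) are hypothetical, and the TSV layout (protein ID in the first column, one label such as a GO term in the second, no header row) is an assumption made for illustration.

```python
import csv
import io
from collections import defaultdict


def read_fasta(handle):
    """Return {sequence_id: sequence} from a FASTA text stream."""
    sequences = {}
    current_id = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the record collected so far before starting a new one.
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited header token.
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return sequences


def read_labels(handle):
    """Return {sequence_id: sorted labels} from a two-column TSV stream."""
    labels = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip malformed rows and comment lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        labels[row[0]].add(row[1])
    return {seq_id: sorted(terms) for seq_id, terms in labels.items()}


def build_examples(sequences, labels):
    """Pair each labeled sequence with its labels; unlabeled sequences are skipped."""
    return [
        {"id": seq_id, "sequence": seq, "labels": labels[seq_id]}
        for seq_id, seq in sequences.items()
        if seq_id in labels
    ]


# Small in-memory demo; real inputs would be opened with open(path).
fasta_text = ">P1 example protein\nMKT\nAYI\n>P2\nGAVL\n"
tsv_text = "P1\tGO:0003677\nP1\tGO:0005524\nP3\tGO:0016301\n"
examples = build_examples(
    read_fasta(io.StringIO(fasta_text)),
    read_labels(io.StringIO(tsv_text)),
)
print(examples)
```

In this demo only `P1` survives the join: `P2` has no annotation and `P3` has no sequence. Whether the actual pipeline drops or keeps such records is not stated here, so treat that behavior as a design choice of the sketch.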

Key files and directories:
- `train_script.py`: Main entry for training
- `prepare_data.py`: Data preparation
- `configs/`: Experiment/model configs
- `Network/`, `models/`: Model code
- `utils/`: Utilities