Hey! Congrats on the work, and the results you achieved. Great idea + data!
I think the README is a bit confusing, particularly the configuration in the model-loading example:
import torch
from model.ProTrek.protrek_trimodal_model import ProTrekTrimodalModel
from utils.foldseek_util import get_struc_seq
# Load model
config = {
"protein_config": "weights/ProTrek_650M/esm2_t33_650M_UR50D",
"text_config": "weights/ProTrek_650M/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext",
"structure_config": "weights/ProTrek_650M/foldseek_t30_150M",
"load_protein_pretrained": False,
"load_text_pretrained": False,
"from_checkpoint": "weights/ProTrek_650M/ProTrek_650M.pt"
}

Here, both "load_protein_pretrained" and "load_text_pretrained" are False, but the "from_checkpoint" key points to a .pt file. Because of the way the AbstractClass works in the code, the model ignores the "load_protein_pretrained"/"load_text_pretrained" keys and initializes its weights from the "from_checkpoint" file.
This is a bit confusing, and I had to spend some time going through the code just to understand why this was happening.
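To illustrate what I mean, here is a toy sketch of the behavior as I understand it. This is not the actual ProTrek code: `build_weights`, the string placeholders, and the lookup tables are all made up for illustration, and I am assuming the checkpoint contains weights for every encoder.

```python
# Toy illustration only: NOT the ProTrek implementation.
# All names here (build_weights, pretrained, checkpoints) are hypothetical.

def build_weights(config, pretrained, checkpoints):
    """Return the final weights per encoder for a given config."""
    weights = {}
    for name in ("protein", "text"):
        if config.get(f"load_{name}_pretrained", False):
            # Start from the upstream pretrained weights.
            weights[name] = pretrained[name]
        else:
            # Start from a fresh initialization.
            weights[name] = "random-init"

    ckpt_path = config.get("from_checkpoint")
    if ckpt_path:
        # Assumption: the checkpoint holds weights for every encoder,
        # so it overwrites whatever was set above.
        weights.update(checkpoints[ckpt_path])
    return weights


pretrained = {"protein": "esm2-pretrained", "text": "pubmedbert-pretrained"}
checkpoints = {"ProTrek_650M.pt": {"protein": "protrek-protein", "text": "protrek-text"}}

flags_on = build_weights(
    {"load_protein_pretrained": True, "load_text_pretrained": True,
     "from_checkpoint": "ProTrek_650M.pt"},
    pretrained, checkpoints,
)
flags_off = build_weights(
    {"load_protein_pretrained": False, "load_text_pretrained": False,
     "from_checkpoint": "ProTrek_650M.pt"},
    pretrained, checkpoints,
)
```

In this sketch the two configs end up with identical weights, which is why the flags in the README example look like they matter when they don't. A short comment in the README saying so would probably be enough.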
Additionally, the config.json file is missing from the westlake-repl/ProTrek_35M repository on Hugging Face (https://huggingface.co/westlake-repl/ProTrek_35M/blob/main/config.json). This causes an error when I try to load the model with the transformers library. I worked around it by concatenating the individual config files from each of the repository's subfolders, so I think this should be trivial to fix.
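For reference, the workaround looks roughly like the sketch below. The function names are mine, and keying the merged config by subfolder name is an assumption; the layout the maintainers would actually want may differ.

```python
import json
from pathlib import Path


def merge_subfolder_configs(repo_dir):
    """Collect every <subfolder>/config.json under repo_dir into one dict.

    Keying the result by subfolder name is an assumption made for this sketch.
    """
    merged = {}
    for sub in sorted(Path(repo_dir).iterdir()):
        cfg_path = sub / "config.json"
        if sub.is_dir() and cfg_path.is_file():
            merged[sub.name] = json.loads(cfg_path.read_text())
    return merged


def write_top_level_config(repo_dir):
    """Write the merged dict to <repo_dir>/config.json and return it."""
    merged = merge_subfolder_configs(repo_dir)
    (Path(repo_dir) / "config.json").write_text(json.dumps(merged, indent=2))
    return merged


# Usage (the path is a placeholder for a local clone of the repository):
# write_top_level_config("path/to/ProTrek_35M")
```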
Thanks once again!