
Confusing Documentation + Missing JSON file #14

@navvye

Hey! Congrats on the work, and the results you achieved. Great idea + data!

I found the README a bit confusing, in particular the model configuration in the loading example:

import torch

from model.ProTrek.protrek_trimodal_model import ProTrekTrimodalModel
from utils.foldseek_util import get_struc_seq

# Load model
config = {
    "protein_config": "weights/ProTrek_650M/esm2_t33_650M_UR50D",
    "text_config": "weights/ProTrek_650M/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext",
    "structure_config": "weights/ProTrek_650M/foldseek_t30_150M",
    "load_protein_pretrained": False,
    "load_text_pretrained": False,
    "from_checkpoint": "weights/ProTrek_650M/ProTrek_650M.pt"
}

Here, both "load_protein_pretrained" and "load_text_pretrained" are False, yet the "from_checkpoint" key points to a .pt file. Because of the way the AbstractClass works in the code, the model ignores the "load_protein_pretrained"/"load_text_pretrained" keys and initializes its weights from the file given in "from_checkpoint".

This is a bit confusing, and I had to spend some time going through the code just to understand why this was happening.
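To make the precedence described above concrete, here is a minimal sketch. It is not ProTrek code: `describe_weight_sources` is a hypothetical helper, and the fallback branches (what happens when no checkpoint is given) are assumptions for illustration only. It simply encodes the behaviour reported in this issue, namely that a non-empty "from_checkpoint" determines the final weights regardless of the "load_*_pretrained" flags.

```python
def describe_weight_sources(config: dict) -> dict:
    """Hypothetical helper: report where each encoder's weights would come from.

    Encodes the behaviour described in this issue, not the actual ProTrek code:
    if "from_checkpoint" is set, it wins over the "load_*_pretrained" flags.
    """
    checkpoint = config.get("from_checkpoint")
    sources = {}
    for name in ("protein", "text"):
        if checkpoint:
            # The checkpoint supplies the weights; the flag has no visible effect.
            sources[name] = f"checkpoint:{checkpoint}"
        elif config.get(f"load_{name}_pretrained", False):
            # Assumption: without a checkpoint, the flag selects the pretrained encoder.
            sources[name] = f"pretrained:{config[name + '_config']}"
        else:
            # Assumption: neither source given, so the encoder is built from config only.
            sources[name] = "config-only"
    return sources


config = {
    "protein_config": "weights/ProTrek_650M/esm2_t33_650M_UR50D",
    "text_config": "weights/ProTrek_650M/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext",
    "load_protein_pretrained": False,
    "load_text_pretrained": False,
    "from_checkpoint": "weights/ProTrek_650M/ProTrek_650M.pt",
}
sources = describe_weight_sources(config)
```

A short note along these lines in the README would probably be enough to avoid the confusion.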

Additionally, the config.json file is missing from the westlake-repl/ProTrek_35M repository on Hugging Face (expected at https://huggingface.co/westlake-repl/ProTrek_35M/blob/main/config.json). This leads to an error when I try to load the model with the transformers library. I worked around it by concatenating the individual config files from each subfolder of the repository, so I think this should be trivial to fix.
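For reference, the workaround mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: `merge_subfolder_configs` is a hypothetical helper, the repository is assumed to be downloaded to a local directory, each subfolder is assumed to hold its own config.json, and keying the merged file by subfolder name is my own choice of layout, not a format confirmed by the ProTrek authors or required by transformers.

```python
import json
from pathlib import Path


def merge_subfolder_configs(repo_dir: str, out_name: str = "config.json") -> dict:
    """Hypothetical helper: combine each subfolder's config.json into one top-level file.

    Assumption: the merged file is keyed by subfolder name. Whether this layout
    is what the loading code expects is not established here.
    """
    root = Path(repo_dir)
    merged = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        cfg_path = sub / "config.json"
        if cfg_path.is_file():
            # Keep each sub-config intact under its folder name.
            merged[sub.name] = json.loads(cfg_path.read_text())
    # Write the combined config next to the subfolders.
    (root / out_name).write_text(json.dumps(merged, indent=2))
    return merged
```

Calling `merge_subfolder_configs("path/to/local/ProTrek_35M")` on a local copy would write a top-level config.json containing one entry per subfolder that has a config file; the path here is a placeholder.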

Thanks once again!
