Commit 558d5ec

Resolve comments in PR
1 parent 4f7ddef commit 558d5ec

7 files changed

Lines changed: 52 additions & 3 deletions

.pre-commit-config.yaml

Lines changed: 6 additions & 0 deletions
@@ -17,6 +17,12 @@ repos:
     hooks:
       - id: validate-pyproject
 
+  - repo: https://github.com/pycqa/isort
+    rev: 5.13.2
+    hooks:
+      - id: isort
+        args: [--profile=black]
+
   - repo: https://github.com/astral-sh/ruff-pre-commit
     rev: v0.11.7
     hooks:

README.md

Lines changed: 20 additions & 0 deletions
@@ -8,8 +8,28 @@ Before you start, you need to create a `git-auth.txt` file in two folders respec
 https://username:token@github.com
 ```
 
+## Models
+
+The models are included in the [models](models/) folder, where each model occupies a subfolder as its repo.
+
+A model repo contains its README.md as a model card, which comes in two parts:
+- Metadata, which is a YAML section at the top, i.e., front matter.
+- Text descriptions, which is a Markdown file, including summary and descriptions of the model.
+
+For more information, you can reference Hugging Face's [model cards](https://huggingface.co/docs/hub/en/model-cards).
+
+## Datasets
+
+The datasets are included in the [dataset](datasets/) folder.
+
 ## Benchmark
 
+The benchmark is defined in the [benchmark](benchmark/) folder, where each dataset occupies a subfolder.
+
+In order to build the archived file for each dataset, [pg2-dataset](https://github.com/ProteinGym2/pg2-dataset) is used.
+
+You can reference [this guide](https://github.com/ProteinGym2/pg2-dataset?tab=readme-ov-file#archive-data) to build the archived dataset.
+
 ### Local environment
 
 There are two games to benchmark: supervised and zero-shot. Each game has its selected list of models and datasets defined in `dvc.yaml`.
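
Note on the model card layout described in the README hunk above: the two parts (YAML front matter, then the Markdown text) can be separated with plain string handling. The following is only a minimal sketch under stated assumptions. `split_front_matter` is a hypothetical helper that is not part of `pg2_benchmark`, and the sample card is invented for illustration; the repository's own `ModelCard` class may well load cards differently.

```python
from typing import Tuple


def split_front_matter(text: str) -> Tuple[str, str]:
    """Split a model card into (front matter YAML text, Markdown body).

    Hypothetical helper for illustration; not part of pg2_benchmark.
    """
    lines = text.splitlines()
    # Front matter must open with a '---' line at the very top of the file.
    if not lines or lines[0].strip() != "---":
        return "", text
    # Look for the closing '---' delimiter.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            metadata = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return metadata, body
    # No closing delimiter found: treat the whole file as body.
    return "", text


# Invented sample card, loosely modelled on models/pls/README.md.
card = """---
name: "pls"
hyper_params:
    n_components: 2
---

# PLS

Summary of the model.
"""

metadata, body = split_front_matter(card)
print(metadata)
print(body)
```

The metadata string returned here would still need a YAML parser before validation; that step is left out so the sketch stays standard-library only.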

models/esm/README.md

Lines changed: 5 additions & 0 deletions
@@ -1,10 +1,15 @@
 ---
+# Model identifier used for referencing this model in the benchmark system
 name: "esm"
 
 hyper_params:
+  # HuggingFace model checkpoint identifier for the specific ESM-2 variant
   location: "esm2_t30_150M_UR50D"
+  # Scoring method: calculates marginal probabilities for wild-type amino acids
   scoring_strategy: "wt-marginals"
+  # Whether to disable GPU usage (false = use GPU if available)
   nogpu: false
+  # Offset index for sequence position alignment in tokenization
   offset_idx: 24
 ---
 

models/pls/README.md

Lines changed: 4 additions & 0 deletions
@@ -1,9 +1,13 @@
 ---
+# Model identifier used for referencing this model in the benchmark system
 name: "pls"
 
 hyper_params:
+  # Number of PLS components to extract (dimensionality of the reduced space)
   n_components: 2
+  # Standard 20 amino acid single-letter codes
   aa_alphabet: ["A", "C", "D", "E", "F", "G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y"]
+  # Total number of amino acids in the alphabet (must match aa_alphabet length)
   aa_alphabet_length: 20
 ---
 

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ logging_format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
 
 [dependency-groups]
 dev = [
+    "isort>=6.0.1",
     "pre-commit>=4.2.0",
     "pytest>=8.4.1",
     "pytest-cov>=6.2.1",

tests/test_model_card.py

Lines changed: 5 additions & 3 deletions
@@ -1,8 +1,10 @@
-import pytest
 from pathlib import Path
-from pg2_benchmark.model_card import ModelCard
+
+import pytest
 from pydantic import ValidationError
 
+from pg2_benchmark.model_card import ModelCard
+
 
 @pytest.fixture
 def model_card_contents() -> str:
@@ -52,4 +54,4 @@ def test_manifest_hyper_params(model_card_path: Path) -> None:
     except ValidationError as e:
         raise ValidationError("ValidationError raised") from e
     else:
-        assert model_card.hyper_params["nogpu"] is False
+        assert not model_card.hyper_params["nogpu"]
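
Note on the assertion change in the last hunk: `is False` and `not ...` are not equivalent checks. The sketch below compares the two on a few values, assuming only standard Python semantics; `strict_check` and `loose_check` are hypothetical names used for illustration and do not exist in the test suite.

```python
def strict_check(value: object) -> bool:
    """Mirror `assert value is False`: passes only for the bool False."""
    return value is False


def loose_check(value: object) -> bool:
    """Mirror `assert not value`: passes for any falsy value."""
    return not value


# Values a YAML front matter field could plausibly deserialize to.
candidates = [False, None, 0, "", []]
results = {repr(v): (strict_check(v), loose_check(v)) for v in candidates}

for name, (strict, loose) in results.items():
    print(f"{name:>6}: is False -> {strict}, not -> {loose}")
```

If the intent is to verify that `nogpu` was parsed as a real boolean, the stricter form may be the safer one, since the looser form would also accept a missing or empty value.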

uv.lock

Lines changed: 11 additions & 0 deletions
Some generated files are not rendered by default.
