small fixes README by karel-w · Pull Request #113 · ProteinGym/proteingym-benchmark

karel-w · 2025-09-29T08:32:13Z

Changes

Resolves no specific issue

While going through the README:

corrected dead links
added a tip for AWS SSO-based login

Checklist

I broke the PR down so that it contains a reasonable amount of changes for an effective review
I performed a self-review of my code. Amongst other things, I have commented my code in hard-to-understand areas.
I made corresponding changes to the documentation
I added tests that prove my fix is effective or that my feature works
I accounted for dependent changes to be merged and published in downstream modules

tintinrevient · 2025-09-29T08:43:17Z

✅ Supervised models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.005050505050505051
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.005076142131979714
Bennett S,-0.005076142131979696
Kappa Standard Error,0.0
Kappa Unbiased,-0.005076142131979696
Scott PI,-0.005076142131979696
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,6.62935662007962
Reference Entropy,6.62935662007962
Cross Entropy,0
Joint Entropy,6.62935662007962
Conditional Entropy,-0.0
Mutual Information,6.62935662007962
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,38809
Overall J,"(0.0, 0.0)"
Hamming Loss,1.0
Zero-one Loss,99
NIR,0.010101010101010102
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.9949238578680203
TNR Macro,0.9949494949494949
Bangdiwala B,None
Krippendorff Alpha,0.0
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.005050505050505083
FNR Macro,None
PPV Macro,None
NPV Macro,0.9949494949494949
ACC Macro,0.98989898989899
F1 Macro,0.0
FPR Micro,0.005076142131979711
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9949238578680203
Spearman,0.607730364873222

✅ Zero-shot models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.00010007998789141575
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.00010009004298543876
Bennett S,-0.00010009008107296567
Kappa Standard Error,0.0
Kappa Unbiased,-0.00010009000489789399
Scott PI,-0.00010009000489789399
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,12.286557761608659
Reference Entropy,12.286549508613042
Cross Entropy,0
Joint Entropy,12.286549508613042
Conditional Entropy,-0.0
Mutual Information,12.286557761608659
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,99820081
Overall J,"(0.0, 0.0)"
Hamming Loss,0.9999999999999999
Zero-one Loss,4996
NIR,0.00020016012810248197
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0000006717097922
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.999899909957026
TNR Macro,0.9998999199359487
Bangdiwala B,None
Krippendorff Alpha,7.616744806910965e-11
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00010008006405126668
FNR Macro,None
PPV Macro,None
NPV Macro,0.9998999200121163
ACC Macro,0.999799839948065
F1 Macro,0.0
FPR Micro,0.00010009004297395485
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.999899909957026
Spearman,

karel-w · 2025-09-29T09:15:43Z

While going over it, I noticed that we haven't explained anywhere how to build the model files. I added this to the models/README.md file.

tintinrevient · 2025-09-29T09:25:52Z

✅ Supervised models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.005050505050505051
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.005076142131979714
Bennett S,-0.005076142131979696
Kappa Standard Error,0.0
Kappa Unbiased,-0.005076142131979696
Scott PI,-0.005076142131979696
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,6.62935662007962
Reference Entropy,6.62935662007962
Cross Entropy,0
Joint Entropy,6.62935662007962
Conditional Entropy,-0.0
Mutual Information,6.62935662007962
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,38809
Overall J,"(0.0, 0.0)"
Hamming Loss,1.0
Zero-one Loss,99
NIR,0.010101010101010102
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.9949238578680203
TNR Macro,0.9949494949494949
Bangdiwala B,None
Krippendorff Alpha,0.0
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.005050505050505083
FNR Macro,None
PPV Macro,None
NPV Macro,0.9949494949494949
ACC Macro,0.98989898989899
F1 Macro,0.0
FPR Micro,0.005076142131979711
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9949238578680203
Spearman,0.6693753865182436

✅ Zero-shot models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.00010007998789141575
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.00010009004298543876
Bennett S,-0.00010009008107296567
Kappa Standard Error,0.0
Kappa Unbiased,-0.00010009000489789399
Scott PI,-0.00010009000489789399
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,12.286557761608659
Reference Entropy,12.286549508613042
Cross Entropy,0
Joint Entropy,12.286549508613042
Conditional Entropy,-0.0
Mutual Information,12.286557761608659
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,99820081
Overall J,"(0.0, 0.0)"
Hamming Loss,0.9999999999999999
Zero-one Loss,4996
NIR,0.00020016012810248197
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0000006717097922
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.999899909957026
TNR Macro,0.9998999199359487
Bangdiwala B,None
Krippendorff Alpha,7.616744806910965e-11
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00010008006405126668
FNR Macro,None
PPV Macro,None
NPV Macro,0.9998999200121163
ACC Macro,0.999799839948065
F1 Macro,0.0
FPR Micro,0.00010009004297395485
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.999899909957026
Spearman,

tintinrevient · 2025-09-29T12:37:14Z

+# We specifically use /opt/program as AWS expect the files to be present in this location.
+WORKDIR /opt/program
+
+# 5. Copy benchmark framework


This step will change once the benchmark is public, so the doc will need to be updated accordingly.

tintinrevient · 2025-09-29T12:38:02Z

+COPY ./models/YOUR_MODEL/README.md ./README.md
+COPY ./models/YOUR_MODEL/pyproject.toml ./pyproject.toml
+
+# 7. Handle private repository access (if needed)


This step will be removed as well once proteingym-base is public.

tintinrevient · 2025-09-29T12:41:11Z

+
+```bash
+# From the project root directory
+docker build -f models/YOUR_MODEL/Dockerfile -t your-model .


With the current settings, proteingym-base (a.k.a. pg2-dataset) is private, so the secret is needed to build the image:

docker build --secret id=git_auth,src=git-auth.txt ...

For reference, see https://github.com/ProteinGym/pg2-model-esm

The latest working reference is this line in the local DVC pipeline:

proteingym-benchmark/benchmark/supervised/local/dvc.yaml

Line 34 in d438e82

- docker build --build-arg GIT_CACHE_BUST=${git.git_cache_bust} --secret id=git_auth,src=../git-auth.txt -f ${item.model.dockerfile} -t ${item.model.name}:latest ../../..
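
Combining the README command with the secret flag could look roughly like the sketch below. It is only an illustration under assumptions: `MODEL_DIR`, `IMAGE_TAG` and `SECRET_FILE` are placeholder names that do not come from the repository, `git-auth.txt` is assumed to sit in the project root, and the `GIT_CACHE_BUST` build argument from dvc.yaml is left out. The command is assembled as a string first so it can be inspected before anything is built.

```shell
# Placeholder values (assumptions, not taken from the repository).
MODEL_DIR="models/YOUR_MODEL"
IMAGE_TAG="your-model"
SECRET_FILE="git-auth.txt"   # assumed to be in the project root

# Assemble the build command; the secret id "git_auth" matches the one
# used in the dvc.yaml reference line.
BUILD_CMD="docker build --secret id=git_auth,src=${SECRET_FILE} -f ${MODEL_DIR}/Dockerfile -t ${IMAGE_TAG} ."

# Print the command for inspection.
echo "${BUILD_CMD}"

# To actually build, run the printed command from the project root, e.g.:
# eval "${BUILD_CMD}"
```

The `--secret` flag requires BuildKit, and the Dockerfile has to mount the secret in the step that needs it for the value to be available during the build.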

tintinrevient · 2025-09-29T12:44:00Z

+To test it locally:
+
+```bash
+docker run --rm your-model train --help


The volumes need to be attached.

The latest working reference is this line in the local DVC pipeline:

proteingym-benchmark/benchmark/supervised/local/dvc.yaml

Line 35 in d438e82

- docker run --rm -v $(realpath ${source.datasets_dir}):/datasets -v $(realpath ${source.models_dir}):/models -v $(realpath ${destination.output_dir}):/opt/ml/model ${item.model.name}:latest train --dataset-file ${item.dataset.container_path} --model-card-file ${item.model.container_path}
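
For the README, a local test with the volumes attached might be sketched as follows. The host directories (`datasets`, `models`, `output` under the current directory) and the `your-model` tag are assumptions for illustration; the container paths `/datasets`, `/models` and `/opt/ml/model` are the ones from the dvc.yaml line above. The sketch keeps `train --help` from the README draft rather than the full `--dataset-file` / `--model-card-file` invocation.

```shell
# Hypothetical host directories; replace with the real locations.
DATASETS_DIR="${PWD}/datasets"
MODELS_DIR="${PWD}/models"
OUTPUT_DIR="${PWD}/output"
IMAGE_TAG="your-model"

# Assemble the run command with the three bind mounts used in dvc.yaml.
# Absolute host paths are used, mirroring the realpath calls in dvc.yaml.
RUN_CMD="docker run --rm -v ${DATASETS_DIR}:/datasets -v ${MODELS_DIR}:/models -v ${OUTPUT_DIR}:/opt/ml/model ${IMAGE_TAG}:latest train --help"

# Print the command for inspection.
echo "${RUN_CMD}"

# To actually run it (paths without spaces assumed):
# eval "${RUN_CMD}"
```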

tintinrevient · 2025-09-29T12:47:32Z

While going over it, I noticed that we haven't explained anywhere how to build the model files. I added this to the models/README.md file.

Hey Karel,

The Dockerfile comments look good for now; just a reminder that they will need updating once both the proteingym-base and benchmark packages are public.

For the Docker build and run part, you can refer to the code in dvc.yaml for the local environment. Once the README commands are updated to match it, they will give a working version based on the current settings.

tintinrevient · 2025-10-07T10:37:08Z

@karel-w this PR can be abandoned, as there have been many major changes to the Dockerfile. I will update the README in a bit.

small fixes README

c4d2173

karel-w requested a review from tintinrevient September 29, 2025 08:32

added section on dockerfiles to model readme

166bb38

tintinrevient reviewed Sep 29, 2025

tintinrevient closed this Oct 7, 2025

tintinrevient deleted the readme_corrections branch December 1, 2025 15:06

karel-w wants to merge 2 commits into main from readme_corrections
