Add AWS DVC file by tintinrevient · Pull Request #74 · ProteinGym/proteingym-benchmark

tintinrevient · 2025-07-22T09:02:17Z

This PR resolves #8 and #52

The major TODOs:

Use a ZIP file for each dataset and load it with Dataset.from_path().
Hyperparameters can be passed in the S3 prefix.
Finally, dataset.toml is no longer needed in train().

karel-w · 2025-07-23T11:27:27Z

+    cmd:
+      - aws ecr describe-repositories --repository-names ${item.model.name} --region ${aws.region_name} >/dev/null 2>&1 || aws ecr create-repository --repository-name ${item.model.name} --region ${aws.region_name} >/dev/null
+      - aws ecr get-login-password --region ${aws.region_name} | docker login --username AWS --password-stdin ${aws.account_id}.dkr.ecr.${aws.region_name}.amazonaws.com
+      - docker buildx build --build-arg GIT_CACHE_BUST=${local.git_cache_bust} --platform linux/amd64,linux/arm64 --secret id=git_auth,src=git-auth.txt -t ${aws.account_id}.dkr.ecr.${aws.region_name}.amazonaws.com/${item.model.name}:latest ${item.model.dockerfile} --push


Would it make sense for cmd here to be pg2-benchmark aws upload? For local runs, we perform the build in pg2-benchmark model predict.
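One way to picture such a command is as a thin wrapper that assembles the same three shell steps from configuration. The sketch below is hypothetical: EcrTarget and upload_commands are illustrative names, not part of pg2-benchmark, and the --build-arg and --secret flags from the diff are left out for brevity. Only the CLI flags themselves are taken from the diff above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EcrTarget:
    """Hypothetical holder for the values that dvc.yaml interpolates."""

    account_id: str
    region_name: str
    model_name: str

    @property
    def registry(self) -> str:
        # Registry host in the same shape as used in the diff.
        return f"{self.account_id}.dkr.ecr.{self.region_name}.amazonaws.com"

    @property
    def image_uri(self) -> str:
        return f"{self.registry}/{self.model_name}:latest"


def upload_commands(target: EcrTarget, build_context: str) -> list[str]:
    """Return the three shell steps from the diff, in execution order."""
    region = f"--region {target.region_name}"
    return [
        # Create the repository only when describing it fails.
        f"aws ecr describe-repositories --repository-names {target.model_name} {region} >/dev/null 2>&1"
        f" || aws ecr create-repository --repository-name {target.model_name} {region} >/dev/null",
        # Log Docker in to the registry.
        f"aws ecr get-login-password {region}"
        f" | docker login --username AWS --password-stdin {target.registry}",
        # Build for both platforms and push (build-arg and secret omitted here).
        f"docker buildx build --platform linux/amd64,linux/arm64"
        f" -t {target.image_uri} {build_context} --push",
    ]
```

Keeping the command assembly in one function would let the local and AWS stages share the image-naming logic, which seems to be the point of the question.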

tintinrevient · 2025-07-24T08:24:54Z

+
+  upload_to_s3:
+    cmd:
+      - aws s3 cp ${local.data_dir}/ s3://${aws.s3_training_data_prefix}/${local.data_dir}/ --recursive --exclude ".*" --exclude "*/.*"


Check whether the files already exist before running cp.
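A minimal sketch of that check, assuming the keys already present under the S3 prefix have been listed beforehand (for example with a boto3 paginator, not shown here). The helper name and signature are hypothetical. It also skips any path with a component starting with a dot, mirroring the two --exclude patterns in the command.

```python
from pathlib import Path


def files_to_upload(
    local_root: Path, existing_keys: set[str], key_prefix: str
) -> list[tuple[Path, str]]:
    """Pair each local file that is missing remotely with its target S3 key."""
    pending = []
    for path in sorted(local_root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(local_root)
        # Skip hidden files and anything inside a hidden directory.
        if any(part.startswith(".") for part in relative.parts):
            continue
        key = f"{key_prefix.rstrip('/')}/{relative.as_posix()}"
        # Only files whose key is not already in the bucket need copying.
        if key not in existing_keys:
            pending.append((path, key))
    return pending
```

This compares by key only; comparing sizes or checksums as well would be needed to detect files whose content changed.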

tintinrevient · 2025-07-24T08:25:19Z

+  upload_to_s3:
+    cmd:
+      - aws s3 cp ${local.data_dir}/ s3://${aws.s3_training_data_prefix}/${local.data_dir}/ --recursive --exclude ".*" --exclude "*/.*"
+      - aws s3 cp ${local.model_dir}/ s3://${aws.s3_training_data_prefix}/${local.model_dir}/ --recursive --exclude ".*" --exclude "*/.*"


Add the content to the hyperparameters, so remove this.

Removing model.toml is not a good idea, for two reasons:

The manifest file belongs to each model individually, so a model's manifest should not be expanded into hyperparameters by pg2-benchmark.

In an AWS SageMaker training job, when TOML content is passed as hyperparameters, every typed key-value pair becomes a string-to-string mapping, so it is better to load the model's manifest inside each model's own method.
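The type loss described in the second point can be illustrated without SageMaker itself. The sketch below uses a made-up manifest fragment and a hypothetical to_hyperparameters helper; it only mimics the string-to-string shape that the hyperparameters take.

```python
import json


def to_hyperparameters(config: dict) -> dict[str, str]:
    """Coerce every value to str, as a string-to-string map requires."""
    return {key: str(value) for key, value in config.items()}


# Made-up manifest content, for illustration only.
manifest = {
    "name": "demo-model",
    "learning_rate": 0.001,
    "use_gpu": False,
    "layers": [64, 32],
}
hyperparameters = to_hyperparameters(manifest)

# Every value is now a string, so the original types are gone.
# bool("False") is True, which makes naive casting on the receiving side risky.
naive_flag = bool(hyperparameters["use_gpu"])

# Recovering the types needs an explicit encoding agreed on by both sides,
# for example JSON per value.
encoded = {key: json.dumps(value) for key, value in manifest.items()}
decoded = {key: json.loads(value) for key, value in encoded.items()}
```

Loading the manifest inside the model's own code, as argued above, avoids this encode and decode step altogether.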

tintinrevient · 2025-07-24T16:33:52Z

✅ Supervised models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.005050505050505051
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.005076142131979714
Bennett S,-0.005076142131979696
Kappa Standard Error,0.0
Kappa Unbiased,-0.005076142131979696
Scott PI,-0.005076142131979696
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,6.62935662007962
Reference Entropy,6.62935662007962
Cross Entropy,0
Joint Entropy,6.62935662007962
Conditional Entropy,-0.0
Mutual Information,6.62935662007962
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,38809
Overall J,"(0.0, 0.0)"
Hamming Loss,1.0
Zero-one Loss,99
NIR,0.010101010101010102
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.9949238578680203
TNR Macro,0.9949494949494949
Bangdiwala B,None
Krippendorff Alpha,0.0
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.005050505050505083
FNR Macro,None
PPV Macro,None
NPV Macro,0.9949494949494949
ACC Macro,0.98989898989899
F1 Macro,0.0
FPR Micro,0.005076142131979711
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9949238578680203
Spearman,0.5496969696969696

✅ Zero-shot models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.00010007998789141575
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.00010009004298543876
Bennett S,-0.00010009008107296567
Kappa Standard Error,0.0
Kappa Unbiased,-0.00010009000489789399
Scott PI,-0.00010009000489789399
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,12.286557761608659
Reference Entropy,12.286549508613042
Cross Entropy,0
Joint Entropy,12.286549508613042
Conditional Entropy,-0.0
Mutual Information,12.286557761608659
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,99820081
Overall J,"(0.0, 0.0)"
Hamming Loss,0.9999999999999999
Zero-one Loss,4996
NIR,0.00020016012810248197
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0000006717097922
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.999899909957026
TNR Macro,0.9998999199359487
Bangdiwala B,None
Krippendorff Alpha,7.616744806910965e-11
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00010008006405126668
FNR Macro,None
PPV Macro,None
NPV Macro,0.9998999200121163
ACC Macro,0.999799839948065
F1 Macro,0.0
FPR Micro,0.00010009004297395485
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.999899909957026
Spearman,

tintinrevient · 2025-07-28T17:20:36Z

✅ Supervised models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.005050505050505051
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.005076142131979714
Bennett S,-0.005076142131979696
Kappa Standard Error,0.0
Kappa Unbiased,-0.005076142131979696
Scott PI,-0.005076142131979696
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,6.62935662007962
Reference Entropy,6.62935662007962
Cross Entropy,0
Joint Entropy,6.62935662007962
Conditional Entropy,-0.0
Mutual Information,6.62935662007962
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,38809
Overall J,"(0.0, 0.0)"
Hamming Loss,1.0
Zero-one Loss,99
NIR,0.010101010101010102
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.9949238578680203
TNR Macro,0.9949494949494949
Bangdiwala B,None
Krippendorff Alpha,0.0
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.005050505050505083
FNR Macro,None
PPV Macro,None
NPV Macro,0.9949494949494949
ACC Macro,0.98989898989899
F1 Macro,0.0
FPR Micro,0.005076142131979711
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9949238578680203
Spearman,0.6528385899814472

✅ Zero-shot models have all passed validation.


tintinrevient · 2025-07-28T17:22:29Z

✅ Supervised models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.005050505050505051
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.005076142131979714
Bennett S,-0.005076142131979696
Kappa Standard Error,0.0
Kappa Unbiased,-0.005076142131979696
Scott PI,-0.005076142131979696
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,6.62935662007962
Reference Entropy,6.62935662007962
Cross Entropy,0
Joint Entropy,6.62935662007962
Conditional Entropy,-0.0
Mutual Information,6.62935662007962
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,38809
Overall J,"(0.0, 0.0)"
Hamming Loss,1.0
Zero-one Loss,99
NIR,0.010101010101010102
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.9949238578680203
TNR Macro,0.9949494949494949
Bangdiwala B,None
Krippendorff Alpha,0.0
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.005050505050505083
FNR Macro,None
PPV Macro,None
NPV Macro,0.9949494949494949
ACC Macro,0.98989898989899
F1 Macro,0.0
FPR Micro,0.005076142131979711
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9949238578680203
Spearman,0.6654792826221397

✅ Zero-shot models have all passed validation.


JCZuurmond

@tintinrevient: I did a first review. Could you split the PR? It contains many changes. Please create a separate PR for each of:

Introducing AWS
Introducing the Dataset.from_path
Changing the model Manifest
And maybe more, I did not get further

JCZuurmond · 2025-07-29T06:05:11Z

-    model_toml_file: str = typer.Option(help="Path to the model TOML file"),
-    nogpu: bool = typer.Option(False, help="GPUs available"),
+    dataset_zip_file: str = typer.Option(
+        default="", help="Path to the dataset ZIP file"


This option is required, right? Also, could you update the syntax to the Annotated version, where the option is on the left side of the equals sign? And update both types to Path.

Suggested change

-        default="", help="Path to the dataset ZIP file"
+        help="Path to the dataset ZIP file"

The option is not required, because on AWS no local file path is passed in as user input.
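For reference, the Annotated form requested in the review could look roughly like the sketch below, while keeping both options optional as the reply argues. This is an illustration rather than the PR's code: the command body is a placeholder that only echoes its inputs, and using None as the default (instead of an empty string) is an assumption.

```python
from pathlib import Path
from typing import Annotated, Optional

import typer

app = typer.Typer()


@app.command()
def train(
    # typer.Option sits inside Annotated, left of the equals sign;
    # the plain Python default stays on the right.
    dataset_zip_file: Annotated[
        Optional[Path], typer.Option(help="Path to the dataset ZIP file")
    ] = None,
    model_toml_file: Annotated[
        Optional[Path], typer.Option(help="Path to the model TOML file")
    ] = None,
) -> None:
    # Placeholder body: just show what was received.
    typer.echo(f"dataset={dataset_zip_file} model={model_toml_file}")
```

With None as the default, "not provided" is distinguishable from an empty path, which a default of "" cannot express once the type is Path.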

JCZuurmond · 2025-07-29T06:05:51Z

-    manifest = Manifest.from_path(dataset_toml_file)
-    dataset_name = manifest.name
-    dataset = manifest.ingest()
+    dataset_zip_file = dataset_zip_file or training_data_path


Why introduce this or?

Because in the AWS environment no dataset_file is passed by the user; the SageMaker training job automatically mounts the S3 path at a fixed location inside the container.
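The fallback can be made explicit in a small helper. This is a sketch under stated assumptions, not the PR's code: resolve_input is a hypothetical name, and the default path assumes a SageMaker input channel named training.

```python
from pathlib import Path
from typing import Optional

# SageMaker mounts each input channel under /opt/ml/input/data/<channel>.
# Assumption: the channel is called "training".
SAGEMAKER_TRAINING_DIR = Path("/opt/ml/input/data/training")


def resolve_input(
    cli_value: Optional[Path], mounted_default: Path = SAGEMAKER_TRAINING_DIR
) -> Path:
    """Prefer the path given on the command line, else the mounted location."""
    return cli_value if cli_value is not None else mounted_default
```

Testing against None rather than relying on `or` keeps the intent visible: a missing option means "running inside the container", not "empty path".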

JCZuurmond · 2025-07-29T06:06:59Z

+    dataset = Dataset.from_path(dataset_zip_file)
+    dataset_name = dataset.name
+
+    model_toml_file = model_toml_file or manifest_path


Similar question about the or statement

JCZuurmond · 2025-07-29T06:10:25Z

@@ -1,10 +1,10 @@
 import polars as pl


Similar comments as for the other script

JCZuurmond · 2025-07-29T06:10:58Z

 import toml


 class Manifest(BaseModel):


The manifest should probably go into the pg2-benchmark package

I've moved them into pg2-benchmark! Good point: we will extend the manifest with model cards in the future, so it makes sense to keep it in pg2-benchmark. 🤔

tintinrevient requested a review from JCZuurmond July 22, 2025 09:09

tintinrevient added 10 commits July 22, 2025 11:11

Add boto3 (42cc7b3)
Update dvc.yaml without env variables (357c0b3)
Use Path object and return string for AWS types (23c57fb)
Merge branch 'refactor/build-docker-images-with-configured-paths' into feat/add-dvc-aws-yaml (62210eb)
Use Path object and return string for AWS types (45f854a)
Merge branch 'refactor/build-docker-images-with-configured-paths' into feat/add-dvc-aws-yaml (ce5c5d1)
Merge branch 'refactor/split-dvc-yaml-into-two-games' into feat/add-dvc-aws-yaml (aecf723)
Update supervised DVC without env variables which DVC cannot read (6a39599)
Use placeholders for AWS account_id and region_name (f64c2c6)
Test IFF AWS integration (ffac1d0)

karel-w reviewed Jul 23, 2025

Comment thread README.md Outdated

karel-w reviewed Jul 23, 2025

Comment thread supervised/dvc.yaml Outdated

karel-w reviewed Jul 23, 2025

Comment thread supervised/dvc.yaml

karel-w reviewed Jul 23, 2025

Comment thread zero_shot/dvc.yaml Outdated

tintinrevient commented Jul 24, 2025

Comment thread src/pg2_benchmark/cli/aws.py Outdated

tintinrevient commented Jul 24, 2025

tintinrevient added 5 commits July 24, 2025 13:49

Resolve merge conflict (0fc7c2c)
Update README.md (f01ad68)
Update file paths for local and AWS env (d8837e0)
Merge with the latest main (f992326)
Merge with the latest main (464c6d5)

tintinrevient changed the base branch from refactor/build-docker-images-with-configured-paths to main July 24, 2025 16:18

Update dvc.aws.yaml (03d9c43)

tintinrevient mentioned this pull request Jul 28, 2025

Build docker images with configurable paths for AWS #73

Closed

tintinrevient added 2 commits July 28, 2025 19:11

Use dataset.from_path to load zip dataset (02d629f)
Merge branch 'main' into feat/add-dvc-aws-yaml (a203f93)

JCZuurmond reviewed Jul 29, 2025

tintinrevient marked this pull request as draft July 29, 2025 16:29

tintinrevient closed this Jul 31, 2025

tintinrevient deleted the feat/add-dvc-aws-yaml branch August 28, 2025 12:26
