feat: added huggingface-regressor #122 by florisvdf · Pull Request #132 · ProteinGym/proteingym-benchmark

florisvdf · 2025-10-14T13:51:25Z

Changes

Resolves #122

Summary

Added a RITA regressor, implemented in proteingym.models.hfregressor. This module could easily be extended to use other Hugging Face-hosted PLMs by adding a dedicated Embedder class to models/huggingface-regressor/src/proteingym/models/hfregressor/embedders and by updating the model card to specify the name of the PLM. Extra features and precomputed embeddings are not yet supported.
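
As an illustration of the extension point described above, here is a minimal sketch of what a dedicated embedder could look like. The `Embedder` base class, its `embed` method, and `ToyCompositionEmbedder` are hypothetical names; the actual interface in proteingym.models.hfregressor may differ. A real embedder would wrap a Hugging Face PLM rather than the amino-acid composition used here to keep the example dependency-free.

```python
from abc import ABC, abstractmethod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class Embedder(ABC):
    """Hypothetical interface: map protein sequences to fixed-size vectors."""

    @abstractmethod
    def embed(self, sequences: list[str]) -> list[list[float]]:
        ...


class ToyCompositionEmbedder(Embedder):
    """Stand-in for a PLM-backed embedder: amino-acid frequency vectors."""

    def embed(self, sequences: list[str]) -> list[list[float]]:
        embeddings = []
        for seq in sequences:
            length = max(len(seq), 1)  # avoid division by zero on empty input
            embeddings.append([seq.count(aa) / length for aa in AMINO_ACIDS])
        return embeddings


embedder = ToyCompositionEmbedder()
vectors = embedder.embed(["ACDA", "KKKK"])
```

Under this assumed design, the regressor only depends on the `embed` signature, so swapping in another PLM would mean adding one subclass and changing the name in the model card.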

Checklist

I broke the PR down so that it contains a reasonable amount of changes for an effective review
I performed a self-review of my code. Amongst other things, I have commented my code in hard-to-understand areas.
I made corresponding changes to the documentation
I added tests that prove my fix is effective or that my feature works
I accounted for dependent changes to be merged and published in downstream modules

tintinrevient · 2025-10-14T14:00:11Z

✅ Supervised models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.0030303030303030303
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.0030395136778115636
Bennett S,-0.00303951367781155
Kappa Standard Error,0.0
Kappa Unbiased,-0.00303951367781155
Scott PI,-0.00303951367781155
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,7.366322214245807
Reference Entropy,7.366322214245807
Cross Entropy,0
Joint Entropy,7.366322214245807
Conditional Entropy,-0.0
Mutual Information,7.366322214245807
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,108241
Overall J,"(0.0, 0.0)"
Hamming Loss,1.0
Zero-one Loss,165
NIR,0.006060606060606061
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.9969604863221885
TNR Macro,0.996969696969697
Bangdiwala B,None
Krippendorff Alpha,0.0
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00303030303030305
FNR Macro,None
PPV Macro,None
NPV Macro,0.996969696969697
ACC Macro,0.9939393939393939
F1 Macro,0.0
FPR Micro,0.003039513677811523
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9969604863221885
Spearman,0.011855849117089201

✅ Zero-shot models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.00010010001992985675
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.00010010006100605498
Bennett S,-0.00010010010010010009
Kappa Standard Error,0.0
Kappa Unbiased,-0.00010011004094695072
Scott PI,-0.00010011004094695072
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,12.286157441352454
Reference Entropy,12.286549508613042
Cross Entropy,0
Joint Entropy,12.286549508613042
Conditional Entropy,-0.0
Mutual Information,12.286157441352454
KL Divergence,None
Lambda B,1.0
Lambda A,0.9997997997997998
Chi-Squared DF,99800100
Overall J,"(0.0, 0.0)"
Hamming Loss,0.9999999999999999
Zero-one Loss,4996
NIR,0.00020016012810248197
P-Value,1
Overall CEN,1.401066496965464e-05
Overall MCEN,1.401066496965464e-05
Overall MCC,0.0
RR,0.5000500450405365
CBA,0.0
AUNU,None
AUNP,None
RCI,0.9999680897179217
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,0.0
TNR Micro,0.9998998999380064
TNR Macro,0.9998999099189271
Bangdiwala B,None
Krippendorff Alpha,-1.9957876399588722e-08
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Very Strong
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00010009008107292328
FNR Macro,None
PPV Macro,None
NPV Macro,0.9998999099951021
ACC Macro,0.9997998199140291
F1 Macro,0.0
FPR Micro,0.0001001000619935688
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9998998999380064
Spearman,

florisvdf · 2025-10-15T09:14:06Z

The DVC step of the CI is failing. I suspect this is due to how the prediction dataframe is structured, which causes

proteingym-benchmark/benchmark/supervised/local/dvc.yaml

Line 37 in 2b24c93

cmd: uv run proteingym-benchmark metric calc --output-path ${output.prediction}/${item.dataset.name}_${item.model.name}.csv --metric-path ${output.metric}/${item.dataset.name}_${item.model.name}.csv

to fail.

This is unfortunately not apparent from the logs. @tintinrevient, do you know how I could check where the error happens?
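
One way to narrow down a failure like this locally is to check the prediction CSV before the metric step runs. The sketch below assumes a hypothetical schema with `sequence`, `test` and `pred` columns; the column names the benchmark actually expects are not stated in this thread.

```python
import csv
import io

# Hypothetical column names; replace with the benchmark's actual prediction schema.
REQUIRED_COLUMNS = {"sequence", "test", "pred"}


def check_prediction_csv(handle) -> list[str]:
    """Return a list of human-readable problems found in a prediction CSV."""
    reader = csv.DictReader(handle)
    problems = []
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column in ("test", "pred"):
            try:
                float(row[column])
            except (TypeError, ValueError):
                problems.append(f"row {row_number}: {column!r} is not numeric")
    return problems


good = io.StringIO("sequence,test,pred\nACD,0.1,0.2\n")
bad = io.StringIO("sequence,prediction\nACD,0.2\n")
print(check_prediction_csv(good))
print(check_prediction_csv(bad))
```

Running a check like this on the file passed to `--output-path` would show whether the dataframe structure is the culprit before digging into the CI logs.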

tintinrevient · 2025-10-15T14:43:08Z

@florisvdf you can check here: https://github.com/ProteinGym/proteingym-benchmark/actions/runs/18523420137/job/52788481310

A reminder: this repo is being refactored, so the way dvc.yaml is structured will change this week or next.

tintinrevient · 2025-10-15T20:25:15Z

I'll review it tomorrow (I'm wrapping up other PRs first).

tintinrevient · 2025-10-16T09:29:16Z

+    if Path(SageMakerTrainingJobPath.OUTPUT_PATH).is_dir():
+        df.write_csv(
+            f"{SageMakerTrainingJobPath.OUTPUT_PATH}/{dataset.name}_{model_card.name}.csv"
+        )
+
+        console.print(
+            f"Saved the metrics in CSV in {SageMakerTrainingJobPath.OUTPUT_PATH}/{dataset.name}_{model_card.name}.csv"
+        )
+    else:
+        console.print(f"Predictions:\n {df}")


This path checking can be simplified to:

    df.to_csv(
        f"{SageMakerTrainingJobPath.OUTPUT_PATH}/{dataset.name}_{model_card.name}.csv",
        index=False,
    )
    console.print(
        f"Saved the metrics in CSV in {SageMakerTrainingJobPath.OUTPUT_PATH}/{dataset.name}_{model_card.name}.csv"
    )

The path bindings are defined and used in DVC here: https://github.com/ProteinGym/proteingym-benchmark/blob/65bedcdb5f3286f2a17ef4abc6cc3cc78c528175/benchmark/supervised/local/dvc.yaml#L22C204-L22C250.
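
A minimal sketch of the simplified version, building the output path once so that the write call and the log message cannot drift apart. `OUTPUT_PATH`, `save_predictions`, and the example rows are stand-ins for illustration; the standard-library csv module is used here in place of the project's dataframe call.

```python
import csv
import tempfile
from pathlib import Path


def save_predictions(rows, header, output_dir, dataset_name, model_name):
    """Write rows to <output_dir>/<dataset>_<model>.csv and return the path."""
    output_file = Path(output_dir) / f"{dataset_name}_{model_name}.csv"
    with output_file.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    print(f"Saved the predictions as CSV in {output_file}")
    return output_file


# Stand-in for SageMakerTrainingJobPath.OUTPUT_PATH; a temporary directory is used here.
OUTPUT_PATH = tempfile.mkdtemp()
saved = save_predictions([("ACD", 0.2)], ("sequence", "pred"), OUTPUT_PATH, "demo", "ritaregressor")
```

Because the directory is assumed to exist, there is no `is_dir()` branch; if it were missing, the `open` call would raise and the failure would be visible in the logs.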

tintinrevient · 2025-10-16T09:30:52Z

✅ Supervised models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.0030303030303030303
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.0030395136778115636
Bennett S,-0.00303951367781155
Kappa Standard Error,0.0
Kappa Unbiased,-0.00303951367781155
Scott PI,-0.00303951367781155
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,7.366322214245807
Reference Entropy,7.366322214245807
Cross Entropy,0
Joint Entropy,7.366322214245807
Conditional Entropy,-0.0
Mutual Information,7.366322214245807
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,108241
Overall J,"(0.0, 0.0)"
Hamming Loss,1.0
Zero-one Loss,165
NIR,0.006060606060606061
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.9969604863221885
TNR Macro,0.996969696969697
Bangdiwala B,None
Krippendorff Alpha,0.0
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00303030303030305
FNR Macro,None
PPV Macro,None
NPV Macro,0.996969696969697
ACC Macro,0.9939393939393939
F1 Macro,0.0
FPR Micro,0.003039513677811523
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9969604863221885
Spearman,0.011855849117089201

✅ Zero-shot models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.00010007998789141575
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.00010009004298543876
Bennett S,-0.00010009008107296567
Kappa Standard Error,0.0
Kappa Unbiased,-0.00010009000489789399
Scott PI,-0.00010009000489789399
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,12.286557761608659
Reference Entropy,12.286549508613042
Cross Entropy,0
Joint Entropy,12.286549508613042
Conditional Entropy,-0.0
Mutual Information,12.286557761608659
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,99820081
Overall J,"(0.0, 0.0)"
Hamming Loss,0.9999999999999999
Zero-one Loss,4996
NIR,0.00020016012810248197
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0000006717097922
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.999899909957026
TNR Macro,0.9998999199359487
Bangdiwala B,None
Krippendorff Alpha,7.616744806910965e-11
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00010008006405126668
FNR Macro,None
PPV Macro,None
NPV Macro,0.9998999200121163
ACC Macro,0.999799839948065
F1 Macro,0.0
FPR Micro,0.00010009004297395485
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.999899909957026
Spearman,

tintinrevient

@florisvdf GREAT WORK!!! It is a really neat PR. I saw it passed the CML (continuous machine learning) CI (currently, maybe the image name needs to be lowercase: https://github.com/ProteinGym/proteingym-benchmark/actions/runs/18556840699/job/52896246260)

I just added another comment about the paths in the __main__.py entrypoint, and I've approved the PR. When the pipeline passes, you can merge it.

tintinrevient · 2025-10-16T09:40:24Z

@florisvdf how was your experience using DVC, the template model Dockerfile, and everything else? You can leave comments below.

tintinrevient · 2025-10-16T10:36:29Z

I see the Docker step fails to run. You can debug it locally with dvc repro benchmark/supervised/local ... to see the error message.

tintinrevient · 2025-10-16T13:34:31Z

✅ Supervised models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.0030303030303030303
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.0030395136778115636
Bennett S,-0.00303951367781155
Kappa Standard Error,0.0
Kappa Unbiased,-0.00303951367781155
Scott PI,-0.00303951367781155
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,7.366322214245807
Reference Entropy,7.366322214245807
Cross Entropy,0
Joint Entropy,7.366322214245807
Conditional Entropy,-0.0
Mutual Information,7.366322214245807
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,108241
Overall J,"(0.0, 0.0)"
Hamming Loss,1.0
Zero-one Loss,165
NIR,0.006060606060606061
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.9969604863221885
TNR Macro,0.996969696969697
Bangdiwala B,None
Krippendorff Alpha,0.0
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00303030303030305
FNR Macro,None
PPV Macro,None
NPV Macro,0.996969696969697
ACC Macro,0.9939393939393939
F1 Macro,0.0
FPR Micro,0.003039513677811523
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9969604863221885
Spearman,0.011855849117089201

✅ Zero-shot models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.00010007998789141575
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.00010009004298543876
Bennett S,-0.00010009008107296567
Kappa Standard Error,0.0
Kappa Unbiased,-0.00010009000489789399
Scott PI,-0.00010009000489789399
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,12.286557761608659
Reference Entropy,12.286549508613042
Cross Entropy,0
Joint Entropy,12.286549508613042
Conditional Entropy,-0.0
Mutual Information,12.286557761608659
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,99820081
Overall J,"(0.0, 0.0)"
Hamming Loss,0.9999999999999999
Zero-one Loss,4996
NIR,0.00020016012810248197
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0000006717097922
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.999899909957026
TNR Macro,0.9998999199359487
Bangdiwala B,None
Krippendorff Alpha,7.616744806910965e-11
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00010008006405126668
FNR Macro,None
PPV Macro,None
NPV Macro,0.9998999200121163
ACC Macro,0.999799839948065
F1 Macro,0.0
FPR Micro,0.00010009004297395485
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.999899909957026
Spearman,

florisvdf · 2025-10-16T13:37:57Z

I just managed to pass the CI. I made an update in the model card and pushed it, but that didn't trigger the workflow so I had to run it manually (not sure why that happened?).

I changed RITA_xl in the model card to RITA_s, which is a much smaller model. I suspect that the CI failed because a 2B parameter model was too large for the CI runner. Changing it to RITA_s worked, but it still took 30 min which I found a bit strange. I think we can merge now @tintinrevient.
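
As a rough back-of-the-envelope check on that suspicion, the memory needed just to hold a model's weights scales with parameter count times bytes per parameter. The sketch below uses the 2B figure from the comment above and assumes 32-bit floats; it ignores activations and any framework overhead, and the CI runner's memory limit is not stated in this thread.

```python
def weight_memory_gib(n_params: int, bytes_per_param: int = 4) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


# 2 billion parameters stored as float32 (4 bytes each).
print(f"{weight_memory_gib(2_000_000_000):.2f} GiB")  # prints "7.45 GiB"
# The same weights in 16-bit precision take half as much.
print(f"{weight_memory_gib(2_000_000_000, bytes_per_param=2):.2f} GiB")  # prints "3.73 GiB"
```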

florisvdf · 2025-10-16T13:46:55Z

@florisvdf how was your experience using DVC, the template model Dockerfile, and everything else? You can leave comments below.

Overall very good. Your tip to run dvc repro benchmark/supervised/local was very helpful. Perhaps the steps needed to pass the CI could be listed more explicitly in the docs, along with, for instance, how users should update the workflow to include their model in the CI. Thanks for your help @tintinrevient!

florisvdf · 2025-10-16T13:55:51Z

Not sure why workflow isn't triggered anymore when I push changes?

tintinrevient · 2025-10-16T14:48:37Z

It has a merge conflict with the main branch, so maybe that is why it is not triggered? I updated the main branch this morning. Once the conflicts are resolved, we can merge.

tintinrevient · 2025-10-16T14:54:20Z

Perhaps the steps needed to pass the CI could be listed more explicitly in the docs, along with, for instance, how users should update the workflow to include their model in the CI.

I've added this to the backlog (@JCZuurmond): #135

florisvdf · 2025-10-16T15:12:28Z

It seems that uv is no longer installed after rebasing onto the main branch:
/home/runner/work/_temp/2a033953-c95b-45ad-8c56-6e9429818621.sh: line 3: uv: command not found
Do you know why, @tintinrevient?

tintinrevient · 2025-10-22T08:24:31Z

uv is no longer used on the main branch. In dvc.yaml, just use dvc repro instead of uv run dvc repro. The Python environment is created in the CI pipeline.

tintinrevient · 2025-10-22T08:25:21Z

You can reference this cml.yaml:

proteingym-benchmark/.github/workflows/cml.yaml

Line 41 in 7ef4eb2

dvc repro benchmark/supervised/local/dvc.yaml

tintinrevient · 2025-10-23T08:37:49Z

@florisvdf, I've updated the cml.yaml file to be in sync with the main branch. To choose which model runs in which game, we can now use tags, which is more flexible, for example: proteingym-base list-models models | jq '[.[] | select(.tags | contains(["zero-shot"]))]'
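
For reference, the same tag-based selection can be sketched in Python. This assumes, as the jq filter implies, that `proteingym-base list-models` emits a JSON array of objects with a `tags` list; the `name` field and the sample entries below are made up for illustration. Note that jq's `contains` matches strings by substring, whereas this sketch uses exact tag membership.

```python
import json

# Made-up sample standing in for the JSON printed by the list-models command.
sample_output = """
[
  {"name": "model-a", "tags": ["zero-shot"]},
  {"name": "model-b", "tags": ["supervised"]},
  {"name": "model-c", "tags": ["supervised", "zero-shot"]}
]
"""


def select_by_tag(models: list[dict], tag: str) -> list[dict]:
    """Keep the models whose tag list contains the given tag exactly."""
    return [model for model in models if tag in model.get("tags", [])]


models = json.loads(sample_output)
zero_shot = select_by_tag(models, "zero-shot")
print([model["name"] for model in zero_shot])  # prints ['model-a', 'model-c']
```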

tintinrevient · 2025-10-23T09:01:28Z

✅ Supervised models have all passed validation.

metric_name,metric_value
Overall ACC,0.0
Overall RACCU,0.0030303030303030303
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.0030395136778115636
Bennett S,-0.00303951367781155
Kappa Standard Error,0.0
Kappa Unbiased,-0.00303951367781155
Scott PI,-0.00303951367781155
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,7.366322214245807
Reference Entropy,7.366322214245807
Cross Entropy,0
Joint Entropy,7.366322214245807
Conditional Entropy,-0.0
Mutual Information,7.366322214245807
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,108241
Overall J,"(0.0, 0.0)"
Hamming Loss,1.0
Zero-one Loss,165
NIR,0.006060606060606061
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.9969604863221885
TNR Macro,0.996969696969697
Bangdiwala B,None
Krippendorff Alpha,0.0
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00303030303030305
FNR Macro,None
PPV Macro,None
NPV Macro,0.996969696969697
ACC Macro,0.9939393939393939
F1 Macro,0.0
FPR Micro,0.003039513677811523
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9969604863221885

✅ Zero-shot models have all passed validation.

metric_name,metric_value
Overall ACC,0.0
Overall RACCU,0.00010350554262465215
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.00010027040554341335
Bennett S,-0.0001002707309736288
Kappa Standard Error,0.0
Kappa Unbiased,-0.00010351625713101698
Scott PI,-0.00010351625713101698
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,12.286557761608659
Reference Entropy,12.270402713018697
Cross Entropy,0
Joint Entropy,12.286557761608659
Conditional Entropy,0.0161550485899576
Mutual Information,12.2704027130187
KL Divergence,None
Lambda B,0.9963963963963964
Lambda A,1.0
Chi-Squared DF,99460729
Overall J,"(0.0, 0.0)"
Hamming Loss,0.9999999999999999
Zero-one Loss,4996
NIR,0.0038030424339471577
P-Value,1
Overall CEN,0.0005655020094343343
Overall MCEN,0.0005655020094343343
Overall MCC,0.0
RR,0.5009023460998596
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0000000000000002
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,0.0
TNR Micro,0.9998997292690264
TNR Macro,0.9998997393222379
Bangdiwala B,None
Krippendorff Alpha,-3.425833166131969e-06
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Very Strong
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00010026067776214287
FNR Macro,None
PPV Macro,None
NPV Macro,0.9998997393222379
ACC Macro,0.9997994786444756
F1 Macro,0.0
FPR Micro,0.00010027073097362837
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9998997292690264

tintinrevient · 2025-10-23T09:02:35Z

It takes roughly 15 minutes to run 3 models against 3 datasets, pairwise.
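
Spelled out, that is 3 × 3 = 9 model-dataset pairs, or roughly 100 seconds per pair on average. The model and dataset names below are placeholders.

```python
from itertools import product

models = ["model-1", "model-2", "model-3"]          # placeholder names
datasets = ["dataset-1", "dataset-2", "dataset-3"]  # placeholder names

pairs = list(product(models, datasets))
total_seconds = 15 * 60  # the roughly 15 minutes reported above
seconds_per_pair = total_seconds / len(pairs)
print(len(pairs), seconds_per_pair)  # prints "9 100.0"
```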

florisvdf · 2025-10-23T09:24:27Z

@florisvdf, I've updated the cml.yaml file to be in sync with the main branch. To choose which model runs in which game, we can now use tags, which is more flexible, for example: proteingym-base list-models models | jq '[.[] | select(.tags | contains(["zero-shot"]))]'

Thanks for helping me wrap this up!

florisvdf requested a review from tintinrevient October 15, 2025 09:04

tintinrevient reviewed Oct 16, 2025

Comment thread .github/workflows/cml.yaml Outdated

tintinrevient reviewed Oct 16, 2025

tintinrevient approved these changes Oct 16, 2025

Floris vanderFlier and others added 13 commits October 16, 2025 16:56

feat: added huggingface-regressor #122

9d3e229

ci: added RITA regressor to ci benchmark

a117b35

ci: changed model selection syntax

4fa60b2

fix: conform to prediction schema

5d0bf8e

fix: update .github/workflows/cml.yaml

97af170

Co-authored-by: Shushi <zhaobenben007@googlemail.com>

fix: syntax

c6e24b4

fix: truncated line in cml.yaml repaired

bf3a92c

refactor: RITA regressor -> RITARegressor, huggingface-regressor -> h…

f5ac5cc

…fregressor

fix: refactor RITA regressor -> RITARegressor in cml.yaml

859a87f

refactor: RITARegressor -> ritaregressor

381de7a

fix: added wheels

6c7ee1f

fix: RITA_s instead of RITA_xl for ci

8280083

revert: removed test and directory checking

be3dd66

Floris vanderFlier added 2 commits October 16, 2025 16:58

docs: updates to model card

b1b29b8

rebase onto main

fb24b4a

florisvdf force-pushed the feat/onboard-hf-regressor branch from 89d4375 to fb24b4a Compare October 16, 2025 15:00

Merge with the latest main

392ba0e

tintinrevient added 2 commits October 23, 2025 10:41

Rename to hyper_parameters

29b496e

Rename to hyper_parameters

444a20c

tintinrevient merged commit 43547f0 into main Oct 23, 2025
1 check passed

tintinrevient deleted the feat/onboard-hf-regressor branch October 23, 2025 09:03
