feat: added huggingface-regressor #122 #132

Merged
tintinrevient merged 18 commits into main from feat/onboard-hf-regressor
Oct 23, 2025

Conversation

@florisvdf
Contributor

@florisvdf florisvdf commented Oct 14, 2025

Changes

Resolves #122

Please include a summary of the changes and the related issue. Please also
include relevant motivation and context. List any dependencies that are required
for this change.

Summary

Added a RITA regressor, implemented in proteingym.models.hfregressor. The module could easily be extended to other Hugging Face-hosted PLMs by adding a dedicated Embedder class under models/huggingface-regressor/src/proteingym/models/hfregressor/embedders and updating the model card to specify the name of the PLM. Extra features and precomputed embeddings are not yet supported.
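
As a rough illustration of the extension point described above, here is a minimal sketch of what a pluggable embedder could look like. Everything in it is an assumption made for illustration: the `embed` method, the `EMBEDDERS` registry, `get_embedder`, and the toy `CompositionEmbedder` are hypothetical names, not the actual interface in proteingym.models.hfregressor. The toy class stands in for a real PLM so the example runs without downloading a model.

```python
from abc import ABC, abstractmethod


class Embedder(ABC):
    """Hypothetical interface: turn protein sequences into fixed-size vectors."""

    @abstractmethod
    def embed(self, sequences: list[str]) -> list[list[float]]:
        ...


class CompositionEmbedder(Embedder):
    """Toy stand-in for a PLM: one amino-acid frequency vector per sequence."""

    ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

    def embed(self, sequences: list[str]) -> list[list[float]]:
        return [
            [seq.count(aa) / max(len(seq), 1) for aa in self.ALPHABET]
            for seq in sequences
        ]


# Hypothetical registry: the PLM name from the model card selects the embedder.
EMBEDDERS: dict[str, type[Embedder]] = {"composition": CompositionEmbedder}


def get_embedder(plm_name: str) -> Embedder:
    """Instantiate the embedder registered under the given PLM name."""
    try:
        return EMBEDDERS[plm_name]()
    except KeyError:
        raise ValueError(f"No embedder registered for PLM {plm_name!r}") from None


vectors = get_embedder("composition").embed(["ACCA", "MKV"])
print(len(vectors), len(vectors[0]))
```

Under this design, supporting another PLM would mean adding one subclass and one registry entry, while the regressor on top only ever sees the vectors.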

Checklist

  • I broke the PR down so that it contains a reasonable amount of changes for an effective review
  • I performed a self-review of my code. Amongst other things, I have commented my code in hard-to-understand areas.
  • I made corresponding changes to the documentation
  • I added tests that prove my fix is effective or that my feature works
  • I accounted for dependent changes to be merged and published in downstream modules

@tintinrevient
Contributor

✅ Supervised models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.0030303030303030303
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.0030395136778115636
Bennett S,-0.00303951367781155
Kappa Standard Error,0.0
Kappa Unbiased,-0.00303951367781155
Scott PI,-0.00303951367781155
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,7.366322214245807
Reference Entropy,7.366322214245807
Cross Entropy,0
Joint Entropy,7.366322214245807
Conditional Entropy,-0.0
Mutual Information,7.366322214245807
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,108241
Overall J,"(0.0, 0.0)"
Hamming Loss,1.0
Zero-one Loss,165
NIR,0.006060606060606061
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.9969604863221885
TNR Macro,0.996969696969697
Bangdiwala B,None
Krippendorff Alpha,0.0
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00303030303030305
FNR Macro,None
PPV Macro,None
NPV Macro,0.996969696969697
ACC Macro,0.9939393939393939
F1 Macro,0.0
FPR Micro,0.003039513677811523
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9969604863221885
Spearman,0.011855849117089201

✅ Zero-shot models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.00010010001992985675
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.00010010006100605498
Bennett S,-0.00010010010010010009
Kappa Standard Error,0.0
Kappa Unbiased,-0.00010011004094695072
Scott PI,-0.00010011004094695072
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,12.286157441352454
Reference Entropy,12.286549508613042
Cross Entropy,0
Joint Entropy,12.286549508613042
Conditional Entropy,-0.0
Mutual Information,12.286157441352454
KL Divergence,None
Lambda B,1.0
Lambda A,0.9997997997997998
Chi-Squared DF,99800100
Overall J,"(0.0, 0.0)"
Hamming Loss,0.9999999999999999
Zero-one Loss,4996
NIR,0.00020016012810248197
P-Value,1
Overall CEN,1.401066496965464e-05
Overall MCEN,1.401066496965464e-05
Overall MCC,0.0
RR,0.5000500450405365
CBA,0.0
AUNU,None
AUNP,None
RCI,0.9999680897179217
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,0.0
TNR Micro,0.9998998999380064
TNR Macro,0.9998999099189271
Bangdiwala B,None
Krippendorff Alpha,-1.9957876399588722e-08
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Very Strong
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00010009008107292328
FNR Macro,None
PPV Macro,None
NPV Macro,0.9998999099951021
ACC Macro,0.9997998199140291
F1 Macro,0.0
FPR Micro,0.0001001000619935688
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9998998999380064
Spearman,

@florisvdf
Contributor Author

dvc step of the CI is failing, I suspect due to how the prediction dataframe is structured, which causes

cmd: uv run proteingym-benchmark metric calc --output-path ${output.prediction}/${item.dataset.name}_${item.model.name}.csv --metric-path ${output.metric}/${item.dataset.name}_${item.model.name}.csv

to fail.

This is unfortunately not apparent from the logs. @tintinrevient, do you know how I could check where the error happens?

@tintinrevient
Contributor

dvc step of the CI is failing, I suspect due to how the prediction dataframe is structured, which causes

cmd: uv run proteingym-benchmark metric calc --output-path ${output.prediction}/${item.dataset.name}_${item.model.name}.csv --metric-path ${output.metric}/${item.dataset.name}_${item.model.name}.csv

to fail.
This is unfortunately not apparent from the logs. @tintinrevient, do you know how I could check where the error happens?

@florisvdf you can check here: https://github.com/ProteinGym/proteingym-benchmark/actions/runs/18523420137/job/52788481310

A reminder: this repo is being refactored, so the way dvc.yaml is structured will change this week or next.

@tintinrevient
Contributor

I'll review it tomorrow (I'm wrapping up other PRs first).

Comment thread .github/workflows/cml.yaml Outdated
Comment on lines +96 to +105
if Path(SageMakerTrainingJobPath.OUTPUT_PATH).is_dir():
    df.write_csv(
        f"{SageMakerTrainingJobPath.OUTPUT_PATH}/{dataset.name}_{model_card.name}.csv"
    )

    console.print(
        f"Saved the metrics in CSV in {SageMakerTrainingJobPath.OUTPUT_PATH}/{dataset.name}_{model_card.name}.csv"
    )
else:
    console.print(f"Predictions:\n {df}")
Contributor

@tintinrevient tintinrevient Oct 16, 2025

This path checking can be simplified to:

df.to_csv(
    f"{SageMakerTrainingJobPath.OUTPUT_PATH}/{dataset.name}_{model_card.name}.csv",
    index=False,
)

console.print(
    f"Saved the metrics in CSV in {SageMakerTrainingJobPath.OUTPUT_PATH}/{dataset.name}_{model_card.name}.csv"
)

The path bindings are defined and used in DVC here: https://github.com/ProteinGym/proteingym-benchmark/blob/65bedcdb5f3286f2a17ef4abc6cc3cc78c528175/benchmark/supervised/local/dvc.yaml#L22C204-L22C250.
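
One way to read the suggestion above is that the entrypoint can write unconditionally, because the pipeline guarantees the output directory. The sketch below illustrates that idea with the standard library only, building the output path once so the write and the log message cannot drift apart. The `save_predictions` helper, the column names, and the dataset and model names are placeholders for illustration, not the project's actual code, and a temporary directory stands in for `SageMakerTrainingJobPath.OUTPUT_PATH`.

```python
import csv
import tempfile
from pathlib import Path


def save_predictions(rows, output_dir, dataset_name, model_name):
    """Write rows to <output_dir>/<dataset>_<model>.csv and return the path."""
    output_file = Path(output_dir) / f"{dataset_name}_{model_name}.csv"
    with output_file.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return output_file


# Placeholder data and names, for illustration only.
rows = [
    {"sequence": "ACCA", "prediction": 0.12},
    {"sequence": "MKV", "prediction": 0.34},
]

with tempfile.TemporaryDirectory() as tmp:
    saved = save_predictions(rows, tmp, "demo_dataset", "demo_model")
    print(f"Saved predictions to {saved}")
    contents = saved.read_text()
```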

@tintinrevient
Contributor

✅ Supervised models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.0030303030303030303
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.0030395136778115636
Bennett S,-0.00303951367781155
Kappa Standard Error,0.0
Kappa Unbiased,-0.00303951367781155
Scott PI,-0.00303951367781155
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,7.366322214245807
Reference Entropy,7.366322214245807
Cross Entropy,0
Joint Entropy,7.366322214245807
Conditional Entropy,-0.0
Mutual Information,7.366322214245807
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,108241
Overall J,"(0.0, 0.0)"
Hamming Loss,1.0
Zero-one Loss,165
NIR,0.006060606060606061
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.9969604863221885
TNR Macro,0.996969696969697
Bangdiwala B,None
Krippendorff Alpha,0.0
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00303030303030305
FNR Macro,None
PPV Macro,None
NPV Macro,0.996969696969697
ACC Macro,0.9939393939393939
F1 Macro,0.0
FPR Micro,0.003039513677811523
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9969604863221885
Spearman,0.011855849117089201

✅ Zero-shot models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.00010007998789141575
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.00010009004298543876
Bennett S,-0.00010009008107296567
Kappa Standard Error,0.0
Kappa Unbiased,-0.00010009000489789399
Scott PI,-0.00010009000489789399
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,12.286557761608659
Reference Entropy,12.286549508613042
Cross Entropy,0
Joint Entropy,12.286549508613042
Conditional Entropy,-0.0
Mutual Information,12.286557761608659
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,99820081
Overall J,"(0.0, 0.0)"
Hamming Loss,0.9999999999999999
Zero-one Loss,4996
NIR,0.00020016012810248197
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0000006717097922
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.999899909957026
TNR Macro,0.9998999199359487
Bangdiwala B,None
Krippendorff Alpha,7.616744806910965e-11
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00010008006405126668
FNR Macro,None
PPV Macro,None
NPV Macro,0.9998999200121163
ACC Macro,0.999799839948065
F1 Macro,0.0
FPR Micro,0.00010009004297395485
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.999899909957026
Spearman,

Contributor

@tintinrevient tintinrevient left a comment

@florisvdf GREAT WORK!!! It is a really neat PR. I saw it passed the CML (continuous machine learning) CI (currently, maybe the image name needs to be lowercase: https://github.com/ProteinGym/proteingym-benchmark/actions/runs/18556840699/job/52896246260)

I just added another comment about the paths in the __main__.py entrypoint, and I've approved the PR. Once the pipeline passes, you can merge it.

@tintinrevient
Contributor

@florisvdf how was your experience using DVC, the template model Dockerfile, and everything else? You can leave comments below.

@tintinrevient
Contributor

I see that the Docker step fails to run. You can debug it locally and see the error message using dvc repro benchmark/supervised/local ...

@tintinrevient
Contributor

✅ Supervised models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.0030303030303030303
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.0030395136778115636
Bennett S,-0.00303951367781155
Kappa Standard Error,0.0
Kappa Unbiased,-0.00303951367781155
Scott PI,-0.00303951367781155
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,7.366322214245807
Reference Entropy,7.366322214245807
Cross Entropy,0
Joint Entropy,7.366322214245807
Conditional Entropy,-0.0
Mutual Information,7.366322214245807
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,108241
Overall J,"(0.0, 0.0)"
Hamming Loss,1.0
Zero-one Loss,165
NIR,0.006060606060606061
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.9969604863221885
TNR Macro,0.996969696969697
Bangdiwala B,None
Krippendorff Alpha,0.0
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00303030303030305
FNR Macro,None
PPV Macro,None
NPV Macro,0.996969696969697
ACC Macro,0.9939393939393939
F1 Macro,0.0
FPR Micro,0.003039513677811523
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9969604863221885
Spearman,0.011855849117089201

✅ Zero-shot models have all passed validation.

Metric,Value
Overall ACC,0.0
Overall RACCU,0.00010007998789141575
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.00010009004298543876
Bennett S,-0.00010009008107296567
Kappa Standard Error,0.0
Kappa Unbiased,-0.00010009000489789399
Scott PI,-0.00010009000489789399
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,12.286557761608659
Reference Entropy,12.286549508613042
Cross Entropy,0
Joint Entropy,12.286549508613042
Conditional Entropy,-0.0
Mutual Information,12.286557761608659
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,99820081
Overall J,"(0.0, 0.0)"
Hamming Loss,0.9999999999999999
Zero-one Loss,4996
NIR,0.00020016012810248197
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0000006717097922
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.999899909957026
TNR Macro,0.9998999199359487
Bangdiwala B,None
Krippendorff Alpha,7.616744806910965e-11
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00010008006405126668
FNR Macro,None
PPV Macro,None
NPV Macro,0.9998999200121163
ACC Macro,0.999799839948065
F1 Macro,0.0
FPR Micro,0.00010009004297395485
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.999899909957026
Spearman,

@florisvdf
Contributor Author

I just managed to pass the CI. I made an update in the model card and pushed it, but that didn't trigger the workflow so I had to run it manually (not sure why that happened?).

I changed RITA_xl in the model card to RITA_s, which is a much smaller model. I suspect that the CI failed because a 1.2B-parameter model was too large for the CI runner. Changing it to RITA_s worked, but it still took 30 minutes, which I found a bit strange. I think we can merge now, @tintinrevient.

@florisvdf
Contributor Author

@florisvdf how was your experience using DVC, the template model Dockerfile, and everything else? You can leave comments below.

Overall very good. Your tip to run dvc repro benchmark/supervised/local was very helpful. Perhaps steps to follow how to pass the CI could be listed a bit more explicitly in the docs, and also for instance how users should update the workflow to include their model in the CI. Thanks for your help @tintinrevient !

@florisvdf
Contributor Author

Not sure why workflow isn't triggered anymore when I push changes?

@tintinrevient
Contributor

Not sure why workflow isn't triggered anymore when I push changes?

It has a merge conflict with the main branch, which is probably why it isn't triggered; I updated the main branch this morning. Once the conflicts are resolved, we can merge.

@tintinrevient
Contributor

Perhaps steps to follow how to pass the CI could be listed a bit more explicitly in the docs, and also for instance how users should update the workflow to include their model in the CI.

I've added this to the backlog (@JCZuurmond): #135

@florisvdf florisvdf force-pushed the feat/onboard-hf-regressor branch from 89d4375 to fb24b4a Compare October 16, 2025 15:00
@florisvdf
Contributor Author

It seems that uv is no longer installed after rebasing to main branch:
/home/runner/work/_temp/2a033953-c95b-45ad-8c56-6e9429818621.sh: line 3: uv: command not found. Do you know why @tintinrevient ?

@tintinrevient
Contributor

It seems that uv is no longer installed after rebasing to main branch: /home/runner/work/_temp/2a033953-c95b-45ad-8c56-6e9429818621.sh: line 3: uv: command not found. Do you know why @tintinrevient ?

uv is no longer used in dvc.yaml on the main branch; just use dvc repro instead of uv run dvc repro. The Python environment is created in the CI pipeline.

@tintinrevient
Contributor

You can reference this cml.yaml:

dvc repro benchmark/supervised/local/dvc.yaml

@tintinrevient
Contributor

@florisvdf, I've updated the cml.yaml file to be in sync with the main branch. To choose which models run in which game, we can now use tags, which is more flexible, for example: proteingym-base list-models models | jq '[.[] | select(.tags | contains(["zero-shot"]))]'
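
For readers less familiar with jq, the filter in the comment above keeps the entries whose tags include "zero-shot". A rough Python equivalent might look like the sketch below. The JSON payload is invented for illustration: only the `tags` field appears in the thread, and the `name` field and the model names are assumptions about what `proteingym-base list-models` prints.

```python
import json

# Hypothetical output of `proteingym-base list-models models`;
# every field except "tags" is an assumption.
raw = """
[
  {"name": "model_a", "tags": ["supervised"]},
  {"name": "model_b", "tags": ["zero-shot"]},
  {"name": "model_c", "tags": ["zero-shot", "experimental"]}
]
"""


def select_by_tag(models, tag):
    """Keep the models whose tag list contains the given tag exactly."""
    return [model for model in models if tag in model.get("tags", [])]


zero_shot = select_by_tag(json.loads(raw), "zero-shot")
print([model["name"] for model in zero_shot])
```

One difference worth knowing: jq's `contains` matches strings by substring, so a tag such as "zero-shot-v2" would also pass the jq filter, whereas this sketch requires an exact match.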

@tintinrevient
Contributor

✅ Supervised models have all passed validation.

metric_name,metric_value
Overall ACC,0.0
Overall RACCU,0.0030303030303030303
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.0030395136778115636
Bennett S,-0.00303951367781155
Kappa Standard Error,0.0
Kappa Unbiased,-0.00303951367781155
Scott PI,-0.00303951367781155
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,7.366322214245807
Reference Entropy,7.366322214245807
Cross Entropy,0
Joint Entropy,7.366322214245807
Conditional Entropy,-0.0
Mutual Information,7.366322214245807
KL Divergence,None
Lambda B,1.0
Lambda A,1.0
Chi-Squared DF,108241
Overall J,"(0.0, 0.0)"
Hamming Loss,1.0
Zero-one Loss,165
NIR,0.006060606060606061
P-Value,1
Overall CEN,0.0
Overall MCEN,0.0
Overall MCC,0.0
RR,0.5
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,None
TNR Micro,0.9969604863221885
TNR Macro,0.996969696969697
Bangdiwala B,None
Krippendorff Alpha,0.0
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Perfect
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00303030303030305
FNR Macro,None
PPV Macro,None
NPV Macro,0.996969696969697
ACC Macro,0.9939393939393939
F1 Macro,0.0
FPR Micro,0.003039513677811523
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9969604863221885

✅ Zero-shot models have all passed validation.

metric_name,metric_value
Overall ACC,0.0
Overall RACCU,0.00010350554262465215
Overall RACC,0.0
Kappa,0.0
Gwet AC1,-0.00010027040554341335
Bennett S,-0.0001002707309736288
Kappa Standard Error,0.0
Kappa Unbiased,-0.00010351625713101698
Scott PI,-0.00010351625713101698
Kappa No Prevalence,-1.0
Kappa 95% CI,"(0.0, 0.0)"
Standard Error,0.0
95% CI,"(0.0, 0.0)"
Chi-Squared,None
Phi-Squared,None
Cramer V,None
Response Entropy,12.286557761608659
Reference Entropy,12.270402713018697
Cross Entropy,0
Joint Entropy,12.286557761608659
Conditional Entropy,0.0161550485899576
Mutual Information,12.2704027130187
KL Divergence,None
Lambda B,0.9963963963963964
Lambda A,1.0
Chi-Squared DF,99460729
Overall J,"(0.0, 0.0)"
Hamming Loss,0.9999999999999999
Zero-one Loss,4996
NIR,0.0038030424339471577
P-Value,1
Overall CEN,0.0005655020094343343
Overall MCEN,0.0005655020094343343
Overall MCC,0.0
RR,0.5009023460998596
CBA,0.0
AUNU,None
AUNP,None
RCI,1.0000000000000002
Pearson C,None
TPR Micro,0.0
TPR Macro,None
CSI,None
ARI,0.0
TNR Micro,0.9998997292690264
TNR Macro,0.9998997393222379
Bangdiwala B,None
Krippendorff Alpha,-3.425833166131969e-06
SOA1(Landis & Koch),Slight
SOA2(Fleiss),Poor
SOA3(Altman),Poor
SOA4(Cicchetti),Poor
SOA5(Cramer),None
SOA6(Matthews),Negligible
SOA7(Lambda A),Perfect
SOA8(Lambda B),Very Strong
SOA9(Krippendorff Alpha),Low
SOA10(Pearson C),None
FPR Macro,0.00010026067776214287
FNR Macro,None
PPV Macro,None
NPV Macro,0.9998997393222379
ACC Macro,0.9997994786444756
F1 Macro,0.0
FPR Micro,0.00010027073097362837
FNR Micro,1.0
PPV Micro,0.0
F1 Micro,0.0
NPV Micro,0.9998997292690264

@tintinrevient
Contributor

It takes roughly 15 minutes to run 3 models against 3 datasets, pairwise.
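
To put that figure in perspective, running every model against every dataset gives 3 x 3 = 9 runs. The back-of-the-envelope sketch below works out the average time per run, assuming the runs execute one after another and take about equally long (neither is stated in the thread). The model and dataset names are placeholders.

```python
from itertools import product

# Placeholder names; the thread does not list the actual models or datasets.
models = ["model_1", "model_2", "model_3"]
datasets = ["dataset_1", "dataset_2", "dataset_3"]

# Every dataset is paired with every model.
pairs = list(product(datasets, models))

total_minutes = 15
seconds_per_pair = total_minutes * 60 / len(pairs)
print(f"{len(pairs)} runs, about {seconds_per_pair:.0f} s each")
```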

@tintinrevient tintinrevient merged commit 43547f0 into main Oct 23, 2025
1 check passed
@tintinrevient tintinrevient deleted the feat/onboard-hf-regressor branch October 23, 2025 09:03
@florisvdf
Contributor Author

@florisvdf, I've updated the cml.yaml file to be in sync with the main branch. To choose which models run in which game, we can now use tags, which is more flexible, for example: proteingym-base list-models models | jq '[.[] | select(.tags | contains(["zero-shot"]))]'

Thanks for helping me wrap this up!

Development

Successfully merging this pull request may close these issues.

Add RITA Regressor

2 participants