Description
Hi COMET team,
I'm trying to reproduce the example results for the Unbabel/wmt22-comet-da model, but the output scores I obtained are quite different from those shown in the official documentation and example scripts.
Environment
Python version: 3.10.8
COMET version: 2.2.7
PyTorch version: 2.8.0
Operating system: Windows 11
GPU: none (CPU only)
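For reference, a small snippet like the following can be used to collect these versions (the distribution names "evaluate", "unbabel-comet", and "torch" are my assumption for how the packages are registered; adjust if yours differ):

```python
# Print the installed versions of the packages relevant to this report.
# Distribution names are assumptions; adjust if your environment differs.
from importlib.metadata import version, PackageNotFoundError
import platform

print("Python:", platform.python_version())
for dist in ("evaluate", "unbabel-comet", "torch"):
    try:
        print(f"{dist}: {version(dist)}")
    except PackageNotFoundError:
        print(f"{dist}: not installed")
```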
Code used
```python
from evaluate import load

# Load the COMET metric via the Hugging Face `evaluate` wrapper.
comet_metric = load('comet')

source = ["Dem Feuer konnte Einhalt geboten werden", "Schulen und Kindergärten wurden eröffnet."]
hypothesis = ["The fire could be stopped", "Schools and kindergartens were open"]
reference = ["They were able to control the fire", "Schools and kindergartens opened"]

# Segment-level COMET scores, one per source/hypothesis/reference triple.
results = comet_metric.compute(predictions=hypothesis, references=reference, sources=source)
print([round(v, 2) for v in results["scores"]])
```
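In case the wrapper's default checkpoint differs from what I expect, a variant I can also test is passing the checkpoint name explicitly. My assumption, based on the examples in the evaluate COMET metric card, is that the second argument to `load` selects the checkpoint:

```python
# Variant: explicitly request the wmt22-comet-da checkpoint instead of relying
# on the wrapper's default. (Assumption: the second argument to `load` is the
# checkpoint name, as the evaluate metric card examples suggest.)
from evaluate import load

comet_metric = load('comet', 'Unbabel/wmt22-comet-da')

source = ["Dem Feuer konnte Einhalt geboten werden", "Schulen und Kindergärten wurden eröffnet."]
hypothesis = ["The fire could be stopped", "Schools and kindergartens were open"]
reference = ["They were able to control the fire", "Schools and kindergartens opened"]

results = comet_metric.compute(predictions=hypothesis, references=reference, sources=source)
print([round(v, 2) for v in results["scores"]])
```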
Expected behavior
According to the example in metrics/comet/README.md, the expected output is [0.19, 0.92], but my result is [0.85, 0.97].
I used the same code as in the example and, as far as I can tell, loaded the correct model.
Expected results: `[0.19, 0.92]`
My results: `[0.85, 0.97]`
Questions
1. Has the Unbabel/wmt22-comet-da model been updated or rescaled recently?
2. Have there been any changes in score normalization (e.g., from z-scores to 0–1 scaling)?
3. Is a specific COMET or PyTorch version required to match the example results?

(A direct cross-check with the unbabel-comet package is sketched below.)
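This cross-check scores the same triples directly with the unbabel-comet package, bypassing the evaluate wrapper. It follows the download_model / load_from_checkpoint / predict usage from the COMET documentation, with gpus=0 since my machine has no GPU and an arbitrary batch size:

```python
# Cross-check: score the same triples directly with the unbabel-comet package.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

data = [
    {"src": "Dem Feuer konnte Einhalt geboten werden",
     "mt": "The fire could be stopped",
     "ref": "They were able to control the fire"},
    {"src": "Schulen und Kindergärten wurden eröffnet.",
     "mt": "Schools and kindergartens were open",
     "ref": "Schools and kindergartens opened"},
]

# gpus=0 -> run on CPU; the batch size is arbitrary for two segments.
output = model.predict(data, batch_size=8, gpus=0)
print([round(s, 2) for s in output.scores])
```

If these segment-level scores also come out around 0.85 and 0.97, the discrepancy is presumably in the documented example values rather than in my setup.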
Additional context
I am using COMET for machine translation evaluation and need to ensure consistency with the official WMT22 scores. Any guidance or clarification would be greatly appreciated.
Thanks a lot for your time and support!