pymarian-eval huge inconsistency of COMET scores across CPU and GPU #1034

@TudorMN

Bug description

The COMET scores produced by pymarian-eval differ significantly when run on CPU versus GPU. An inconsistency of this size suggests a bug in the implementation or calculation of the metric.

How to reproduce

  • Files used

es.mt.txt
en.src.txt
es.ref.txt

  • Commands
  1. GPU command used:

pymarian-eval -m wmt22-comet-da -l comet -t es.mt.txt -s en.src.txt -r es.ref.txt -o {OUTPUT}.cpl -d 1

  2. CPU command used:

pymarian-eval -m wmt22-comet-da -l comet -t es.mt.txt -s en.src.txt -r es.ref.txt -o {OUTPUT}.cpl -c 8

  • Results (rounded) for the same sample of three sentences
  1. GPU results: 0.95, 0.85, 0.71
  2. CPU results: 0.37, 0.85, 0.31
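To make the divergence concrete, here is a minimal sketch (not part of pymarian-eval; `compare_scores` is a hypothetical helper) that compares the two per-sentence score lists and flags pairs whose absolute difference exceeds a tolerance, using the rounded numbers reported above:

```python
def compare_scores(gpu_lines, cpu_lines, tol=0.01):
    """Return (index, gpu, cpu, abs_diff) for each pair diverging beyond tol."""
    diffs = []
    for i, (g, c) in enumerate(zip(gpu_lines, cpu_lines)):
        g, c = float(g), float(c)
        if abs(g - c) > tol:
            diffs.append((i, g, c, abs(g - c)))
    return diffs

# Rounded scores from the report above; each output file has one score per line.
gpu = ["0.95", "0.85", "0.71"]
cpu = ["0.37", "0.85", "0.31"]

for i, g, c, d in compare_scores(gpu, cpu):
    print(f"sentence {i}: gpu={g:.2f} cpu={c:.2f} diff={d:.2f}")
```

With the sample above, sentences 0 and 2 diverge by roughly 0.58 and 0.40, far beyond any plausible floating-point rounding difference between backends.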

Context

  • pymarian-eval version (installed via pip, reported by pymarian-eval --version): 1.12.31
