Add WER metric by nikolasavic3 · Pull Request #30 · google/metrax

nikolasavic3 · 2025-03-07T21:29:11Z

Add Word Error Rate (WER) metric
Fixes #28

src/metrax/metrics.py

src/metrax/metrics_test.py

jshin1394 · 2025-03-10T18:58:22Z

src/metrax/metrics.py

+    total_edit_distance = 0
+    total_reference_length = 0
+
+    for pred, ref in zip(predictions, references):


should we add a TODO to use a more involved tokenizer in the future?

That's a great point! To keep things in line with the single responsibility principle, I think we should remove the tokenization logic entirely and just accept pre-tokenized lists of strings as input, leaving the tokenizer implementation to the one who calls the metric.

The metric now accepts only lists of strings. Do you think we should accept strings and implement a tokenizer?

Hi Nikola! I agree that the single responsibility principle is important. However, metrics of this kind more commonly take whole sentences as input (for instance, the torchMetrics WER), so let's keep the implementation able to accept sentences.
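For reference, a minimal sketch of what sentence-level input could look like, assuming plain whitespace tokenization via `str.split()`. The names `edit_distance` and `word_error_rate` are hypothetical and are not taken from the PR; only `total_edit_distance`, `total_reference_length`, and `distance_matrix` mirror identifiers visible in the diff.

```python
from typing import Sequence


def edit_distance(prediction: Sequence[str], reference: Sequence[str]) -> int:
  """Word-level Levenshtein distance computed with a full DP matrix."""
  m, n = len(prediction), len(reference)
  # distance_matrix[i][j] holds the number of edits needed to turn
  # prediction[:i] into reference[:j].
  distance_matrix = [[0] * (n + 1) for _ in range(m + 1)]
  for i in range(m + 1):
    distance_matrix[i][0] = i  # delete every remaining predicted word
  for j in range(n + 1):
    distance_matrix[0][j] = j  # insert every remaining reference word
  for i in range(1, m + 1):
    for j in range(1, n + 1):
      substitution_cost = 0 if prediction[i - 1] == reference[j - 1] else 1
      distance_matrix[i][j] = min(
          distance_matrix[i - 1][j] + 1,  # deletion
          distance_matrix[i][j - 1] + 1,  # insertion
          distance_matrix[i - 1][j - 1] + substitution_cost,  # substitution
      )
  return distance_matrix[m][n]


def word_error_rate(
    predictions: Sequence[str], references: Sequence[str]
) -> float:
  """Corpus-level WER over sentence pairs (assumed whitespace tokenizer)."""
  total_edit_distance = 0
  total_reference_length = 0
  for pred, ref in zip(predictions, references):
    pred_tokens = pred.split()  # assumption: whitespace tokenization
    ref_tokens = ref.split()
    total_edit_distance += edit_distance(pred_tokens, ref_tokens)
    total_reference_length += len(ref_tokens)
  if total_reference_length == 0:
    raise ValueError('References contain no words.')
  return total_edit_distance / total_reference_length
```

A more involved tokenizer could replace the two `split()` calls without touching the distance computation, which keeps the concern raised above fairly contained.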

jshin1394 · 2025-03-10T19:27:41Z

Thank you Nikola for your PR! Left some comments. Feel free to sync the workspace as well! :)

src/metrax/metrics.py

nikolasavic3 · 2025-03-10T22:32:38Z

Thank you Jiwon for reviewing my PR!

jshin1394 · 2025-03-10T22:57:15Z

Thank you Nikola! LGTM given all the comments are resolved.

jshin1394 · 2025-03-11T22:06:03Z

Hi Nikola, a bunch of PRs were submitted to split the metrics.py file into multiple modules. I think the WER class can be placed under the new nlp_metrics.py module.

jshin1394 · 2025-03-13T21:45:44Z

src/metrax/nlp_metrics.py

+  def total_reference_length(self):
+    return self.count
+
+  @classmethod


no need to override this function as it already exists in Average

jshin1394 · 2025-03-13T21:46:15Z

src/metrax/nlp_metrics.py

+
+    return distance_matrix[m][n]
+
+  def merge(self, other: 'WER') -> 'WER':


no need to override this function anymore as it exists in Average

jshin1394 · 2025-03-13T21:52:36Z

src/metrax/nlp_metrics.py

+        count=self.count + other.count,
+    )
+
+  def compute(self) -> jax.Array:


no need to override this function as it exists in Average

jshin1394 · 2025-03-13T21:52:49Z

src/metrax/nlp_metrics.py

+      total_edit_distance: Sum of edit distances across all samples.
+      total_reference_length: Sum of reference lengths across all samples.
+  """
+  @property


these can be removed as well since

jshin1394 · 2025-03-13T21:54:43Z

Thank you so much Nikola! Almost there :) I left some comments so that 1) we can remove functions that already exist as part of Average and 2) the metric can take sentences as input, following other popular metric libraries such as torchMetrics.

Thank you again Nikola!

nikolasavic3 · 2025-03-14T19:03:01Z

Thank you for the detailed feedback, Jiwon! I've removed all the redundant function overrides (merge, compute, total_reference_length) and updated the implementation to accept untokenized strings.

jshin1394 · 2025-03-14T21:26:34Z

Thank you so much Nikola! We really appreciate your contribution to the Metrax codebase! :)

nikolasavic3 · 2025-03-15T09:14:17Z

Thank you so much, Jiwon! I really appreciate your detailed feedback and guidance throughout this process. Collaborating with you on adding the WER metric to the Metrax codebase was great. Looking forward to contributing more in the future!

nikolasavic3 force-pushed the feat-wer branch from 5cbea54 to 625bf66 · March 7, 2025 21:34

nikolasavic3 changed the title ~~Feat wer~~ Add WER metric Mar 7, 2025

nikolasavic3 force-pushed the feat-wer branch from 625bf66 to 8f42b13 · March 9, 2025 12:56