Fix benchmark test: use "class" groupby for source-level eval

PauBadiaM · claude · PauBadiaM · commit 54b4453a799e · 2026-04-12T23:05:37.000-07:00
The index-alignment fix in _tensor_truth exposed that metrics5 was
accidentally passing due to misaligned ground truth. With groupby="group",
every source within each group has homogeneous perturbation (all 1s or
all 0s), producing no metrics. Using "class" (which mixes groups A and B)
ensures heterogeneous ground truth for meaningful source-level evaluation.
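
A minimal sketch of the reasoning above, not the benchmark's actual code: metrics
such as AUC need both positive and negative labels within each evaluated subset,
so a groupby key that yields all-1 or all-0 subsets leaves nothing to score. The
helper `heterogeneous_groups`, the `truth` field, and the toy rows are
hypothetical and only illustrate the "group" vs "class" difference.

```python
from collections import defaultdict


def heterogeneous_groups(rows, key):
    """Map each value of `key` to True if its labels contain both 0 and 1."""
    labels = defaultdict(set)
    for row in rows:
        labels[row[key]].add(row["truth"])
    return {value: seen == {0, 1} for value, seen in labels.items()}


# Hypothetical toy data: group A is fully perturbed, group B is not,
# and both groups belong to the same class X.
rows = [
    {"group": "A", "class": "X", "truth": 1},
    {"group": "A", "class": "X", "truth": 1},
    {"group": "B", "class": "X", "truth": 0},
    {"group": "B", "class": "X", "truth": 0},
]

by_group = heterogeneous_groups(rows, "group")  # every subset is homogeneous
by_class = heterogeneous_groups(rows, "class")  # the subset mixes 0s and 1s
```

Under these assumptions, splitting by "group" gives only single-label subsets,
while splitting by "class" gives a subset with both labels that can be scored.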

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/tests/bm/test_benchmark.py b/tests/bm/test_benchmark.py
@@ -13,7 +13,7 @@
         [["auc"], None, "expr", False, 0.05, 5, False],
         [["auc", "fscore"], "group", "expr", False, 0.05, 5, False],
         [["auc", "fscore", "qrank"], None, "source", False, 0.05, 2, False],
-        [["auc", "fscore", "qrank"], "group", "source", False, 0.05, 1, False],
+        [["auc", "fscore", "qrank"], "class", "source", False, 0.05, 1, False],
         [["auc", "fscore", "qrank"], "bm_group", "expr", True, 0.05, 5, False],
         [["auc", "fscore", "qrank"], "source", "expr", True, 0.05, 5, False],
     ],