Commit ea4d757

Revert batch size gradient scaling fix

Remove the losses.sum() change, since it was split out into standalone PR #120. This PR should focus only on dtype utilities.

Co-authored-by: Lucia Quirke <luciaquirke@users.noreply.github.com>

1 parent 25b3133 · commit ea4d757

File tree

1 file changed: 1 addition, 1 deletion


bergson/collector/collector.py

@@ -565,7 +565,7 @@ def fwd_bwd(model, batch):
     if "advantage" in batch:
         losses *= torch.tensor(batch["advantage"], device=losses.device)
 
-    losses.sum().backward()
+    losses.mean().backward()
     model.zero_grad()
 
     return losses
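The substantive difference between the two reductions is gradient scale: backpropagating through a sum gives every per-example loss a gradient contribution of 1, while backpropagating through a mean divides each contribution by the batch size, so gradient magnitudes no longer grow with larger batches. A minimal dependency-free sketch of the analytic gradients (the helper `grad_of_reduction` is hypothetical, written only to illustrate the scaling, and is not part of the repository's code):

```python
def grad_of_reduction(losses, reduce):
    """Analytic gradient of reduce(losses) w.r.t. each element of losses."""
    n = len(losses)
    if reduce == "sum":
        # d(sum_i l_i)/d(l_j) = 1 for every j, regardless of batch size
        return [1.0] * n
    if reduce == "mean":
        # d((1/n) * sum_i l_i)/d(l_j) = 1/n, shrinking with batch size
        return [1.0 / n] * n
    raise ValueError(f"unknown reduction: {reduce}")

batch_losses = [0.3, 0.7, 1.1, 0.9]
g_sum = grad_of_reduction(batch_losses, "sum")    # [1.0, 1.0, 1.0, 1.0]
g_mean = grad_of_reduction(batch_losses, "mean")  # [0.25, 0.25, 0.25, 0.25]
```

Under `sum`, doubling the batch size roughly doubles the gradient norm; under `mean`, gradients stay comparable across batch sizes, which is why the choice of reduction is effectively a batch-size gradient-scaling decision.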
