
Fix gradient scaling bug with batch size #120

Merged
luciaquirke merged 3 commits into main from fix-batch-size-gradient-scaling
Jan 9, 2026

Conversation


claude bot commented Jan 9, 2026

Summary

Fixes gradient scaling bug where gradient magnitudes varied based on batch size.

The Problem: With losses.mean().backward(), gradients were scaled by 1/batch_size, so processing the same data under different batch configurations produced gradients of different magnitudes, leading to inconsistent gradient scales (issue #112).

The Fix: Change to losses.sum().backward() so that gradients are invariant to batch size. The gradient magnitude now depends only on the data, not on how it's batched.
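A toy illustration of the effect (not the project's code; the scalar model and data below are made up for the example). For per-example losses `loss_i = (w*x_i - y_i)^2`, the analytic gradient is `dloss_i/dw = 2*x_i*(w*x_i - y_i)`; mean reduction divides the summed gradient by the batch size, so splitting the same data into smaller batches changes the accumulated magnitude:

```python
# Toy sketch: analytic gradients for loss_i = (w*x_i - y_i)^2,
# so dloss_i/dw = 2*x_i*(w*x_i - y_i).
def grad_sum(w, xs, ys):
    """Gradient of losses.sum() w.r.t. w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys))

def grad_mean(w, xs, ys):
    """Gradient of losses.mean() w.r.t. w: scaled by 1/len(xs)."""
    return grad_sum(w, xs, ys) / len(xs)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 0.0, 3.0]

# Sum reduction: two half-batches accumulate to the same gradient as
# one full batch -- invariant to how the data is batched.
full = grad_sum(w, xs, ys)
halves = grad_sum(w, xs[:2], ys[:2]) + grad_sum(w, xs[2:], ys[2:])
assert abs(full - halves) < 1e-12

# Mean reduction: each batch divides by its own size, so the full-batch
# gradient is half the accumulated half-batch gradients here.
full_m = grad_mean(w, xs, ys)
halves_m = grad_mean(w, xs[:2], ys[:2]) + grad_mean(w, xs[2:], ys[2:])
print(full_m, halves_m)  # -0.5 vs -1.0: magnitudes differ by the batch-size factor
```
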

Changes

Test Plan

The new test test_batch_size_invariance.py verifies that:

  1. Gradient standard deviations are consistent (within 20% tolerance) when processing data separately vs together
  2. Gradients point in the same direction (high cosine similarity)

This test is based on David's replication script from the data-args branch that demonstrated the original bug.
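A minimal sketch of the two checks described above, with names and thresholds assumed from this description rather than taken from the repository's actual test file:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two gradient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def std(v):
    """Population standard deviation of a gradient vector."""
    m = sum(v) / len(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

def check_invariance(grads_separate, grads_together, tol=0.2):
    """Assert stds agree within `tol` relative difference and the
    gradients point in (nearly) the same direction."""
    s1, s2 = std(grads_separate), std(grads_together)
    assert abs(s1 - s2) / max(s1, s2) <= tol
    assert cosine_similarity(grads_separate, grads_together) > 0.99

# With sum reduction, accumulating per-chunk gradients reproduces the
# full-batch gradient, so the check passes on identical vectors.
g = [0.3, -1.2, 0.7, 0.05]
check_invariance(g, list(g))
```
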

Fixes #112


🤖 Generated with Claude Code

Change loss reduction from mean to sum in backward pass to make
gradient scales invariant to batch size. This ensures that gradients
have consistent magnitudes regardless of how the data is batched.

Add test based on issue #112 replication script to verify that
gradient scales are now consistent when processing data separately
vs together.

Fixes #112

Co-authored-by: Lucia Quirke <luciaquirke@users.noreply.github.com>
claude bot mentioned this pull request Jan 9, 2026
claude bot pushed a commit that referenced this pull request Jan 9, 2026
Remove the losses.sum() change since it was split into standalone PR #120.
This PR should focus only on dtype utilities.

Co-authored-by: Lucia Quirke <luciaquirke@users.noreply.github.com>

claude bot commented Jan 9, 2026


Code review

No issues found. Checked for bugs and CLAUDE.md compliance.



luciaquirke force-pushed the fix-batch-size-gradient-scaling branch from 13d720f to f908d2c on January 9, 2026 at 12:34
luciaquirke self-requested a review on January 9, 2026 at 12:34
luciaquirke force-pushed the fix-batch-size-gradient-scaling branch from f908d2c to 95c057d on January 9, 2026 at 12:38
luciaquirke (Collaborator) left a comment


Good job claude

luciaquirke force-pushed the fix-batch-size-gradient-scaling branch from b81b919 to fd6709f on January 9, 2026 at 12:52
luciaquirke force-pushed the fix-batch-size-gradient-scaling branch from fd6709f to 5e5751d on January 9, 2026 at 12:53
luciaquirke merged commit 689f17b into main on Jan 9, 2026
6 checks passed
luciaquirke added a commit that referenced this pull request Jan 11, 2026
Remove the losses.sum() change since it was split into standalone PR #120.
This PR should focus only on dtype utilities.

Co-authored-by: Lucia Quirke <luciaquirke@users.noreply.github.com>
luciaquirke deleted the fix-batch-size-gradient-scaling branch on January 13, 2026 at 23:20


Development

Successfully merging this pull request may close these issues.

Numerical instability: outlier gradients leading to non-reproducible results
