Fix autograd gradient normalization for permuted and broadcast tensors #182
Summary
Fixes #68 - resolves an index out-of-bounds panic in autograd's add_grad function that occurred when computing gradients for tensors with non-monotonic permute indices and fake (broadcasted) dimensions.
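For context, the failing configuration can be modeled with a small, self-contained sketch. ShapeView below is an illustrative stand-in for the real ShapeTracker state, not the actual type, and the concrete shapes are my own example of "non-monotonic permute indices plus a fake dimension":

```rust
/// Illustrative stand-in for the shape-tracking state involved here;
/// NOT the real ShapeTracker, just a model of the failing setup.
struct ShapeView {
    dims: Vec<usize>,    // physical (underlying) dimension sizes
    indexes: Vec<usize>, // logical -> physical axis permutation
    fake: Vec<bool>,     // per-physical-axis broadcast ("fake") flags
}

fn main() {
    // Example: a tensor of shape [1, 3] broadcast to [2, 3], then permuted
    // to [3, 2]. Physical axis 0 is fake (broadcast), and the permutation
    // [1, 0] is non-monotonic (decreasing).
    let view = ShapeView {
        dims: vec![2, 3],
        indexes: vec![1, 0],
        fake: vec![true, false],
    };

    // Logical axis i maps to physical axis view.indexes[i]. Iterating in
    // logical order while indexing per-physical-axis metadata (like `fake`)
    // conflates the two index spaces as soon as the permutation is
    // non-trivial, which is the mix-up the fix in commit 3 removes.
    for (logical, &physical) in view.indexes.iter().enumerate() {
        println!(
            "logical axis {logical} -> physical axis {physical} (size {}, fake = {})",
            view.dims[physical], view.fake[physical]
        );
    }
}
```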
Remarks
I admit these fixes feel quite hacky to me. I would appreciate review, and even if they work, they may be evidence of the brittleness of the ShapeTracker/autograd logic in its current state.
I also came across another issue with permute logic during my debugging, but I will file a separate issue for that.
Commits
The fix is structured across four commits for easier cherry-picking:
1. Add tests for identifying bug in issue #68 (8c1b5a7)
Reorganized the test module structure
Added comprehensive test cases that reproduce the issue, including:
test_add_grad_decreasing_idx: The original failing case from issue #68 (worked through in the sketch after this commit's notes)
test_add_grad_high_rank: Tests gradient computation with higher-rank tensors
test_add_grad_preserves_unfaked_dim: Tests preservation of non-broadcast dimensions
Status: All new tests fail at this commit (as expected)
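To make concrete what these tests check, here is a hand-worked version of the decreasing-index scenario in plain Rust. The shapes and gradient values are illustrative choices of mine, not necessarily the ones used in the actual tests; the point is the rule the tests assert: for an add through broadcast and permute, the input's gradient is the incoming gradient un-permuted and then summed over the broadcast axis.

```rust
fn main() {
    // Forward: a has shape [1, 3], broadcast to [2, 3], then permuted to [3, 2].
    // The incoming gradient therefore arrives with shape [3, 2].
    let grad_permuted: [[f32; 2]; 3] = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]];

    // Step 1: undo the permute [1, 0] to recover the [2, 3] layout.
    let mut grad = [[0.0f32; 3]; 2];
    for i in 0..3 {
        for j in 0..2 {
            grad[j][i] = grad_permuted[i][j];
        }
    }

    // Step 2: sum over the broadcast (fake) axis 0 to get a's gradient,
    // which has shape [1, 3].
    let grad_a: Vec<f32> = (0..3).map(|i| grad[0][i] + grad[1][i]).collect();
    assert_eq!(grad_a, vec![3.0, 7.0, 11.0]);
    println!("grad for a (shape [1, 3]): {grad_a:?}");
}
```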
2. Refactor gradient normalization logic (777d4b2)
Extracted gradient normalization into normalize_grad_for_input helper function
No behavioral changes, pure refactoring
Added unit tests specifically for the normalization logic
Makes the subsequent fixes clearer and more testable
Status: Tests still fail
3. Fix add_grad fake-dim removal (Partial fix) (6e898fd)
Core fix: Changed from using logical indices (fwd.shape.indexes) to physical indices when identifying and removing fake dimensions
Now correctly collects fake underlying indices: (0..fwd.shape.len()).filter(|&idx| fwd.shape.fake[idx])
Sorts indices in descending order before removal to prevent index-shifting issues (sketched below)
Why this works: Eliminates the mixing of logical/physical index spaces that caused incorrect axis access
Status: Fixes the OOB panic from the original issue, but test_add_grad_preserves_unfaked_dim still fails because rank alignment isn't handled yet
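A minimal, self-contained sketch of the descending-order removal described in this commit; the dimension sizes and fake flags are made up for illustration, and the real code operates on the ShapeTracker rather than a bare Vec:

```rust
fn main() {
    // Underlying (physical) dims of the forward tensor; axes 0 and 2 are fake.
    let mut dims = vec![2usize, 3, 4];
    let fake = vec![true, false, true];

    // Collect fake axes in PHYSICAL index space, mirroring the fix.
    let mut fake_axes: Vec<usize> = (0..dims.len()).filter(|&i| fake[i]).collect();

    // Remove in descending order so earlier removals don't shift later indices.
    // Removing axis 0 first would leave [3, 4], and the later remove(2) would
    // then be out of bounds -- exactly the index-shifting problem this avoids.
    fake_axes.sort_unstable_by(|a, b| b.cmp(a));
    for axis in fake_axes {
        dims.remove(axis);
    }
    assert_eq!(dims, vec![3]); // only the real (unfaked) dim survives
    println!("{dims:?}");
}
```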
4. Fix broadcasted axes in autograd gradient norm (Complete fix) (4057c7e)
Handles the case where incoming gradients have fewer axes than the forward tensor
Inserts missing axes at positions where the forward tensor has fake dimensions before performing undo-permute logic
Uses grad.expand() to ensure gradient shape aligns with forward tensor shape
Remarks: I think this is the hackiest part of the changes, but it does seem necessary to pass test_add_grad_preserves_unfaked_dim (which I verified separately is valid logic in PyTorch). Unclear whether there is a better way.
Status: All tests pass
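Finally, a shape-level sketch of the rank alignment in this commit: insert a size-1 axis at every fake position of the forward tensor, then expand those axes to the forward sizes. The shapes here are illustrative, and the real code performs the expansion on the tensor via grad.expand() rather than on shape metadata:

```rust
fn main() {
    // Forward tensor: physical shape [2, 3, 4], with axis 1 fake (broadcast from size 1).
    let fwd_dims = vec![2usize, 3, 4];
    let fake = vec![false, true, false];

    // Incoming gradient is missing the broadcast axis, e.g. shape [2, 4].
    let mut grad_dims = vec![2usize, 4];

    // Insert a size-1 axis at every fake position so the ranks line up
    // before the undo-permute step.
    for (axis, &is_fake) in fake.iter().enumerate() {
        if is_fake {
            grad_dims.insert(axis, 1);
        }
    }
    assert_eq!(grad_dims, vec![2, 1, 4]);

    // Then expand the size-1 axes to the forward sizes (shown here on the
    // shape metadata only).
    let expanded: Vec<usize> = grad_dims
        .iter()
        .zip(&fwd_dims)
        .map(|(&g, &f)| if g == 1 { f } else { g })
        .collect();
    assert_eq!(expanded, vec![2, 3, 4]);
    println!("{expanded:?}");
}
```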