
Conversation

@shenxiangzhuang (Collaborator) commented Nov 14, 2025

Fixed #50

See hyunwoongko/transformer#40 for details.

codecov bot commented Nov 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.42%. Comparing base (ce5e125) to head (9232eaf).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master      #52   +/-   ##
=======================================
  Coverage   94.42%   94.42%           
=======================================
  Files           9        9           
  Lines         520      520           
=======================================
  Hits          491      491           
  Misses         29       29           



Copilot AI left a comment


Pull Request Overview

This PR fixes incorrect masking behavior in attention mechanisms by addressing two issues: replacing the arbitrary large negative value (-10000) with proper negative infinity (-inf) for masked positions, and correcting the broadcast dimensions of the target padding mask from (batch_size, 1, target_seq_length, 1) to (batch_size, 1, 1, target_seq_length) to properly combine with the causal mask.

  • Changed mask fill value from float("-10000") to float("-inf") in both transformer and BERT attention implementations
  • Fixed target mask shape generation to use correct unsqueeze dimensions for proper broadcasting
  • Updated comment to reflect corrected mask shape
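The first fix can be illustrated with a minimal scaled dot-product attention sketch (PyTorch, with illustrative tensor and function names rather than the repository's exact code): filling masked scores with float("-inf") drives their softmax weight to exactly zero, whereas a finite value such as -10000 merely makes the weight very small.

```python
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    # mask: broadcastable to (batch, heads, q_len, k_len); nonzero = keep.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5
    if mask is not None:
        # -inf (not -10000) guarantees exactly zero weight after softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v
```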

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
  • toynlp/transformer/model.py — Fixed attention mask fill value and corrected target mask broadcast dimensions, with the comment updated to match
  • toynlp/bert/model.py — Fixed attention mask fill value and removed the stale TODO comment
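The second fix (the target-mask shape) can be sketched as follows; the function and parameter names (make_tgt_mask, pad_idx) are illustrative assumptions, not necessarily those used in toynlp. Unsqueezing the padding mask to (batch_size, 1, 1, tgt_len) lets it broadcast against the (tgt_len, tgt_len) causal mask into a combined mask of shape (batch_size, 1, tgt_len, tgt_len).

```python
import torch

def make_tgt_mask(tgt: torch.Tensor, pad_idx: int = 0) -> torch.Tensor:
    # tgt: (batch_size, tgt_len) token ids.
    tgt_len = tgt.size(1)
    # Padding mask broadcast over the query dimension:
    # shape (batch_size, 1, 1, tgt_len), NOT (batch_size, 1, tgt_len, 1).
    pad_mask = (tgt != pad_idx).unsqueeze(1).unsqueeze(2)
    # Lower-triangular causal mask of shape (tgt_len, tgt_len).
    causal_mask = torch.tril(
        torch.ones(tgt_len, tgt_len, dtype=torch.bool, device=tgt.device)
    )
    # Broadcasting combines them into (batch_size, 1, tgt_len, tgt_len).
    return pad_mask & causal_mask
```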


@shenxiangzhuang shenxiangzhuang merged commit 352f7b4 into master Nov 14, 2025
12 checks passed
@shenxiangzhuang shenxiangzhuang deleted the fix/attention_mask branch November 14, 2025 10:16

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Attention Mask fill -inf or -10000

2 participants