
Conversation

@shenxiangzhuang (Collaborator) commented Nov 14, 2025

Fixed #50

See hyunwoongko/transformer#40 for details.

codecov bot commented Nov 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.42%. Comparing base (ce5e125) to head (9232eaf).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master      #52   +/-   ##
=======================================
  Coverage   94.42%   94.42%           
=======================================
  Files           9        9           
  Lines         520      520           
=======================================
  Hits          491      491           
  Misses         29       29           



Copilot AI left a comment


Pull Request Overview

This PR fixes incorrect masking behavior in attention mechanisms by addressing two issues: replacing the arbitrary large negative value (-10000) with proper negative infinity (-inf) for masked positions, and correcting the broadcast dimensions of the target padding mask from (batch_size, 1, target_seq_length, 1) to (batch_size, 1, 1, target_seq_length) to properly combine with the causal mask.

  • Changed mask fill value from float("-10000") to float("-inf") in both transformer and BERT attention implementations
  • Fixed target mask shape generation to use correct unsqueeze dimensions for proper broadcasting
  • Updated comment to reflect corrected mask shape
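The first fix can be illustrated with a minimal scaled dot-product attention sketch (PyTorch, with illustrative tensor and function names rather than the repository's exact code): filling masked scores with float("-inf") drives their softmax weight to exactly zero, whereas a finite value such as -10000 merely makes the weight very small.

```python
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    # mask: broadcastable to (batch, heads, q_len, k_len); nonzero = keep.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5
    if mask is not None:
        # -inf (not -10000) guarantees exactly zero weight after softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v
```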

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
  • toynlp/transformer/model.py — Fixed attention mask fill value and corrected target mask broadcast dimensions, with the comment updated to match
  • toynlp/bert/model.py — Fixed attention mask fill value and removed the stale TODO comment
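The second fix (the target-mask shape) can be sketched as follows; the function and parameter names (make_tgt_mask, pad_idx) are illustrative assumptions, not necessarily those used in toynlp. Unsqueezing the padding mask to (batch_size, 1, 1, tgt_len) lets it broadcast against the (tgt_len, tgt_len) causal mask into a combined mask of shape (batch_size, 1, tgt_len, tgt_len).

```python
import torch

def make_tgt_mask(tgt: torch.Tensor, pad_idx: int = 0) -> torch.Tensor:
    # tgt: (batch_size, tgt_len) token ids.
    tgt_len = tgt.size(1)
    # Padding mask broadcast over the query dimension:
    # shape (batch_size, 1, 1, tgt_len), NOT (batch_size, 1, tgt_len, 1).
    pad_mask = (tgt != pad_idx).unsqueeze(1).unsqueeze(2)
    # Lower-triangular causal mask of shape (tgt_len, tgt_len).
    causal_mask = torch.tril(
        torch.ones(tgt_len, tgt_len, dtype=torch.bool, device=tgt.device)
    )
    # Broadcasting combines them into (batch_size, 1, tgt_len, tgt_len).
    return pad_mask & causal_mask
```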


@shenxiangzhuang shenxiangzhuang merged commit 352f7b4 into master Nov 14, 2025
12 checks passed
@shenxiangzhuang shenxiangzhuang deleted the fix/attention_mask branch November 14, 2025 10:16

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Attention Mask fill -inf or -10000

2 participants