fix the self_attn module of DecoderLayer#22

Open
TrinitialChan wants to merge 1 commit into SHI-Labs:master from TrinitialChan:fix_self_attn
Conversation

@TrinitialChan
Your paper specifies that the decoder performs a stacked multi-head self-attention operation; however, I found that the behavior of the DecoderLayer class in the code is inconsistent with that description. Printing the attn_output_weights of the self_attn module shows an attention map of shape ([L, 1, 1]), so each token can only attend to itself rather than to the whole sequence. There is clearly a problem with such an attention computation, and I provide a quick fix in this PR.
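For context, the symptom described above is what you get when a batch-first tensor is fed into a default (sequence-first) nn.MultiheadAttention: the module reads the batch axis as the sequence axis, sees L "batches" of length-1 sequences, and returns attention weights of shape (L, 1, 1). The sketch below is a hypothetical reconstruction of that failure mode (the variable names and sizes are illustrative, not taken from the repo), together with the transpose that restores full self-attention:

```python
import torch
import torch.nn as nn

L, E, heads = 7, 16, 4  # illustrative sequence length, embed dim, head count

# Default nn.MultiheadAttention expects (seq_len, batch, embed).
attn = nn.MultiheadAttention(E, heads)

x = torch.randn(1, L, E)  # batch-first tensor: (batch=1, seq_len=L, embed=E)

# Buggy call: the module interprets dim 0 as the sequence, so it sees
# L independent length-1 sequences.
_, w_bad = attn(x, x, x)
print(w_bad.shape)  # torch.Size([7, 1, 1]) -- each token attends only to itself

# Fix: transpose to (seq_len, batch, embed) before self-attention
# (or construct the module with batch_first=True).
x_t = x.transpose(0, 1)
_, w_ok = attn(x_t, x_t, x_t)
print(w_ok.shape)  # torch.Size([1, 7, 7]) -- full attention over the sequence
```

The diagnostic is the shape of attn_output_weights, which is (batch, target_len, source_len): a correct stacked self-attention over an L-token sequence should produce (1, L, L), not (L, 1, 1).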

@kilimchoi
@xingqian2018 can you check this?
