
fix: gMLP uses full bias instead of truncated bias #1371

Open

Mr-Neutr0n wants to merge 2 commits into EleutherAI:main from Mr-Neutr0n:fix/gmlp-bias-dimension-mismatch

Conversation

@Mr-Neutr0n

Bug

The gMLP `SpatialGatingUnit.forward` method correctly slices the projection bias to match the current sequence length (`bias = self.proj.bias[:n]`) in the causal case, but then passes `self.proj.bias` (the full, unsliced bias) to `F.linear` instead of the local `bias` variable. When the sequence length is shorter than `neox_args.seq_length`, the full bias no longer matches the output dimension and the addition fails.

Fix

Changed `F.linear(gate.transpose(2, 1), weight, self.proj.bias)` to `F.linear(gate.transpose(2, 1), weight, bias)` so the correctly sliced bias is used.
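To illustrate the shape mismatch, here is a minimal numpy sketch of the relevant logic. The `linear` helper stands in for `torch.nn.functional.linear` (same `x @ W.T + b` semantics), and the shapes (`seq_length = 8`, `n = 5`, batch and hidden sizes) are illustrative assumptions, not the actual gpt-neox configuration:

```python
import numpy as np

def linear(x, weight, bias):
    # numpy stand-in for torch.nn.functional.linear: y = x @ W^T + b
    return x @ weight.T + bias

seq_length = 8  # neox_args.seq_length (full training context, assumed value)
n = 5           # current, shorter sequence length
d = 4           # illustrative hidden dim of the gated half

rng = np.random.default_rng(0)
weight_full = rng.standard_normal((seq_length, seq_length))
bias_full = rng.standard_normal(seq_length)

# causal case: slice weight and bias to the current length, as forward() does
weight = weight_full[:n, :n]
bias = bias_full[:n]

gate = rng.standard_normal((2, d, n))  # (batch, d, n), i.e. after transpose(2, 1)

# buggy behavior: the full bias (length seq_length) cannot broadcast
# onto an output whose last dimension is n < seq_length
mismatch_raised = False
try:
    linear(gate, weight, bias_full)
except ValueError:
    mismatch_raised = True

# fixed behavior: the sliced bias matches the output length n
out = linear(gate, weight, bias)
print(mismatch_raised)  # True
print(out.shape)        # (2, 4, 5)
```

The fix is purely a matter of using the local `bias` that was already computed; no new slicing logic is needed.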

