fix: gMLP uses full bias instead of truncated bias #1371

Merged
Quentin-Anthony merged 2 commits into EleutherAI:main from Mr-Neutr0n:fix/gmlp-bias-dimension-mismatch
May 19, 2026

@Mr-Neutr0n

Contributor

Bug

The gMLP SpatialGatingUnit.forward method correctly slices the projection bias to match the current sequence length (bias[:n]) for the causal case, but then passes self.proj.bias (the full, unsliced bias) to F.linear instead of the local bias variable. This causes a dimension mismatch when the sequence length is shorter than neox_args.seq_length.

Fix

Changed F.linear(gate.transpose(2, 1), weight, self.proj.bias) to F.linear(gate.transpose(2, 1), weight, bias) so the correctly sliced bias is used.
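
For reviewers who want to reproduce the mismatch outside the repo, here is a minimal sketch of the causal projection step. It is not the actual SpatialGatingUnit code: the helper name spatial_gate, the use_sliced_bias flag, and the toy sizes are assumptions made for illustration, and the layer norm and residual path are left out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def spatial_gate(gate, proj, use_sliced_bias=True):
    """Hypothetical stand-in for the causal projection in SpatialGatingUnit.forward.

    gate: [batch, n, d] with n <= seq_length
    proj: nn.Linear(seq_length, seq_length)
    """
    n = gate.shape[1]
    # Slice weight and bias down to the current sequence length.
    weight, bias = proj.weight[:n, :n], proj.bias[:n]
    # Causal mask: zero out weights that would mix in future positions.
    mask = torch.ones(n, n, device=gate.device).triu(1).bool()
    weight = weight.masked_fill(mask, 0.0)
    # The bug passed proj.bias (length seq_length); the fix passes bias (length n).
    chosen_bias = bias if use_sliced_bias else proj.bias
    out = F.linear(gate.transpose(2, 1), weight, chosen_bias)  # [batch, d, n]
    return out.transpose(2, 1)  # back to [batch, n, d]


seq_length, n, d = 8, 5, 4  # toy sizes, chosen only for this sketch
proj = nn.Linear(seq_length, seq_length)
gate = torch.randn(2, n, d)

# Fixed behaviour: the sliced bias matches the n output positions.
fixed = spatial_gate(gate, proj, use_sliced_bias=True)

# Old behaviour: a length-8 bias cannot broadcast against 5 output positions.
try:
    spatial_gate(gate, proj, use_sliced_bias=False)
    buggy_error = None
except RuntimeError as exc:
    buggy_error = exc
```

When n equals seq_length the slice is a no-op, so both variants give the same result. That would explain why the bug stays hidden in runs that always use full-length sequences.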

@Quentin-Anthony Quentin-Anthony merged commit ea7aefd into EleutherAI:main May 19, 2026
1 of 2 checks passed