fix: gMLP uses full bias instead of truncated bias by Mr-Neutr0n · Pull Request #1371 · EleutherAI/gpt-neox

Mr-Neutr0n · 2026-02-11T18:21:05Z

Bug

In the causal case, the gMLP SpatialGatingUnit.forward method correctly slices the projection bias to the current sequence length n (bias[:n]). It then passes self.proj.bias to F.linear instead of the local bias variable. self.proj.bias is the full, unsliced bias, sized for neox_args.seq_length. Whenever the input sequence is shorter than neox_args.seq_length, that full-length bias no longer matches the [n, n] sliced weight, which causes a dimension mismatch.
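To make the shape contract concrete, here is a minimal, dependency-free sketch of the causal projection step. It is a pure-Python stand-in, not the gpt-neox code. It assumes, as F.linear does, that the output is input · weight^T + bias with one bias entry per output feature, and that gate is laid out as [channels, n] after the transpose. The helpers linear and causal_gate_proj, the lower-triangular mask, and the toy sizes are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def linear(x: Matrix, weight: Matrix, bias: Optional[List[float]]) -> Matrix:
    """Pure-Python stand-in for F.linear: y = x @ weight^T + bias.

    x: [rows, in_features]; weight: [out_features, in_features];
    bias (optional): [out_features].
    """
    out_features, in_features = len(weight), len(weight[0])
    if bias is not None and len(bias) != out_features:
        raise ValueError(f"bias has {len(bias)} entries, expected {out_features}")
    out = []
    for row in x:
        if len(row) != in_features:
            raise ValueError(f"input has {len(row)} features, expected {in_features}")
        y = [sum(a * w for a, w in zip(row, w_row)) for w_row in weight]
        if bias is not None:
            y = [v + b for v, b in zip(y, bias)]
        out.append(y)
    return out


def causal_gate_proj(gate: Matrix, weight: Matrix, bias: List[float],
                     use_sliced_bias: bool) -> Matrix:
    """Causal spatial projection over the sequence axis (hypothetical helper).

    gate: [channels, n]; weight: [max_len, max_len]; bias: [max_len],
    where max_len plays the role of neox_args.seq_length.
    """
    n = len(gate[0])  # current sequence length
    # Zero entries above the diagonal so output position i only mixes inputs j <= i.
    masked = [[w if j <= i else 0.0 for j, w in enumerate(row)]
              for i, row in enumerate(weight)]
    # Truncate weight and bias to the current sequence length.
    weight_n = [row[:n] for row in masked[:n]]
    bias_n = bias[:n]
    # Buggy path passes the full bias (like self.proj.bias); fixed path passes bias_n.
    return linear(gate, weight_n, bias_n if use_sliced_bias else bias)


max_len, n = 8, 5
weight = [[0.0] * max_len for _ in range(max_len)]  # toy values for easy checking
bias = [1.0] * max_len
gate = [[0.5] * n for _ in range(3)]  # 3 channels, 5 positions

try:
    causal_gate_proj(gate, weight, bias, use_sliced_bias=False)
except ValueError as err:
    print("full bias:", err)  # full bias: bias has 8 entries, expected 5

print(causal_gate_proj(gate, weight, bias, use_sliced_bias=True)[0])
# [1.0, 1.0, 1.0, 1.0, 1.0]
```

With n = 5 and a configured maximum of 8, only the sliced bias lines up with the [5, 5] weight. This is the same condition the description above identifies.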

Fix

Changed F.linear(gate.transpose(2, 1), weight, self.proj.bias) to F.linear(gate.transpose(2, 1), weight, bias) so the correctly sliced bias is used.
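The bug only appears when n is shorter than neox_args.seq_length. A regression check for this fix could therefore sweep every sequence length from 1 up to the configured maximum and confirm the output shape. The sketch below is hypothetical. gate_proj_fixed and check_all_lengths are illustrative pure-Python helpers, not functions or tests from gpt-neox. The lower-triangular causal mask is also an assumption of the sketch.

```python
from typing import List

Matrix = List[List[float]]


def gate_proj_fixed(gate: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Hypothetical post-fix causal projection.

    gate:   [channels, n], one row per channel, one column per position
    weight: [max_len, max_len], sized for the configured maximum length
    bias:   [max_len]
    Both weight and bias are truncated to n before use.
    """
    n = len(gate[0])
    if n > len(weight):
        raise ValueError("sequence is longer than the configured maximum")
    # Slice the weight to [n, n] and zero entries above the diagonal (causal mask).
    w = [[weight[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    b = bias[:n]  # the fix: the sliced bias, not the full-length one
    # y[i] = sum_j x[j] * w[i][j] + b[i], i.e. x @ w^T + b, as in F.linear.
    return [
        [sum(x * wij for x, wij in zip(row, w_row)) + bi for w_row, bi in zip(w, b)]
        for row in gate
    ]


def check_all_lengths(max_len: int, channels: int = 2) -> None:
    """Hypothetical regression check: every n in 1..max_len keeps shape [channels, n]."""
    weight = [[0.1 * (i + j) for j in range(max_len)] for i in range(max_len)]
    bias = [1.0] * max_len
    for n in range(1, max_len + 1):
        gate = [[1.0] * n for _ in range(channels)]
        out = gate_proj_fixed(gate, weight, bias)
        assert len(out) == channels, n
        assert all(len(r) == n for r in out), n
        # Under the causal mask, position 0 only sees itself: weight[0][0] * 1.0 + bias[0].
        assert abs(out[0][0] - (weight[0][0] + bias[0])) < 1e-9, n


check_all_lengths(max_len=8)
print("all sequence lengths passed")
```

The sweep includes n equal to the maximum. That case should behave the same before and after the fix, so it also guards against the slicing breaking the full-length path.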

Mr-Neutr0n added 2 commits February 11, 2026 23:50

fix: correct undefined self.args in TopKTokenChoiceRouter (a6a1df3)

fix: use correctly sliced bias in gMLP projection (a8aa9ca)

Mr-Neutr0n requested a review from Quentin-Anthony as a code owner February 11, 2026 18:21

Mr-Neutr0n wants to merge 2 commits into EleutherAI:main from Mr-Neutr0n:fix/gmlp-bias-dimension-mismatch
