add conv1d layer type (used in gpt2)#50

Merged
zywilliamli merged 7 commits intoEleutherAI:mainfrom
zywilliamli:main
Oct 16, 2025

@zywilliamli
Collaborator

GPT-2-style models use `transformers.pytorch_utils.Conv1D` instead of `nn.Linear`, and the gradient collector does not currently track that layer type. This PR adds it.

Tested manually with `python -m bergson runs/test --model openai-community/gpt2 --dataset NeelNanda/pile-10k --truncation`. Unit tests for the new layer type are also included.
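For readers unfamiliar with the difference, here is a minimal sketch of why the two layer types need separate handling. `Conv1DStandIn` is a hypothetical stand-in written for this illustration; it mirrors the weight layout of `transformers.pytorch_utils.Conv1D` (weight stored as `(in_features, out_features)`), and it is not the code added in this PR.

```python
import torch
from torch import nn


class Conv1DStandIn(nn.Module):
    """Hypothetical stand-in mirroring the layout of transformers.pytorch_utils.Conv1D."""

    def __init__(self, nf: int, nx: int):
        super().__init__()
        self.nf = nf
        # Weight is stored as (in_features, out_features): the transpose of nn.Linear.
        self.weight = nn.Parameter(torch.randn(nx, nf) * 0.02)
        self.bias = nn.Parameter(torch.zeros(nf))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out_shape = x.size()[:-1] + (self.nf,)
        # Computes x @ W + b on the flattened batch, then restores the leading dims.
        x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
        return x.view(out_shape)


torch.manual_seed(0)
conv = Conv1DStandIn(nf=6, nx=4)
linear = nn.Linear(4, 6)

# Copy the Conv1D-style parameters into the Linear layer, transposing the weight.
with torch.no_grad():
    linear.weight.copy_(conv.weight.T)
    linear.bias.copy_(conv.bias)

x = torch.randn(2, 3, 4)
same_output = torch.allclose(conv(x), linear(x), atol=1e-6)

# The weight gradients carry the same values but in transposed layouts.
conv(x).sum().backward()
linear(x).sum().backward()
grads_match = torch.allclose(conv.weight.grad, linear.weight.grad.T, atol=1e-6)
```

Both layers compute the same affine map, but code that assumes the `nn.Linear` weight shape `(out_features, in_features)` would read a Conv1D-style weight or gradient with its axes swapped.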

  """Process the incoming gradient wrt the output of the module."""
  # Sanity checks
- assert isinstance(module, nn.Linear), "Expected a Linear module"
+ assert isinstance(module, LayerAdapter.supported_modules), "Expected a supported module"
Collaborator

Could we print the supported modules here?
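One way this suggestion might look, as a sketch only: `SUPPORTED_MODULES` below is a hypothetical placeholder for `LayerAdapter.supported_modules` (its real contents are not shown in this thread), and `check_supported` is an illustrative helper rather than a function from the repository.

```python
from torch import nn

# Hypothetical placeholder for LayerAdapter.supported_modules; the real tuple
# is not shown in this thread, so these entries are arbitrary.
SUPPORTED_MODULES = (nn.Linear, nn.Conv2d)


def check_supported(module: nn.Module) -> None:
    """Assert that `module` is a supported type, naming the options on failure."""
    supported = ", ".join(cls.__name__ for cls in SUPPORTED_MODULES)
    assert isinstance(module, SUPPORTED_MODULES), (
        f"Expected one of ({supported}), got {type(module).__name__}"
    )


check_supported(nn.Linear(4, 6))  # supported type, passes silently

try:
    check_supported(nn.LayerNorm(4))  # unsupported type, triggers the assertion
    message = None
except AssertionError as err:
    message = str(err)
```

Building the message from the tuple itself keeps it accurate when new layer types are added later.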

@CLAassistant
CLAassistant commented Oct 16, 2025

CLA assistant check
All committers have signed the CLA.

Collaborator

@luciaquirke luciaquirke left a comment

LGTM 🚀

@zywilliamli zywilliamli merged commit 0e1b245 into EleutherAI:main Oct 16, 2025
2 checks passed
3 participants