Releases: TransformerLensOrg/TransformerLens
v4.0.0a1
What's Changed
- 3.0 CI Bugs by @jlarson4 in #1261
- fix: use cfg.dtype instead of torch.get_default_dtype for KV cache init by @davidcyze in #1260
- Fix tests broken by a local GPU by @brendanlong in #1219
- fix: handle LayerNorm folding correctly in load_and_process_state_dict by @VedantMadane in #1215
- Fix HookedTransformerConfig rotary_base types by @brendanlong in #1231
- Fixed Masking in HookedTransformer.generate by @tuomaso in #999
- Add hooked transformer generate stream by @anthonyduong9 in #908
- Add py.typed for type hints by @UFO-101 in #760
- Created Baichuan Architecture adapter by @jlarson4 in #1262
- Make FactoredMatrix compatible with tensor-like arguments by @JasonGross in #599
- NanoGPT Conversation did not handle case when there were no biases in model by @dashstander in #629
- [BUG] Batched Generation solution extended to run_with_cache and run_with_hooks for TransformerBridge by @jlarson4 in #1265
- Added 1D tensor handling to TransformerBridge by @jlarson4 in #1266
- Added n_ctx override to TransformerBridge by @jlarson4 in #1269
- Feature/generate stream on bridge by @jlarson4 in #1268
- Added warnings for users attempting to use MPS with Torch 2.8 by @jlarson4 in #1271
- Improved Tokenize & Concatenate by @jlarson4 in #1273
- Multi-Device Processing on Bridge by @jlarson4 in #1270
- Adding Architecture Adapter Creation Guide to Docs by @jlarson4 in #1274
- Fixed Quantization bug in TransformerLens 3.0 by @jlarson4 in #1276
- Fix generate() when tokenizer is unset and add regression tests by @DityaChawla in #1267
- Updated boot_transformers to use local hf_config, if provided by @jlarson4 in #1279
- Prevent weight processing split between devices by @jlarson4 in #1281
- Fix: IOIDataset generates diverse samples by @DivijChawla in #1282
- Fix: preserve tokenizer.padding_side when reloading with add_bos_token by @DivijChawla in #1283
- MPS CI Support by @huseyincavusbi in #1278
- Add n_params_total property for total parameter count by @DivijChawla in #1284
- Model Table Cleanup by @jlarson4 in #1285
- Demos/bridge lm eval demo by @jlarson4 in #1286
- Tokenize and Concatenate additional datasets by @jlarson4 in #1287
- Resolution to Issues #477 and #264 by @jlarson4 in #1288
- mT5 Support by @jlarson4 in #1289
- SimpleStories Model verification by @jlarson4 in #1292
- Verification System Improvements by @jlarson4 in #1293
- fix: Bump CI actions and stabilize flaky notebook checks by @huseyincavusbi in #1290
- Fix bridge generate with no tokenizer by @jlarson4 in #1299
- Issue resolution for #341, #644, and #210 by @jlarson4 in #1300
- Resolution for #796, #453, #385, and #297 by @jlarson4 in #1301
- Improve Architecture Adapter Testing by @jlarson4 in #1303
- Resolution for #112 and #830 by @jlarson4 in #1304
- Add OPT architecture adapter tests by @willytop8 in #1305
- Add GPT2 and Gpt2LM Head architecture adapter tests by @sunny1401 in #1306
- Qwen3.5 text-only TransformerBridge support by @SamuelePunzo in #1313
- Add GPT-J architecture adapter tests by @along-l in #1314
- Feat/external architecture registration by @huseyincavusbi in #1307
- Transformers v5 Gemma scaling adjustment by @jlarson4 in #1315
- Adding adapter tests for Qwen2 by @Rishik00 in #1309
- HuggingFace Rate Limit Improvements by @jlarson4 in #1318
- Compatibility Mode – pre-ln split qkv hooks by @jlarson4 in #1319
- Pre-release cleanup by @jlarson4 in #1320
- Add Mixtral architecture adapter tests by @RecreationalMath in #1329
- Removed unused PythiaArchitectureAdapter from TransformerLens by @Rishik00 in #1332
- Add qwen3 and llava_next adapter tests by @sunny1401 in #1331
- Feature/telemetry demo notebook by @jonathanrbelanger-lang in #1308
- Add OLMoE architecture adapter tests by @RecreationalMath in #1333
- Test coverage/cleaning up xfails by @jlarson4 in #1334
- Initial Driver System by @jlarson4 in #1335
- vLLM Batches by @jlarson4 in #1338
- Add return_cache option to TransformerBridge.generate by @RecreationalMath in #1337
- Fix FactoredMatrix indexing returning empty result for -1 index by @Kymi808 in #1340
- Add GPT-OSS architecture adapter tests by @RecreationalMath in #1341
- vLLM Driver Bugs by @jlarson4 in #1343
- Remove deprecated move_model parameter from ActivationCache.to by @RecreationalMath in #1344
- Fix run_with_cache(device=...) permanently moving the model by @RecreationalMath in #1345
- Fix to_numpy() crash on bfloat16 tensors by upcasting to float32 by @robbiebusinessacc in #1346
- Fix sample_logits crash when top_k exceeds vocab size by @robbiebusinessacc in #1347
- Update broken Slack link again by @APatelUIUC in #1339
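#1340 fixes FactoredMatrix indexing that returned an empty result for a -1 index. The PR's exact change is not described here; the sketch below only illustrates, in plain Python, one common way a -1 index ends up empty: turning an integer index i into the slice i:i+1 to keep a dimension fails for i = -1, because -1:0 selects nothing. Normalizing negative indices first avoids it. The helper name index_as_slice is hypothetical and is not part of TransformerLens.

```python
def index_as_slice(i: int, length: int) -> slice:
    """Convert an integer index into a length-1 slice that keeps the dimension.

    Hypothetical helper: the naive slice(i, i + 1) breaks for i == -1,
    because slice(-1, 0) is empty.
    """
    if not -length <= i < length:
        raise IndexError(f"index {i} out of range for length {length}")
    if i < 0:
        i += length  # normalize negative indices, e.g. -1 -> length - 1
    return slice(i, i + 1)


rows = ["r0", "r1", "r2", "r3"]

naive = rows[slice(-1, 0)]                   # the pitfall: an empty list
fixed = rows[index_as_slice(-1, len(rows))]  # the last row, as a length-1 list
print(naive, fixed)  # [] ['r3']
```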
New Contributors
- @davidcyze made their first contribution in #1260
- @VedantMadane made their first contribution in #1215
- @tuomaso made their first contribution in #999
- @dashstander made their first contribution in #629
- @DityaChawla made their first contribution in #1267
- @DivijChawla made their first contribution in #1282
- @willytop8 made their first contribution in #1305
- @sunny1401 made their first contribution in #1306
- @along-l made their first contribution in #1314
- @Rishik00 made their first contribution in #1309
- @RecreationalMath made their first contribution in #1329
- @jonathanrbelanger-lang made their first contribution in #1308
- @Kymi808 made th...
v3.3.0
What's Changed
transformers version bumped to 5.4.0. This should not negatively affect existing users: the pyproject.toml was already resolving to >5.4.0, and the dependency update was required to properly support Qwen 3.5.
- fix: Bump CI actions and stabilize flaky notebook checks by @huseyincavusbi in #1290
- Fix bridge generate with no tokenizer by @jlarson4 in #1299
- Issue resolution for #341, #644, and #210 by @jlarson4 in #1300
- Resolution for #796, #453, #385, and #297 by @jlarson4 in #1301
- Improve Architecture Adapter Testing by @jlarson4 in #1303
- Resolution for #112 and #830 by @jlarson4 in #1304
- Add OPT architecture adapter tests by @willytop8 in #1305
- Add GPT2 and Gpt2LM Head architecture adapter tests by @sunny1401 in #1306
- Qwen3.5 text-only TransformerBridge support by @SamuelePunzo in #1313
- Add GPT-J architecture adapter tests by @along-l in #1314
- Feat/external architecture registration by @huseyincavusbi in #1307
- Transformers v5 Gemma scaling adjustment by @jlarson4 in #1315
- Adding adapter tests for Qwen2 by @Rishik00 in #1309
- HuggingFace Rate Limit Improvements by @jlarson4 in #1318
- Compatibility Mode – pre-ln split qkv hooks by @jlarson4 in #1319
- Pre-release cleanup by @jlarson4 in #1320
- Fix TransformerBridge backward hook cleanup by @SamuelePunzo in #1324
- Add Mixtral architecture adapter tests by @RecreationalMath in #1329
- Removed unused PythiaArchitectureAdapter from TransformerLens by @Rishik00 in #1332
- Add qwen3 and llava_next adapter tests by @sunny1401 in #1331
- Feature/telemetry demo notebook by @jonathanrbelanger-lang in #1308
- Add OLMoE architecture adapter tests by @RecreationalMath in #1333
- Test coverage/cleaning up xfails by @jlarson4 in #1334
- Release 3.3.0 by @jlarson4 in #1321
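The 3.3.0 release raises the transformers requirement to 5.4.0. To confirm which version an environment actually resolved, a minimal stdlib-only sketch follows. The helper names (release_tuple, meets_minimum, installed_transformers_ok) are hypothetical, and the parsing is deliberately naive: it compares only the leading numeric release components, so suffixes such as .dev0 or rc1 are ignored. For full PEP 440 comparisons, packaging.version.Version is the more robust choice.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# Minimum transformers release named in the 3.3.0 notes.
MINIMUM = (5, 4, 0)


def release_tuple(ver: str) -> tuple[int, ...]:
    """Naively parse leading numeric components: '5.4.0.dev0' -> (5, 4, 0)."""
    parts: list[int] = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # non-numeric component such as 'dev0'
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # trailing suffix such as the 'rc1' in '0rc1'
    return tuple(parts)


def meets_minimum(ver: str, minimum: tuple[int, ...] = MINIMUM) -> bool:
    """Compare release tuples after zero-padding, so '5.4' counts as '5.4.0'."""
    found = release_tuple(ver)
    width = max(len(found), len(minimum))
    padded_found = found + (0,) * (width - len(found))
    padded_min = minimum + (0,) * (width - len(minimum))
    return padded_found >= padded_min


def installed_transformers_ok() -> bool | None:
    """Return None if transformers is not installed, else whether it meets MINIMUM."""
    try:
        return meets_minimum(version("transformers"))
    except PackageNotFoundError:
        return None


print(meets_minimum("5.4.0"), meets_minimum("4.57.1"))  # True False
print(installed_transformers_ok())
```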
New Contributors
- @willytop8 made their first contribution in #1305
- @sunny1401 made their first contribution in #1306
- @SamuelePunzo made their first contribution in #1313
- @along-l made their first contribution in #1314
- @Rishik00 made their first contribution in #1309
- @jonathanrbelanger-lang made their first contribution in #1308
Full Changelog: v3.2.1...v3.3.0
v3.2.1
v3.2.0
What's Changed
- Fix generate() when tokenizer is unset and add regression tests by @DityaChawla in #1267
- Updated boot_transformers to use local hf_config, if provided by @jlarson4 in #1279
- Prevent weight processing split between devices by @jlarson4 in #1281
- Fix: IOIDataset generates diverse samples by @DivijChawla in #1282
- Fix: preserve tokenizer.padding_side when reloading with add_bos_token by @DivijChawla in #1283
- MPS CI Support by @huseyincavusbi in #1278
- Add n_params_total property for total parameter count by @DivijChawla in #1284
- Model Table Cleanup by @jlarson4 in #1285
- Demos/bridge lm eval demo by @jlarson4 in #1286
- Tokenize and Concatenate additional datasets by @jlarson4 in #1287
- Resolution to Issues #477 and #264 by @jlarson4 in #1288
- mT5 Support by @jlarson4 in #1289
- SimpleStories Model verification by @jlarson4 in #1292
- Verification System Improvements by @jlarson4 in #1293
- Release v3.2.0 by @jlarson4 in #1294
New Contributors
- @DityaChawla made their first contribution in #1267
- @DivijChawla made their first contribution in #1282
Full Changelog: v3.1.0...v3.2.0
v3.1.0
What's Changed
- 3.0 CI Bugs by @jlarson4 in #1261
- fix: use cfg.dtype instead of torch.get_default_dtype for KV cache init by @davidcyze in #1260
- Fix type of HookedTransformerConfig.device by @brendanlong in #1230
- Fix tests broken by a local GPU by @brendanlong in #1219
- fix: handle LayerNorm folding correctly in load_and_process_state_dict by @VedantMadane in #1215
- Fix HookedTransformerConfig rotary_base types by @brendanlong in #1231
- Fixed Masking in HookedTransformer.generate by @tuomaso in #999
- Add hooked transformer generate stream by @anthonyduong9 in #908
- Add py.typed for type hints by @UFO-101 in #760
- Created Baichuan Architecture adapter by @jlarson4 in #1262
- Make FactoredMatrix compatible with tensor-like arguments by @JasonGross in #599
- NanoGPT Conversation did not handle case when there were no biases in model by @dashstander in #629
- [BUG] Batched Generation solution extended to run_with_cache and run_with_hooks for TransformerBridge by @jlarson4 in #1265
- Added 1D tensor handling to TransformerBridge by @jlarson4 in #1266
- Added n_ctx override to TransformerBridge by @jlarson4 in #1269
- Feature/generate stream on bridge by @jlarson4 in #1268
- Added warnings for users attempting to use MPS with Torch 2.8 by @jlarson4 in #1271
- Improved Tokenize & Concatenate by @jlarson4 in #1273
- Multi-Device Processing on Bridge by @jlarson4 in #1270
- Adding Architecture Adapter Creation Guide to Docs by @jlarson4 in #1274
- Fixed Quantization bug in TransformerLens 3.0 by @jlarson4 in #1276
- TransformerLens 3.1.0 by @jlarson4 in #1277
New Contributors
- @davidcyze made their first contribution in #1260
- @VedantMadane made their first contribution in #1215
- @tuomaso made their first contribution in #999
- @dashstander made their first contribution in #629
Full Changelog: v3.0.0...v3.1.0
TransformerLens 3.0
What's Changed
Migrated to a new way of implementing models via the TransformerBridge system, increasing model support from ~200 models to ~9,000 models.
- Refactor the utilities file into utilities folder by @starship006 in #628
- Raise exception when BERT is loaded with HookedTransformer instead of… by @degenfabian in #795
- Circular dependency resolution by @bryce13950 in #803
- fixed corner param by @bryce13950 in #817
- bumped python min version by @bryce13950 in #802
- Updates torch to use the most recent version by @bryce13950 in #822
- updated python requirements by @bryce13950 in #821
- Recent releases by @bryce13950 in #841
- updated mypy limit by @bryce13950 in #880
- Activation utils cleanup by @bryce13950 in #879
- Restore consistency of hook_normalized between LayerNorm and RMSNorm by @degenfabian in #770
- Fix that padding_side always defaults to "right" when no value is explicitly passed by @degenfabian in #814
- Unified conversions by @bryce13950 in #881
- Flatten state dictionary for proper weight loading by @degenfabian in #860
- enabled actions on action pr by @bryce13950 in #882
- Add weight conversion for Phi model by @degenfabian in #863
- Add weight conversion for T5 models by @degenfabian in #859
- Visualize weight conversions by @degenfabian in #852
- Fixed test for ensuring weight conversions are provided by @bryce13950 in #883
- Drop python 3.9 by @bryce13950 in #885
- Conversion improved test coverage by @bryce13950 in #886
- Component test coverage by @bryce13950 in #890
- Bug new loading by @bryce13950 in #891
- Weight conversion llama by @bryce13950 in #892
- Refactor supported models module by @bryce13950 in #893
- Bug neox by @bryce13950 in #895
- Feature model adapter by @bryce13950 in #928
- added test for making sure formatting works well by @bryce13950 in #932
- Refactor final issues by @bryce13950 in #933
- restored tokenizer content by @bryce13950 in #935
- Refactor weight conversion by @bryce13950 in #931
- added python 3.13 to CI by @bryce13950 in #843
- upstream fixes from dev by @bryce13950 in #941
- Flexible component mapping by @bryce13950 in #938
- Move flatten dictionary to architecture_conversion by @degenfabian in #936
- made new transformer bridge extend nn module properly by @bryce13950 in #955
- brought in remaining hooked transformer functions by @bryce13950 in #954
- Setup tokenizer in boot function by @degenfabian in #959
- Bridged Robust Model Structure by @bryce13950 in #960
- Remove transformers dependency from bridge tokenization by @degenfabian in #963
- Dynamically add boot function to bridge by @degenfabian in #964
- Pre release version publishing by @bryce13950 in #973
- Setup deprecated hook aliases and got the majority of the main demo running properly by @bryce13950 in #976
- Linear test coverage by @bryce13950 in #977
- Create Bridge for every Gemma 3 module by @degenfabian in #966
- Add Bridges for every module in GPT2 by @degenfabian in #967
- Cache hook aliases & stop at layer by @bryce13950 in #978
- Create Bridges for every module in Bloom models by @degenfabian in #970
- Create Bridges for every module in Gemma 2 by @degenfabian in #971
- Create bridges for every module in Gemma 1 by @degenfabian in #972
- Create bridges for every module in Mistral by @degenfabian in #979
- Remove that output_attention flag defaults to true in boot function by @degenfabian in #982
- Create bridge for every module in GPT-J by @degenfabian in #974
- Create bridge for every module in Llama by @degenfabian in #975
- Unified aliases by @bryce13950 in #991
- fixed hook alias positions by @bryce13950 in #992
- Create bridge for every module in Mixtral by @degenfabian in #984
- removed numpy ceiling by @bryce13950 in #994
- Ensure hook and property backwards compatibility with HookedTransformer by @degenfabian in #990
- Create bridge for every module in neox by @degenfabian in #995
- Create bridges for every module in neo by @degenfabian in #987
- Weight conversion renaming by @bryce13950 in #996
- Attention shape normalization by @bryce13950 in #997
- Joint hook handling by @bryce13950 in #1001
- Add compatibility_mode feature by @degenfabian in #998
- Add support for GPT-OSS by @degenfabian in #1004
- Fix GPT-OSS initialization error by @degenfabian in #1007
- added setters and hook utils to bridge by @bryce13950 in #1009
- updated property access by @bryce13950 in #1026
- feat: Bridge.boot should allow using alias model names, but show a deprecation warning by @hijohnnylin in #1028
- Move QKV separation into bridge that wraps QKV matrix by @degenfabian in #1027
- removed unnecessary import by @bryce13950 in #1030
- Attn pattern shape by @bryce13950 in #1029
- added cache layer for hook collection by @bryce13950 in #1032
- Bridge unit test compatibility coverage by @bryce13950 in #1031
- updated loading in interactive neuroscope demo to use transformer bridge by @degenfabian in #1017
- map hook_pos_embed to rotary_emb, allow hook_aliases to be a list by @hijohnnylin in #1034
- created new base config class by @bryce13950 in #1042
- made sure to check for nested hooks by @bryce13950 in #1035
- Fix warning for aliases when compatibility mode is turned off by @degenfabian in #1041
- Feature kv cache by @bryce13950 in https...
v2.18.0
What's Changed
- Isolate demo dependencies and pin orjson for CVE-2025-67221 mitigation by @evcyen in #1173
- feat: Add LIT integration for interactive model analysis (#121) by @HetanshWaghela in #1163
- fix: set n_ctx=512 for TinyStories models by @puranikyashaswin in #1162
- Fix/tokenize and concatenate invalid token by @evcyen in #1179
- Remove spurious warning for tokenize_and_concatenate by @evcyen in #1177
- Add MMLU benchmark evaluation to evals by @CarlG0123 in #1183
- Fix/1076 logit lens layer norm by @evcyen in #1180
- Updating Interactive Neuroscope, CI to properly install demo by @jlarson4 in #1205
- Fix tokenize_and_concatenate splitting tokens across chunk boundaries by @brainsnog in #1201
- Fix deprecated IPython magic() calls in demo notebooks (issue #1036) by @brainsnog in #1203
- Expose n_ctx override in HookedTransformer.from_pretrained (issue #1006) by @brainsnog in #1204
- Added warning flags for usages of MPS by @jlarson4 in #1182
- Add GPT-OSS-20B model support by @CarlG0123 in #1195
- fixed the logit lens implementation inside ActivationCache.accumulated_resid to match the standard definition in literature and the expected and defined behavior as per the documentation in the docstring and in the docs by @hartigel in #1077
- Add Apertus model support with XIeLU activation by @sinievanderben in #1197
- Fix attention calculation on mps for torch 2.8.0 by @BrownianNotion in #1068
- HuBERT support rollout by @david-wei-01001 in #1111
- Pre-release testing by @jlarson4 in #1210
- Fix backward hooks Runtime Error by @evcyen in #1175
- Release v2.18.0 by @jlarson4 in #1211
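#1201 fixes tokenize_and_concatenate splitting tokens across chunk boundaries. One common way to avoid that class of bug is to tokenize each document first, then concatenate and cut the flat token stream into fixed-length rows, so boundaries always fall between token IDs rather than inside text that still has to be tokenized. The sketch below is a plain-Python illustration of that approach, not the library's implementation: chunk_token_stream, the separator ID, and the choice to drop the trailing partial row are all assumptions.

```python
from __future__ import annotations

from itertools import chain


def chunk_token_stream(docs: list[list[int]], n_ctx: int, sep_id: int) -> list[list[int]]:
    """Concatenate pre-tokenized documents and cut them into rows of exactly n_ctx tokens.

    Each document is followed by sep_id (for example an end-of-text token).
    The trailing partial row is dropped so every row has the same length.
    """
    if n_ctx <= 0:
        raise ValueError("n_ctx must be positive")
    # Flatten [doc0 + sep, doc1 + sep, ...] into one token stream.
    stream = list(chain.from_iterable(doc + [sep_id] for doc in docs))
    n_full = len(stream) // n_ctx
    return [stream[i * n_ctx:(i + 1) * n_ctx] for i in range(n_full)]


docs = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
print(chunk_token_stream(docs, n_ctx=4, sep_id=0))
# [[5, 6, 7, 0], [8, 9, 0, 10], [11, 12, 13, 0]]
```

Dropping the remainder keeps every row the same length, which suits batched training; padding the last row is an equally valid alternative.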
New Contributors
- @evcyen made their first contribution in #1173
- @HetanshWaghela made their first contribution in #1163
- @puranikyashaswin made their first contribution in #1162
- @CarlG0123 made their first contribution in #1183
- @brainsnog made their first contribution in #1201
- @hartigel made their first contribution in #1077
- @sinievanderben made their first contribution in #1197
- @BrownianNotion made their first contribution in #1068
- @david-wei-01001 made their first contribution in #1111
Full Changelog: v2.17.0...v2.18.0
v3.0.0b3
What's Changed
- Support callable filters in TransformerBridge.add_hook() by @jlarson4 in #1186
- Update Patching Hook to avoid causing conflicts by @jlarson4 in #1187
- Prevent Stale Joint QKV values from being incorporated into weight folding after Layer Norm application by @jlarson4 in #1188
- Updated to remove hardcoded .cpu() processing by @jlarson4 in #1189
- Return true initial batch size information by @jlarson4 in #1190
- hook_result & Hook Aliases issues by @jlarson4 in #1191
- updated loading in exploratory analysis demo to use transformer bridge by @degenfabian in #1014
- updated loading in patchscopes generation demo to use transformer bridge by @degenfabian in #1021
- Additional Exploratory analysis Demo fixes by @jlarson4 in #1192
- update loading in bert demo to use transformer bridge by @degenfabian in #1015
- updating loading in qwen demo to use transformer bridge by @degenfabian in #1025
- updated loading in activation patching demo to use transformer bridge by @degenfabian in #1011
- updating loading in t5 demo to use transformer bridge by @degenfabian in #1022
- updated loading in attribution patching demo to use transformer bridge by @degenfabian in #1013
- v3.0.0b3 – Notebook Demo Update & Bug Fixes by @jlarson4 in #1196
- Verifying Additional Models by @jlarson4 in #1199
- Feature/multimodal architecture adapters by @jlarson4 in #1200
- Fix boolean 4D attention-mask handling in joint-QKV bridge attention reconstruction by @speediedan in #1198
- Feature/llava next and onevision variants by @jlarson4 in #1202
Full Changelog: v3.0.0b2...v3.0.0b3
v3.0.0b2
What's Changed
- Release 2.16 by @bryce13950 in #945
- Release 2.16.1 by @bryce13950 in #952
- Update README.md by @jmole in #957
- improve model properties table in docs by @mivanit in #769
- Release v2.16.2 by @bryce13950 in #958
- Add Gemma 3 and MedGemma model support by @huseyincavusbi in #1149
- Add timestamp for 2.0 announcement [docs] by @MattAlp in #983
- Add support for Qwen/Qwen3-0.6B-Base model by @mtaran in #1075
- Repairing tests that were failing due to recent contributions by @jlarson4 in #1157
- Fix 934 by @kapedalex in #1155
- Fix 1130 and 1102 by @kapedalex in #1154
- Fix key and value heads patching for models with different n_heads from n_key_value_heads by @nikolaystanishev in #981
- updating the compatibility notebook by @jlarson4 in #1158
- New Release – v2.17.0 by @jlarson4 in #1159
- Integrate v2.17.0 phase1 by @jlarson4 in #1166
- transformers v5 support by @jlarson4 in #1167
- Improve TransformerBridge optimizer compatibility via dual PyTorch/TransformerLens parameter access API by @speediedan in #1143
- Add HuggingFace ModelOutput support to TransformerLens generation API by @speediedan in #1144
- Testing R1 Distills to confirm functional in TransformerLens by @jlarson4 in #1168
- StableLM Architecture Adapter by @jlarson4 in #1171
- Complete type checking for OLMo support (builds on #816) by @taziksh in #1081
- Olmo3 support by @etomoscow in #1170
- Setup and tested OLMo architecture adapters by @jlarson4 in #1174
- Isolate demo dependencies and pin orjson for CVE-2025-67221 mitigation by @evcyen in #1173
- feat: Add LIT integration for interactive model analysis (#121) by @HetanshWaghela in #1163
- OpenELM Architecture Adapter by @jlarson4 in #1172
- fix: set n_ctx=512 for TinyStories models by @puranikyashaswin in #1162
- Architecture Benchmarks – Review & Extension by @jlarson4 in #1176
- created initial model registry tool by @bryce13950 in #1151
- Initial Verification Run by @jlarson4 in #1181
- Additional Verification by @jlarson4 in #1184
- Prepping for v3.0.0b2 by @jlarson4 in #1185
New Contributors
- @jmole made their first contribution in #957
- @huseyincavusbi made their first contribution in #1149
- @MattAlp made their first contribution in #983
- @mtaran made their first contribution in #1075
- @kapedalex made their first contribution in #1155
- @nikolaystanishev made their first contribution in #981
- @taziksh made their first contribution in #1081
- @etomoscow made their first contribution in #1170
- @HetanshWaghela made their first contribution in #1163
- @puranikyashaswin made their first contribution in #1162
Full Changelog: v3.0.0b1...v3.0.0b2
v2.17.0
We've got an exciting new release that includes several new models! Gemma 3, MedGemma, and Qwen3-0.6B-Base are now among the available models. In addition to these new models, this release includes a handful of bug fixes and other small non-breaking changes.
What's Changed
- Update README.md by @jmole in #957
- improve model properties table in docs by @mivanit in #769
- Release v2.16.2 by @bryce13950 in #958
- Add Gemma 3 and MedGemma model support by @huseyincavusbi in #1149
- Add timestamp for 2.0 announcement [docs] by @MattAlp in #983
- Add support for Qwen/Qwen3-0.6B-Base model by @mtaran in #1075
- Repairing tests that were failing due to recent contributions by @jlarson4 in #1157
- Fix 934 by @kapedalex in #1155
- Fix 1130 and 1102 by @kapedalex in #1154
- Fix key and value heads patching for models with different n_heads from n_key_value_heads by @nikolaystanishev in #981
- updating the compatibility notebook by @jlarson4 in #1158
- New Release – v2.17.0 by @jlarson4 in #1159
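#981 concerns patching key and value heads in models where n_key_value_heads differs from n_heads (grouped-query attention). In the usual contiguous GQA layout, each block of n_heads // n_key_value_heads consecutive query heads shares one key/value head, so query head h reads KV head h // group_size. The helpers below are a hypothetical sketch of that mapping under the contiguous-grouping assumption, not the TransformerLens code.

```python
def kv_head_for_query_head(h: int, n_heads: int, n_key_value_heads: int) -> int:
    """Return the key/value head that query head h reads under contiguous GQA grouping."""
    if n_heads % n_key_value_heads != 0:
        raise ValueError("n_heads must be a multiple of n_key_value_heads")
    if not 0 <= h < n_heads:
        raise IndexError(f"query head {h} out of range for n_heads={n_heads}")
    group_size = n_heads // n_key_value_heads
    return h // group_size


def query_heads_for_kv_head(j: int, n_heads: int, n_key_value_heads: int) -> list[int]:
    """Return every query head that shares key/value head j."""
    if n_heads % n_key_value_heads != 0:
        raise ValueError("n_heads must be a multiple of n_key_value_heads")
    if not 0 <= j < n_key_value_heads:
        raise IndexError(f"kv head {j} out of range for n_key_value_heads={n_key_value_heads}")
    group_size = n_heads // n_key_value_heads
    return list(range(j * group_size, (j + 1) * group_size))


print([kv_head_for_query_head(h, 8, 2) for h in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
print(query_heads_for_kv_head(1, 8, 2))  # [4, 5, 6, 7]
```

Under this layout, patching a single KV head changes the inputs of every query head in its group, which is why indexing by query head alone is not enough when the two head counts differ.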
New Contributors
- @jmole made their first contribution in #957
- @huseyincavusbi made their first contribution in #1149
- @MattAlp made their first contribution in #983
- @mtaran made their first contribution in #1075
- @kapedalex made their first contribution in #1155
- @nikolaystanishev made their first contribution in #981
Full Changelog: v2.16.1...v2.17.0