
Conversation

adamkarvonen (Collaborator) commented on May 13, 2025

I made a few backward-compatible changes here:

  • First, while training some Qwen 32B SAEs, I added a truncate_model() function, which deletes unneeded layers and provides very significant memory savings on large models (a rough sketch of the idea follows this list).
  • I added an optional PyTorch activation buffer, which was easier to use with truncate_model(), and an Activault streaming activation buffer, which streams activations from S3 (see the buffer sketch below).
  • I added a data mixture dataset generator, which enables training on a mixture of two datasets. This is very useful for e.g. training on a mixture of pretraining and chat data (sketched below).
  • I also added an optional backup_steps argument, which saves the SAEs (state dict and optimizer) every x steps. This is useful when working with larger models, where training runs can take more than 24 hours (see the checkpointing sketch below).
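
For truncate_model(), the rough idea (not the exact implementation in this PR) is that SAE training only needs activations up to the hook layer, so everything above it can be dropped before the model is moved to GPU. A minimal sketch, assuming a Qwen/Llama-style layout where the decoder blocks live in `model.model.layers`:

```python
import torch
from transformers import AutoModelForCausalLM

def truncate_model(model, max_layer: int):
    """Sketch only: keep decoder layers 0..max_layer and drop the rest, since
    SAE training only needs activations up to the hook layer. Assumes the
    blocks live in model.model.layers; the actual function may differ."""
    model.model.layers = model.model.layers[: max_layer + 1]
    return model

# Hypothetical usage: truncate before moving the model to GPU.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-32B", torch_dtype=torch.bfloat16)
model = truncate_model(model, max_layer=31)
```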
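
The PyTorch activation buffer is, in spirit, something like the following sketch (illustrative only, not the class added here): run tokenized batches through the truncated model, grab activations at one submodule with a forward hook, and hand out shuffled batches for SAE training.

```python
import torch

class SimpleActivationBuffer:
    """Illustrative sketch of an activation buffer, not the class in this PR."""

    def __init__(self, model, submodule, token_batches, out_batch_size=4096, device="cuda"):
        self.model = model.to(device).eval()
        self.submodule = submodule          # e.g. model.model.layers[31]
        self.token_batches = token_batches  # iterator of tokenized input dicts
        self.out_batch_size = out_batch_size
        self.device = device
        self._acts = torch.empty(0)

    @torch.no_grad()
    def _refresh(self):
        grabbed = []

        def hook(_module, _inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            grabbed.append(hidden.flatten(0, 1))  # (batch * seq, d_model)

        handle = self.submodule.register_forward_hook(hook)
        batch = {k: v.to(self.device) for k, v in next(self.token_batches).items()}
        self.model(**batch)
        handle.remove()

        acts = torch.cat(grabbed)
        self._acts = acts[torch.randperm(len(acts), device=acts.device)]

    def __iter__(self):
        return self

    def __next__(self):
        if len(self._acts) < self.out_batch_size:
            self._refresh()  # for simplicity this sketch drops leftover activations
        out = self._acts[: self.out_batch_size]
        self._acts = self._acts[self.out_batch_size:]
        return out
```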
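
The data mixture generator is roughly this shape; the sampling scheme, proportions, and dataset names below are assumptions for illustration, not the exact generator added here.

```python
import random
from datasets import load_dataset

def mixture_generator(dataset_a, dataset_b, prob_a=0.5, seed=0):
    """Sketch of a two-dataset mixture: on each draw, yield the next example
    from dataset A with probability prob_a, otherwise from dataset B. The real
    generator may handle weighting, fields, and exhaustion differently."""
    rng = random.Random(seed)
    iter_a, iter_b = iter(dataset_a), iter(dataset_b)
    while True:
        source = iter_a if rng.random() < prob_a else iter_b
        try:
            yield next(source)
        except StopIteration:
            return

# Hypothetical usage: mix a pretraining corpus with a chat dataset (streamed).
pretrain = load_dataset("monology/pile-uncopyrighted", split="train", streaming=True)
chat = load_dataset("lmsys/lmsys-chat-1m", split="train", streaming=True)
mixed = mixture_generator(pretrain, chat, prob_a=0.5)
```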
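
backup_steps amounts to periodic checkpointing of both the SAE and its optimizer; schematically it looks like the sketch below (names are illustrative, not the exact trainer code).

```python
import os
import torch

def maybe_backup(step, backup_steps, sae, optimizer, save_dir):
    """Illustrative sketch of the backup_steps behavior: every backup_steps
    steps, write the SAE state dict and optimizer state so a long run
    (>24 h on large models) can resume after an interruption."""
    if backup_steps is not None and step > 0 and step % backup_steps == 0:
        os.makedirs(save_dir, exist_ok=True)
        torch.save(
            {
                "step": step,
                "sae_state_dict": sae.state_dict(),
                "optimizer_state_dict": optimizer.state_dict(),
            },
            os.path.join(save_dir, f"checkpoint_step_{step}.pt"),
        )
```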

I checked that the end-to-end test passed both before and after these changes.

adamkarvonen merged commit d639166 into main on May 13, 2025
3 checks passed