Description
This PR is a WIP with the goal of simplifying and separating out the SAE architecture configs. Each SAE architecture now gets its own config, which can be further customized. I've also removed some config options that are legacy / not well used or documented. These deletions include:
I've tried to move something into the base `SAEConfig` only if it's actually needed to run the SAE (e.g. the size of the SAE), rather than info that's useful to know but not actually needed (e.g. what model / layer / L1 coefficient, etc.). This extra info is moved to a metadata option on the config.
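A rough sketch of what that split might look like; the field names here are illustrative, not the actual API:

```python
from dataclasses import dataclass, field


@dataclass
class SAEMetadata:
    # Illustrative fields: useful context that is not needed to run the SAE.
    model_name: str | None = None
    hook_name: str | None = None
    hook_layer: int | None = None
    l1_coefficient: float | None = None


@dataclass
class SAEConfig:
    # Only what is required to actually instantiate and run the SAE.
    d_in: int
    d_sae: int
    dtype: str = "float32"
    device: str = "cpu"
    # Everything else lives here and never affects the forward pass.
    metadata: SAEMetadata = field(default_factory=SAEMetadata)
```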
This PR also refactors the way the various coefficients work: each training SAE class must now implement `get_coefficients()`, which returns a dict mapping coefficient names to their values / warm-up steps. This solves the problem that L1 SAEs have an L1 coefficient, JumpReLU SAEs have an L0 coefficient, and TopK SAEs have neither (but may have an aux coefficient in the future).
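A minimal sketch of what this could look like for two architectures; the `CoefficientSpec` container and the exact return shape are my assumptions, not necessarily the real API:

```python
from dataclasses import dataclass


@dataclass
class CoefficientSpec:
    # Hypothetical container: final coefficient value plus warm-up steps.
    value: float
    warm_up_steps: int = 0


class StandardTrainingSAE:
    """Sketch of an L1-trained SAE exposing its coefficients."""

    def __init__(self, l1_coefficient: float, l1_warm_up_steps: int = 0):
        self.l1_coefficient = l1_coefficient
        self.l1_warm_up_steps = l1_warm_up_steps

    def get_coefficients(self) -> dict[str, CoefficientSpec]:
        # Each architecture reports only the coefficients it actually uses.
        return {"l1": CoefficientSpec(self.l1_coefficient, self.l1_warm_up_steps)}


class JumpReLUTrainingSAE:
    """Sketch of a JumpReLU SAE, which has an L0 coefficient instead."""

    def __init__(self, l0_coefficient: float, l0_warm_up_steps: int = 0):
        self.l0_coefficient = l0_coefficient
        self.l0_warm_up_steps = l0_warm_up_steps

    def get_coefficients(self) -> dict[str, CoefficientSpec]:
        return {"l0": CoefficientSpec(self.l0_coefficient, self.l0_warm_up_steps)}
```

The trainer can then just iterate over whatever dict comes back, so it never needs to hard-code which loss terms a given architecture has.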
These changes should also make it easy to add a new architecture or tweak existing architectures. You just need to call `register_sae_training_class()` and `register_sae_class()` with your custom SAE class / config, and then you can train with it.
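A hypothetical usage sketch; the only things taken from this PR are the two function names, while the import path, argument order, and all the `MyCustom*` names are assumptions for illustration:

```python
# Import path and signatures assumed; adjust to the actual API.
from sae_lens import register_sae_class, register_sae_training_class

# Hypothetical custom classes and configs defined elsewhere.
from my_package import (
    MyCustomSAE,
    MyCustomSAEConfig,
    MyCustomTrainingSAE,
    MyCustomTrainingSAEConfig,
)

# Register the inference-time SAE and its config under a new architecture name.
register_sae_class("my_custom", MyCustomSAE, MyCustomSAEConfig)

# Register the training-time counterpart so the trainer can build and train it.
register_sae_training_class("my_custom", MyCustomTrainingSAE, MyCustomTrainingSAEConfig)
```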
Still TODO