Name	Name	Last commit message	Last commit date
parent directory ..
recipes	recipes
sae	sae
.ci_build.sh	.ci_build.sh
.gitignore	.gitignore
README.md	README.md
pyproject.toml	pyproject.toml
uv.lock	uv.lock

Sparse Autoencoders for Model Interpretability

The Sparse Autoencoder(SAE) package provides a domain-agnostic implementation of SAE designed for mechanistic model interpretability. This tool provides a generic SAE library with various SAE architectures, as well as comprehensive training and analysis tools. Model-specific recipes are included to guide researchers on applying SAEs to any chosen model.

Check out CodonFM's extracted features on this interactive dashboard built with this SAE library. CodonFM recipe coming soon!

Early Release -- This project is under active development. APIs and interfaces may change between versions. Feedback and contributions are welcome.

Structure

sparse_autoencoders/
├── sae/                  # Core SAE library (domain-agnostic)
│   └── src/sae/
│       ├── architectures/ # ReLU-L1, Top-K, MoE SAE implementations
│       ├── training.py    # Trainer with checkpointing and logging
│       ├── eval/          # Reconstruction, sparsity, dead latent metrics
│       ├── autointerp/    # LLM-based automated feature interpretation
│       ├── collector.py   # Activation collection utilities
│       └── dashboard/     # Interactive feature visualization (React)
├── recipes/
│   ├── esm2/             # ESM2 protein language model recipe
│   │   ├── scripts/      # extract, train, eval, dashboard launchers
│   │   ├── configs/      # Hydra configs for 650M, 3B, 15B models
│   │   └── src/esm2_sae/ # ESM2-specific data, eval, and viz
└── pyproject.toml        # UV workspace configuration

Quick Start

# Install everything (recommended)
uv sync

# Or install packages individually
pip install -e sae/
pip install -e recipes/esm2/

Train an SAE on ESM2

cd recipes/esm2

# Full pipeline (extract activations -> train -> evaluate)
python run.py model=3b

# Quick smoke test
python run.py model=650m num_proteins=100 train.n_epochs=1 nproc=1 dp_size=1

Use the core library standalone

from sae.architectures import TopKSAE
from sae.training import Trainer, TrainingConfig

sae = TopKSAE(input_dim=512, hidden_dim=4096, top_k=64)
trainer = Trainer(sae, TrainingConfig(lr=3e-4, n_epochs=10, batch_size=4096))
trainer.train(embeddings)

SAE Architectures

Architecture	Class	Sparsity Mechanism
ReLU + L1	`ReLUSAE`	L1 penalty on activations
Top-K	`TopKSAE`	Only top-K features activate per input
Mixture of Experts	`MoESAE`	Expert routing with sparse gating

Feature Exploration

Each recipe includes an interactive exploration UI built on Mosaic and DuckDB, with an Embedding Atlas view for navigating learned features. The dashboard supports browsing per-feature activation patterns, filtering by annotation or sequence, and exploring UMAP projections of the feature space -- all backed by fast, in-browser SQL queries over Parquet files.

Applicability Beyond Biology

While the included recipes target biological foundation models such as ESM2, the core sae library is entirely domain-agnostic. Sparse autoencoders are a general-purpose interpretability technique and work equally well on natural language and vision transformer activations. To apply SAEs to a new domain, create a recipe that handles embedding extraction and domain-specific evaluation while reusing the shared training and architecture code.

Roadmap

Parallelism beyond DDP -- tensor parallelism and fused sparse Top-K kernels for training SAEs on large-scale activations
Additional architectures -- gated SAEs, JumpReLU, and other emerging variants
More recipes -- Geneformer, AmplifyProt, ESM-C, and non-biological reference recipes

Adding a New Recipe

Create recipes/mymodel/ with src/, scripts/, configs/ dirs
Add a pyproject.toml depending on sae>=0.1.0
Register in the root pyproject.toml workspace members
See recipes/esm2/ as a reference implementation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Sparse Autoencoders for Model Interpretability

Structure

Quick Start

Train an SAE on ESM2

Use the core library standalone

SAE Architectures

Feature Exploration

Applicability Beyond Biology

Roadmap

Adding a New Recipe

FilesExpand file tree

sparse_autoencoders

Directory actions

More options

Directory actions

More options

Latest commit

History

sparse_autoencoders

Folders and files

parent directory

README.md

Sparse Autoencoders for Model Interpretability

Structure

Quick Start

Train an SAE on ESM2

Use the core library standalone

SAE Architectures

Feature Exploration

Applicability Beyond Biology

Roadmap

Adding a New Recipe