
LLM-jp SAE

(Work in progress)

This repository provides code for training and evaluating Sparse Autoencoders (SAEs) on the internal representations of LLM-jp. We trained SAEs separately on six different checkpoints of LLM-jp-3-1.8B and compared learned features across checkpoints.

  • Demo Page: Visualize the text samples that activate each SAE feature (100 features per checkpoint).
  • Model Weights: SAE weights for all six checkpoints.
  • Paper: "How LLMs Learn: Tracing Internal Representations with Sparse Autoencoders"

Usage

Environment

Python 3.10.12

With uv:

uv init
uv sync

Or with venv and pip:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Training requires two GPUs. Prepare the raw en_wiki and ja_wiki data from llm-jp-corpus-v3 and the LLM-jp-3-1.8B checkpoints in advance. Before running the code, edit the UsrConfig in config.py to match your environment.
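For orientation, UsrConfig typically just collects local paths. A hypothetical sketch (the field names here are illustrative assumptions; the actual attributes are defined in config.py):

```python
from dataclasses import dataclass

# Illustrative only: these field names are assumptions, not the actual
# attributes of UsrConfig -- consult config.py for the real ones.
@dataclass
class UsrConfig:
    raw_data_dir: str = "/path/to/llm-jp-corpus-v3"  # en_wiki / ja_wiki raw data
    model_ckpt_dir: str = "/path/to/LLM-jp-3-1.8B"   # LM checkpoints
    output_dir: str = "/path/to/outputs"             # SAE weights, examples, plots
```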

Train, Visualize, and Evaluate SAE

Prepare Data

Download llm-jp-corpus-v3, then run:

python prepare_data.py

Train SAE

python train.py
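train.py does the actual work; for orientation, a standard SAE objective combines a reconstruction loss with a sparsity penalty on the feature activations. A minimal sketch of that idea (not the repository's implementation; architecture details and hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete linear encoder with ReLU, linear decoder."""

    def __init__(self, d_model: int, expansion: int = 16):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_model * expansion)
        self.decoder = nn.Linear(d_model * expansion, d_model)

    def forward(self, x: torch.Tensor):
        feats = torch.relu(self.encoder(x))  # sparse, non-negative feature activations
        return self.decoder(feats), feats    # reconstruction and features

def train_step(sae, optimizer, acts, l1_coef: float = 1e-3):
    """One step: MSE reconstruction loss + L1 sparsity on the features."""
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coef * feats.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```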

Collect Examples for Each Feature

python collect_examples.py
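Conceptually, this step ranks text snippets by how strongly they activate a given feature (as shown on the demo page). A sketch of that idea, reusing the SparseAutoencoder from the sketch above (function and variable names are hypothetical):

```python
import heapq
import torch

@torch.no_grad()
def top_activating_examples(sae, texts, hidden_states, feature_id: int, k: int = 10):
    """Rank snippets by the max token activation of one SAE feature.

    hidden_states: one [seq_len, d_model] tensor of LM activations per text.
    """
    scored = []
    for text, h in zip(texts, hidden_states):
        _, feats = sae(h)                        # [seq_len, d_hidden]
        scored.append((feats[:, feature_id].max().item(), text))
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])
```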

Evaluate Activation Patterns

python evaluate.py
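One simple activation-pattern statistic is how often each feature fires on a corpus. A hedged sketch of that kind of measurement, again reusing the SparseAutoencoder sketch above (the threshold and the metric itself are illustrative, not necessarily what evaluate.py computes):

```python
import torch

@torch.no_grad()
def activation_frequency(sae, hidden_states, threshold: float = 0.0):
    """Fraction of tokens on which each feature activates above threshold."""
    counts, n_tokens = None, 0
    for h in hidden_states:                      # each h: [seq_len, d_model]
        _, feats = sae(h)
        fired = (feats > threshold).float().sum(dim=0)
        counts = fired if counts is None else counts + fired
        n_tokens += h.shape[0]
    return counts / n_tokens
```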

Use a Trained SAE for Visualization and Evaluation

Prepare Data

Download llm-jp-corpus-v3, then run:

python prepare_data.py
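The released SAE weights can then be loaded and applied to LM hidden states. A hypothetical sketch (the file name and hidden size are assumptions, and SparseAutoencoder refers to the sketch above; see the released weights for the actual format):

```python
import torch

# "sae_checkpoint.pt" is a placeholder name; d_model=2048 assumes the
# hidden size of LLM-jp-3-1.8B. Check the released weights for specifics.
sae = SparseAutoencoder(d_model=2048)
sae.load_state_dict(torch.load("sae_checkpoint.pt", map_location="cpu"))
sae.eval()

with torch.no_grad():
    hidden = torch.randn(4, 2048)        # stand-in for real LM activations
    _, feats = sae(hidden)
    print(feats.topk(5, dim=-1))         # strongest features per token
```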

Citations

@inproceedings{inaba-etal-2025-bilingual,
    title = "How a Bilingual {LM} Becomes Bilingual: Tracing Internal Representations with Sparse Autoencoders",
    author = "Inaba, Tatsuro  and
      Kamoda, Go  and
      Inui, Kentaro  and
      Isonuma, Masaru  and
      Miyao, Yusuke  and
      Oseki, Yohei  and
      Takagi, Yu  and
      Heinzerling, Benjamin",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    year = "2025",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.725/",
    pages = "13458--13470",
}
