Navigate to the path where you would like to clone this repository (referred to below as `REPO_DIR`) and run:

```bash
git clone https://github.com/HPAI-BSC/HF-SAE.git
cd HF-SAE
```

Run the following to get set up using uv:
```bash
# Install Python 3.10 and create env
uv python install 3.10
uv venv --python 3.10
# Sync dependencies
uv sync
# Clone and add lang-segment-anything (required for baseline_simulator.py)
git clone https://github.com/luca-medeiros/lang-segment-anything.git
uv add ./lang-segment-anything
```

To download models/datasets from Hugging Face, set the environment variables to allow internet access and identify yourself:
```bash
huggingface-cli login
```

Set the environment variables in a `.env` file:
```bash
REPO_DIR=... # Your repository directory
DATA_DIR=... # Your data directory, where artifacts will be saved
HF_HOME=...  # Optional, custom Hugging Face cache directory; default is ~/.cache/huggingface
```

The config files are located in the `config` directory. If you want to add a new dataset or a new model, add it to the `dataset_config.yaml` and `llm_config.yaml` files:
- `llm_config.yaml` contains the configuration for the LLMs.
- `dataset_config.yaml` contains the configuration for the datasets.
- `saes.yaml` contains the paths to the SAEs, top-k images, and outputs for each model and layer.
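As a rough illustration of how such a per-model, per-layer mapping can be consumed (the field names and paths below are hypothetical, not the real `saes.yaml` schema):

```python
# Illustrative structure only -- the actual saes.yaml schema may differ.
saes = {
    "google/gemma-3-4b-it": {
        16: {
            "sae_path": "checkpoints/gemma3_layer16_topk.pt",
            "top_k_images": "artifacts/gemma3_layer16_topk_images",
            "outputs": "artifacts/gemma3_layer16_outputs",
        }
    }
}

def sae_entry(model: str, layer: int) -> dict:
    """Look up the SAE artifact paths for a given model and layer."""
    return saes[model][layer]

entry = sae_entry("google/gemma-3-4b-it", 16)
```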
Pretrained SAEs are available on Hugging Face and are downloaded automatically by the code when needed:
- Gemma-3-4B-IT: `javifer/google_gemma-3-4b-it-saes`
- InternVL3-14B: `javifer/OpenGVLab_InternVL3-14B-saes`
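The two repo ids above follow a simple naming pattern (base-model id with `/` replaced by `_`, plus a `-saes` suffix). A small helper inferred from those two examples — the code itself may resolve repos via `saes.yaml` instead:

```python
def sae_repo_id(model_name: str, namespace: str = "javifer") -> str:
    """Derive the Hugging Face repo id of the pretrained SAEs for a model.

    The naming pattern is inferred from the two repos listed above; it is
    not guaranteed to hold for other models.
    """
    return f"{namespace}/{model_name.replace('/', '_')}-saes"

sae_repo_id("google/gemma-3-4b-it")    # "javifer/google_gemma-3-4b-it-saes"
sae_repo_id("OpenGVLab/InternVL3-14B")  # "javifer/OpenGVLab_InternVL3-14B-saes"
```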
To train SAEs, run `src/demo.py`, which launches training and evaluation of one or more SAEs. Some training parameters are passed as arguments to `src/demo.py`; the rest are set in the config file `src/demo_config.py`:
```bash
python src/demo.py \
    --model_name google/gemma-3-4b-it \
    --layers 16 \
    --architectures top_k \
    --dataset ILSVRC/imagenet-1k \
    --test_set ILSVRC/imagenet-1k \
    --ratio_of_training_data 0.5 \
    --submodel enc
```

To compute top activating visualizations, use the `src/get_max_activating_vision.py` script:
```bash
python src/get_max_activating_vision.py \
    --top_k 10 \
    --ids_selection top_k \
    --n_images 128 \
    --sae_path [PATH_TO_SAE]
```

To visualize the top activating images, use the `notebooks/max_activating_viz_vision_read.ipynb` notebook.
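Conceptually, selecting the top activating images for a feature reduces to a top-k selection over per-image activation scores. A minimal sketch with hypothetical names, not the script's actual implementation:

```python
import heapq

def top_activating(scores: dict, k: int) -> list:
    """Return the k (image_id, activation) pairs with the highest activation."""
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

# Toy per-image activations for a single SAE feature
acts = {"img_a": 0.12, "img_b": 0.93, "img_c": 0.45, "img_d": 0.71}
top2 = top_activating(acts, 2)  # [("img_b", 0.93), ("img_d", 0.71)]
```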
Three evaluators are available in the `evaluation/` directory to assess SAE feature explanations. See `evaluation/README.md` for full usage details.
- **Baseline Simulator** (`evaluation/baseline_simulator.py`): Compares SAE activation heatmaps with LangSAM text-guided segmentation masks (IoU, precision, recall).
- **CLIP Simulator** (`evaluation/clip_simulator.py`): Measures CLIP similarity between top-k images and their concept explanations.
- **Image Generation Evaluator** (`evaluation/image_gen_eval.py`): Checks whether images generated from concept explanations activate the expected SAE features more than random images (AUROC).
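To make the baseline-simulator metrics concrete, IoU, precision, and recall between a binarized activation heatmap and a segmentation mask can be computed as below (a self-contained sketch, not the repository's implementation):

```python
def mask_metrics(pred, target):
    """IoU, precision, and recall between two flattened binary masks.

    pred: binarized SAE activation heatmap; target: segmentation mask.
    """
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)
    union = tp + fp + fn
    return {
        "iou": tp / union if union else 1.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

m = mask_metrics([1, 1, 0, 0], [1, 0, 1, 0])  # tp=1, fp=1, fn=1
```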
Depending on the model type, the input is processed and tokenized differently; this is handled by the `tokenized_batch` function in `src/processing.py`.
If we are working with a vision-language model and use both text and images, the input format is:

```python
input = [
    {'image': [image]},
    {'text': ['Text...(e.g. Describe this image in detail.)']}
]
data_batch = tokenized_batch(input, processor.tokenizer, processor, cfg)
```

If we are working with a vision model, the input format is:
```python
input = [{'image': [image]}]
data_batch = tokenized_batch(input, processor.tokenizer, processor, cfg)
```

If we are working with a text model, the input format is:
```python
input = [{'text': ['Text...']}]
data_batch = tokenized_batch(input, processor.tokenizer, processor, cfg)
```
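The branching across the three formats above comes down to which keys appear in the input list. A simplified, hypothetical stand-in for that dispatch logic (not the real `tokenized_batch`):

```python
def input_modalities(batch):
    """Return the set of modality keys ('image', 'text') present in a
    tokenized_batch-style input, mirroring the three formats above."""
    return {key for item in batch for key in item}

vlm_input = [{'image': ['<image>']}, {'text': ['Describe this image in detail.']}]
vision_input = [{'image': ['<image>']}]
text_input = [{'text': ['Text...']}]
```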