
HPAI-BSC/vision-interp


Language Models Can Explain Visual Features via Steering [CVPR 2026]

Getting Started

Navigate to the directory where you would like to clone this repository (referred to as REPO_DIR below) and run:

git clone https://github.com/HPAI-BSC/HF-SAE.git
cd HF-SAE

Run the following to get set up using uv:

# Install Python 3.10 and create env
uv python install 3.10
uv venv --python 3.10

# Sync dependencies
uv sync

# Clone and add lang-segment-anything (required for baseline_simulator.py)
git clone https://github.com/luca-medeiros/lang-segment-anything.git
uv add ./lang-segment-anything

To download models and datasets from Hugging Face, make sure you have internet access and authenticate with your account:

huggingface-cli login

Then set the following environment variables in a .env file:

REPO_DIR=... # Your repository directory
DATA_DIR=... # Your data directory, where artifacts will be saved
HF_HOME=... # Optional, custom Hugging Face cache directory, default is `~/.cache/huggingface`
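These variables can be loaded into the process environment at startup, for example with the python-dotenv package. As a stdlib-only sketch (the load_env helper below is hypothetical, not part of this repository):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader (hypothetical helper): parses KEY=VALUE lines,
    ignoring blank lines and '#' comments, without overriding variables
    that are already set in the environment."""
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()
            if "=" in line:
                key, value = line.split("=", 1)
                os.environ.setdefault(key.strip(), value.strip())
```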

Config files

The config files are located in the config directory. To add a new dataset or a new model, register it in dataset_config.yaml or llm_config.yaml, respectively:

  • llm_config.yaml: configuration for the LLMs.
  • dataset_config.yaml: configuration for the datasets.
  • saes.yaml: paths to the SAEs, top-k images, and outputs for each model and layer.
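The exact schema is defined by the existing entries in each file; purely as an illustration, a new model entry might look like the following (all keys below are hypothetical, consult the real config files for the actual schema):

```yaml
# Hypothetical entry shape for llm_config.yaml -- actual keys may differ
models:
  google/gemma-3-4b-it:
    type: vision_language
    layers: [16]
```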

Pretrained SAEs

Pretrained SAEs are available on Hugging Face and are downloaded automatically by the code when needed.

SAE Training

To train SAEs, run src/demo.py, which launches training and evaluation of one or more SAEs. Some training parameters are passed as command-line arguments to src/demo.py; the rest are set in the config file src/demo_config.py:

python src/demo.py \
  --model_name google/gemma-3-4b-it \
  --layers 16 \
  --architectures top_k \
  --dataset ILSVRC/imagenet-1k \
  --test_set ILSVRC/imagenet-1k \
  --ratio_of_training_data 0.5 \
  --submodel enc

Top Activating Visualizations

To compute top activating visualizations, use the src/get_max_activating_vision.py script:

python src/get_max_activating_vision.py \
  --top_k 10 \
  --ids_selection top_k \
  --n_images 128 \
  --sae_path [PATH_TO_SAE]

To view the results, use the notebooks/max_activating_viz_vision_read.ipynb notebook.

Evaluation

Three evaluators are available in the evaluation/ directory to assess SAE feature explanations. See evaluation/README.md for full usage details.

  • Baseline Simulator (evaluation/baseline_simulator.py): Compares SAE activation heatmaps with LangSAM text-guided segmentation masks (IoU, precision, recall).
  • CLIP Simulator (evaluation/clip_simulator.py): Measures CLIP similarity between top-k images and their concept explanations.
  • Image Generation Evaluator (evaluation/image_gen_eval.py): Checks whether images generated from concept explanations activate the expected SAE features more than random images (AUROC).
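For reference, the overlap metrics reported by the baseline simulator (IoU, precision, recall) compare a thresholded activation heatmap against a binary segmentation mask. A NumPy sketch, where the function name and the fixed threshold are assumptions rather than the script's actual API:

```python
import numpy as np

def mask_overlap_metrics(activation_map, segmentation_mask, threshold=0.5):
    """Hypothetical helper: binarize the SAE heatmap at `threshold`, then
    report IoU, precision, and recall against the reference boolean mask."""
    pred = np.asarray(activation_map) >= threshold
    ref = np.asarray(segmentation_mask).astype(bool)
    tp = np.logical_and(pred, ref).sum()        # true positives
    union = np.logical_or(pred, ref).sum()      # predicted OR reference
    iou = tp / union if union else 0.0
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / ref.sum() if ref.sum() else 0.0
    return iou, precision, recall
```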

Data Format

The input is processed and tokenized differently depending on the model type; this is handled by the tokenized_batch function in src/processing.py.

If we are working with a vision-language model, using both text and images, the input format is:

input = [
    {'image': [image]},
    {'text': ['Text...(e.g. Describe this image in detail.)']}
]
data_batch = tokenized_batch(input, processor.tokenizer, processor, cfg)

If we are working with a vision model, the input format is:

input = [{'image': [image]}]
data_batch = tokenized_batch(input, processor.tokenizer, processor, cfg)

If we are working with a text model, the input format is:

input = [{'text': ['Text...']}]
data_batch = tokenized_batch(input, processor.tokenizer, processor, cfg)
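The dispatch across these three cases can be sketched roughly as follows. This is a hypothetical simplification, not the real implementation: the actual tokenized_batch in src/processing.py also handles cfg-specific options and batching details.

```python
def tokenized_batch(inputs, tokenizer, processor, cfg):
    """Hypothetical simplification: route to the processor or the tokenizer
    depending on which modalities the input entries contain."""
    texts = [t for item in inputs for t in item.get("text", [])]
    images = [im for item in inputs for im in item.get("image", [])]
    if texts and images:   # vision-language model: text + images
        return processor(text=texts, images=images, return_tensors="pt")
    if images:             # vision-only model
        return processor(images=images, return_tensors="pt")
    return tokenizer(texts, return_tensors="pt")  # text-only model
```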
