This is the official repository for our NeurIPS 2024 paper "VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance". [Project Website] [Paper]
- VLG-CBM provides a novel method to train Concept Bottleneck Models (CBMs) with guidance from both the vision and language domains.
- VLG-CBM provides concise and accurate concept attributions for the decisions made by the model. The following figure compares the decision explanations of VLG-CBM with those of existing methods by listing the top five concept contributions to each decision.
[Update July 2025] We have released a new tool for ANEC evaluation! If you want to measure ANEC on your own model, please check out this repository. Simply save your model's outputs and run a single command to get the ANEC results!
- Set up the conda environment and install dependencies:

```bash
conda create -n vlg-cbm python=3.12
conda activate vlg-cbm
pip install -r requirements.txt
```

- (Optional) Install Grounding DINO for generating annotations on custom datasets:

```bash
git clone https://github.com/IDEA-Research/GroundingDINO
cd GroundingDINO
pip install -e .
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
cd ..
```
We provide scripts to download and evaluate pretrained models for CIFAR10, CIFAR100, CUB200, Places365, and ImageNet. To quickly evaluate the pretrained models, follow the steps below:
- Download pretrained models from here, unzip them, and place them in the `saved_models` folder.
- Run the evaluation script to evaluate the pretrained models under different NEC values and obtain the Accuracy at NEC (ANEC) for each dataset:

```bash
python sparse_evaluation.py --load_path <path-to-model-dir>
```

For example, to evaluate the pretrained model for CUB200, run

```bash
python sparse_evaluation.py --load_path saved_models/cub
```

To train VLG-CBM, images must first be annotated with concepts from a vision-language model; this work uses Grounding DINO for annotation generation. Use the following command to generate annotations for a dataset:
```bash
python -m scripts.generate_annotations --dataset <dataset-name> --device cuda --batch_size 32 --text_threshold 0.15 --output_dir annotations
```

Note: Supported datasets include cifar10, cifar100, cub, places365, and imagenet. The generated annotations will be saved under the `annotations` folder.
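Under the hood, annotation generation is open-vocabulary detection: each image is queried with the dataset's candidate concepts, and Grounding DINO returns the phrases it can ground along with boxes and confidences. Below is a minimal sketch using the Grounding DINO inference utilities on a single image; the config path, image path, and concept list are placeholder assumptions, not the repository's actual pipeline (see `scripts/generate_annotations.py` for that).

```python
# Illustrative sketch of grounding candidate concepts in a single image with
# Grounding DINO. Paths and the concept list are placeholders; the repository's
# scripts/generate_annotations.py handles batching and dataset-specific concepts.
from groundingdino.util.inference import load_model, load_image, predict

model = load_model(
    "GroundingDINO/groundingdino/config/GroundingDINO_SwinB_cfg.py",  # config filename may differ by Grounding DINO version
    "GroundingDINO/groundingdino_swinb_cogcoor.pth",
)
image_source, image = load_image("example_bird.jpg")  # placeholder image path

# Grounding DINO takes a single caption; candidate concepts are joined with " . ".
concepts = ["black wings", "long pointed beak", "white belly"]
boxes, confidences, phrases = predict(
    model=model,
    image=image,
    caption=" . ".join(concepts),
    box_threshold=0.35,   # assumed value; not specified by the command above
    text_threshold=0.15,  # same text threshold as the command above
)
print(list(zip(phrases, confidences.tolist())))
```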
- Download the annotated data from here, unzip it, and place it in the `annotations` folder, or generate it using Grounding DINO as described in the previous section.
- All datasets must be placed in a single folder specified by the environment variable `$DATASET_FOLDER`. By default, `$DATASET_FOLDER` is set to `datasets`.
Note: To download and process the CUB dataset, run `bash download_cub.sh` and move the resulting folder under `$DATASET_FOLDER`. To use ImageNet, you need to download the dataset yourself and place it under `$DATASET_FOLDER`. The other datasets are downloaded automatically by Torchvision.
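As a rough illustration of the assumed behavior (the exact lookup in the code may differ), a script would resolve the dataset root like this:

```python
import os

# Assumed behavior: the dataset root comes from $DATASET_FOLDER, defaulting to "datasets".
dataset_root = os.environ.get("DATASET_FOLDER", "datasets")

# Hypothetical example of where a processed dataset folder would live under that root.
cub_dir = os.path.join(dataset_root, "CUB_200_2011")
print(dataset_root, cub_dir)
```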
- Train a concept bottleneck model using the config files in `./configs`. For instance, to train a CUB model, run the following command:

```bash
python train_cbm.py --config configs/cub.json --annotation_dir annotations
```

The Number of Effective Concepts (NEC) needs to be controlled to enable a fair comparison of model performance (see the sketch after the results tables below for intuition). To evaluate a trained model under different NEC values, run the following command:

```bash
python sparse_evaluation.py --load_path <path-to-model-dir> --lam <lambda-value>
```

Accuracy at NEC=5 (ANEC-5) for non-CLIP backbone models
| Dataset | CIFAR10 | CIFAR100 | CUB200 | Places365 | ImageNet |
|---|---|---|---|---|---|
| Random | 67.55% | 29.52% | 68.91% | 17.57% | 41.49% |
| LF-CBM | 84.05% | 56.52% | 53.51% | 37.65% | 60.30% |
| LM4CV | 53.72% | 14.64% | N/A | N/A | N/A |
| LaBo | 78.69% | 44.82% | N/A | N/A | N/A |
| VLG-CBM(Ours) | 88.55% | 65.73% | 75.79% | 41.92% | 73.15% |
Accuracy at NEC=5 (ANEC-5) for CLIP backbone models
| Dataset | CIFAR10 | CIFAR100 | ImageNet | CUB |
|---|---|---|---|---|
| Random | 67.55% | 29.52% | 18.04% | 25.37% |
| LF-CBM | 84.05% | 56.52% | 52.88% | 31.35% |
| LM4CV | 53.72% | 14.64% | 3.77% | 3.63% |
| LaBo | 78.69% | 44.82% | 24.27% | 41.97% |
| VLG-CBM (Ours) | 88.55% | 65.73% | 59.74% | 60.38% |
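For intuition on the numbers above: NEC (Number of Effective Concepts) is the average number of concepts with a non-zero weight per class in the CBM's final linear layer, and ANEC-5 is the accuracy when each class uses on average five concepts. The repository controls sparsity through the regularization strength `--lam` of a sparse final layer (GLM-SAGA, acknowledged below) rather than the hard top-k pruning shown here, so the following is only a rough sketch of the idea; all names are hypothetical.

```python
# Illustrative only: approximate "accuracy at NEC = k" by keeping each class's
# k largest-magnitude concept weights in a CBM's final linear layer and zeroing
# the rest, then measuring accuracy on precomputed concept activations.
# `head`, `concept_acts`, and `labels` are hypothetical placeholders.
import torch


def prune_to_nec(head: torch.nn.Linear, nec: int) -> torch.nn.Linear:
    """Keep the `nec` largest-magnitude weights per class; zero out the rest."""
    pruned = torch.nn.Linear(head.in_features, head.out_features, bias=head.bias is not None)
    with torch.no_grad():
        w = head.weight.clone()                  # shape: (num_classes, num_concepts)
        keep = w.abs().topk(nec, dim=1).indices  # indices of the kept concepts per class
        mask = torch.zeros_like(w).scatter_(1, keep, 1.0)
        pruned.weight.copy_(w * mask)
        if head.bias is not None:
            pruned.bias.copy_(head.bias)
    return pruned


@torch.no_grad()
def accuracy_at_nec(head, concept_acts, labels, nec=5):
    """Accuracy of the pruned head on concept-layer outputs (a rough ANEC proxy)."""
    preds = prune_to_nec(head, nec)(concept_acts).argmax(dim=1)
    return (preds == labels).float().mean().item()
```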
Explainable Decisions
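Because a CBM's prediction is a (sparse) linear function of its concept activations, the contribution of each concept to a decision is simply its activation times the corresponding final-layer weight; the top five of these are what the explanation figures report. A rough sketch (variable names are illustrative, not the repository's API):

```python
# Illustrative: list the top-5 concept contributions behind one prediction of a
# CBM whose final layer is linear. `head`, `concept_acts`, and `concept_names`
# are hypothetical placeholders.
import torch


@torch.no_grad()
def top5_contributions(concept_acts: torch.Tensor, head: torch.nn.Linear, concept_names: list):
    """concept_acts: shape (num_concepts,), concept activations for a single image."""
    pred = int(head(concept_acts).argmax())
    contributions = concept_acts * head.weight[pred]  # per-concept contribution to the predicted logit
    values, indices = contributions.topk(5)
    return pred, [(concept_names[i], float(v)) for v, i in zip(values, indices)]
```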
- CUB dataset: https://www.vision.caltech.edu/datasets/cub_200_2011/
- Sparse final layer training: https://github.com/MadryLab/glm_saga
- Explanation bar plots adapted from: https://github.com/slundberg/shap
- Label-free CBM: https://github.com/Trustworthy-ML-Lab/Label-free-CBM
- Grounding DINO: https://github.com/IDEA-Research/GroundingDINO
If you find this work useful, please consider citing:
```bibtex
@inproceedings{srivastava2024vlg,
  title={VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance},
  author={Srivastava, Divyansh and Yan, Ge and Weng, Tsui-Wei},
  booktitle={NeurIPS},
  year={2024}
}
```


