Shantanu Ghosh1, Rayan Syed1, Chenyu Wang1, Vaibhav Choudhary1, Binxu Li2, Clare B. Poynton3, Shyam Visweswaran4, Kayhan Batmanghelich1
1BU ECE, 2 Stanford University, 3 BUMC, 4 Pitt DBMI
- TL;DR
- Highlights
- Warnings
- Acknowledgements
- Environment Setup
- Dataset Zoo
- Model Zoo
- Downloading Classifier Checkpoints
- Vision-Language Representation Space
- Generating Captions
- LADDER Pipeline
- Demo Notebooks
- Scripts
- Citation
- License
- Contact
- Contributing
LADDER is a modular framework that uses large language models (LLMs) to discover, explain, and mitigate hidden biases in vision classifiers, without requiring prior knowledge of the biases or attribute labels.
- **6 Datasets Evaluated**
  - Natural Images: Waterbirds, CelebA, MetaShift
  - Medical Imaging: NIH ChestX-ray, RSNA-Mammo, VinDr-Mammo
- **~20 Bias Mitigation Algorithms Benchmarked**
  - ERM, GroupDRO, CVaR-DRO, JTT, LfF, DFR
  - CORAL, IRM, V-REx, IB-IRM, Reweighting, Mixup, AugMix
- **11 Architectures Across 5 Pretraining Strategies**
  - CNNs: ResNet-50, EfficientNet-B5
  - ViTs: ViT-B/16, ViT-S/16
  - Pretrained With: SimCLR, Barlow Twins, DINO, CLIP (OpenAI), IN1K, IN21K, SWAG, LAION-2B
- **4 LLMs for Hypothesis Generation**
  - GPT-4o, Gemini, LLaMA, Claude
Star the repo if you find it helpful!
Replace all hardcoded paths like `/restricted/projectnb/batmanlab/shawn24/PhD` with your own directory.

Following the guidelines of [MIMIC-CXR](https://physionet.org/news/post/gpt-responsible-use), we set up Google Vertex AI to run Gemini as the LLM for hypothesis generation on medical images in this codebase.
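For reference, a minimal sketch of calling Gemini through Vertex AI is shown below; the project ID, region, and model name are placeholders, and the actual prompt construction lives in the LADDER scripts.

```python
import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholders: use your own GCP project/region and the Gemini model you are approved for.
vertexai.init(project="your-gcp-project", location="us-central1")
model = GenerativeModel("gemini-1.5-pro")

# Illustrative call only; LADDER builds its own hypothesis-generation prompts.
response = model.generate_content(
    "List possible visual attributes that could bias a chest X-ray classifier."
)
print(response.text)
```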
All LLMs were evaluated using checkpoints available before Jan 11, 2025. Newer versions may produce different hypotheses than those reported in the paper.

The default setup uses:
- GPT-4o as captioner for the natural images
- ResNet-50 for the classifier
- ViT-B/32 for the vision-language representation space
- GPT-4o for hypothesis generation
The code is modular and can be easily extended to other models and LLMs.

Update the cache directory locations in `save_img_reps.py` to your own:

```python
os.environ['TRANSFORMERS_CACHE'] = '/your/custom/.cache/huggingface/transformers'
os.environ['TORCH_HOME'] = '/your/custom/.cache/torch'
```
We rely heavily on the Subpopulation Shift Benchmark (SubpopBench) codebase for:
- Downloading and processing datasets
- Classifier training on natural image benchmarks
- Note: SubpopBench does not support the NIH ChestX-ray (NIH-CXR) dataset. To address this, our codebase includes extended experiments for NIH-CXR, discussed in subsequent sections.
Use environment.yml:
```bash
git clone git@github.com:batmanlab/Ladder.git
cd Ladder
conda env create --name Ladder -f environment.yml
conda activate Ladder
```
Please refer to dataset_zoo.md for details of the datasets used in this project. For the toy dataset in Fig. 1, run the Python script.
For the details of the classifiers, pretraining methods and algorithms supported by this codebase, refer to the classifier_zoo.md.
We provide the pretrained ResNet-50 (`resnet_sup_in1k`) and EfficientNet-B5 (`tf_efficientnet_b5_ns-detect`) classifier checkpoints used in our experiments via the Hugging Face Hub.
- Waterbirds (ResNet-50)
- CelebA (ResNet-50)
- MetaShift (ResNet-50)
- NIH ChestX-ray (ResNet-50)
- RSNA-Mammo (EfficientNet-B5)
- VinDr-Mammo (EfficientNet-B5)
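The checkpoints can also be fetched programmatically with `huggingface_hub`; the `repo_id` and `filename` below are placeholders, so substitute the values from the links above.

```python
from huggingface_hub import hf_hub_download

# Placeholder repo_id/filename: substitute the actual Hugging Face entries
# linked above for your dataset/classifier combination.
ckpt_path = hf_hub_download(
    repo_id="<hf-user>/<ladder-checkpoints>",
    filename="Waterbirds/resnet_sup_in1k/model.pkl",
)
print(ckpt_path)
```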
We use the following vision-language representation spaces in our experiments:
- Natural images: CLIP
- Mammograms: Mammo-CLIP
- Chest X-rays: CXR-CLIP
Download the latest checkpoints from the respective repositories.
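For natural images, the ViT-B/32 CLIP encoder used throughout the commands below can be loaded with OpenAI's `clip` package, for example:

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
# Same vision-language encoder passed as --clip_vision_encoder="ViT-B/32" below.
model, preprocess = clip.load("ViT-B/32", device=device)
```

Mammo-CLIP and CXR-CLIP checkpoints are loaded through their own repositories.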
LADDER requires captions for the images in the validation set. We provide scripts to generate these captions using BLIP and GPT-4o. You can download the captions directly from the respective dataset directory on Hugging Face, or generate them using the following scripts.
```bash
python ./src/codebase/caption_images.py \
--seed=0 \
--dataset="Waterbirds" \
--img-path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/data/waterbirds/waterbird_complete95_forest2water2" \
--csv="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/data/waterbirds/metadata_waterbirds.csv" \
--save_csv="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/data/waterbirds/va_metadata_waterbirds_captioning_blip.csv" \
--split="va" \
--captioner="blip"
```
```bash
python ./src/codebase/caption_images_gpt_4.py \
--seed=0 \
--dataset="Waterbirds" \
--img-path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/data/waterbirds/waterbird_complete95_forest2water2" \
--csv="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/data/waterbirds/metadata_waterbirds.csv" \
--save_csv="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/data/waterbirds/va_metadata_waterbirds_captioning_GPT.csv" \
--split="va" \
--model="gpt-4o" \
--api_key="<open-ai key>"
```
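Under the hood, GPT-4o captioning boils down to a vision chat-completion call like the sketch below; the prompt and image handling in `caption_images_gpt_4.py` may differ, and the image path here is only an example.

```python
import base64
from openai import OpenAI

client = OpenAI(api_key="<open-ai key>")

# Example image path; caption_images_gpt_4.py iterates over the validation split.
with open("waterbird_example.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one detailed sentence."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```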
- For NIH-CXR, we use the radiology reports from the MIMIC-CXR dataset. Download the metadata CSV containing the impression and findings sections from here.
- For RSNA-Mammo and VinDr-Mammo, we use the radiology text from the Mammo-FActOR codebase.
The LADDER pipeline consists of six steps. We have uploaded the outputs of every step to Hugging Face. The steps are as follows:

Step 1: Save image representations from the image classifier and from the vision encoder of the vision-language representation space
```bash
python ./src/codebase/save_img_reps.py \
--seed=0 \
--dataset="Waterbirds" \
--classifier="resnet_sup_in1k" \
--classifier_check_pt="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/model.pkl" \
--flattening-type="adaptive" \
--clip_vision_encoder="ViT-B/32" \
--data_dir="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/data" \
--save_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}"
```
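As a rough illustration of what gets saved here, the classifier-side representations are penultimate-layer features; a minimal sketch with a forward hook on a torchvision ResNet-50 is shown below (the script's actual flattening and checkpoint loading differ).

```python
import torch
from torchvision.models import resnet50

# Sketch only: capture penultimate-layer features, the kind of classifier
# embedding save_img_reps.py writes to *_classifier_embeddings.npy.
model = resnet50(weights="IMAGENET1K_V1").eval()
features = {}

def hook(_module, _inputs, output):
    features["penultimate"] = torch.flatten(output, 1)

model.avgpool.register_forward_hook(hook)

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # dummy batch in place of Waterbirds images

print(features["penultimate"].shape)  # torch.Size([1, 2048])
```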
Step 2: Save text representations of the captions using the vision-language text encoder

```bash
python ./src/codebase/save_text_reps.py \
--seed=0 \
--dataset="Waterbirds" \
--clip_vision_encoder="ViT-B/32" \
--save_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}" \
--prompt_sent_type="captioning" \
--captioning_type="gpt-4o" \
--prompt_csv="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/data/waterbirds/va_metadata_waterbirds_captioning_GPT.csv"
```
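Conceptually, this step encodes each caption sentence with the CLIP text encoder and stores the embeddings; a minimal sketch, assuming plain `clip.tokenize`/`encode_text` (which may not match the script exactly), follows.

```python
import clip
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

captions = ["a small bird perched on a bamboo branch in a forest"]  # placeholder caption
tokens = clip.tokenize(captions, truncate=True).to(device)

with torch.no_grad():
    text_emb = model.encode_text(tokens)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)  # unit-normalize

np.save("sent_emb_captions_gpt-4o.npy", text_emb.cpu().numpy())
```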
Step 3: Learn an aligner that maps classifier representations into the vision-language representation space

```bash
python ./src/codebase/learn_aligner.py \
--seed=0 \
--epochs=30 \
--dataset="Waterbirds" \
--save_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{0}/clip_img_encoder_ViT-B/32" \
--clf_reps_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{0}/clip_img_encoder_ViT-B/32/{1}_classifier_embeddings.npy" \
--clip_reps_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{0}/clip_img_encoder_ViT-B/32/{1}_clip_embeddings.npy"
```
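A minimal sketch of the aligner, assuming a linear projection from classifier embeddings to the CLIP embedding space trained with an MSE objective on the validation-split embeddings (the file names below are placeholders for the `--clf_reps_path`/`--clip_reps_path` arguments):

```python
import numpy as np
import torch
import torch.nn as nn

# Placeholder files standing in for the va_* embeddings saved in steps 1-2.
clf_emb = torch.tensor(np.load("va_classifier_embeddings.npy"), dtype=torch.float32)
clip_emb = torch.tensor(np.load("va_clip_embeddings.npy"), dtype=torch.float32)

aligner = nn.Linear(clf_emb.shape[1], clip_emb.shape[1])
optimizer = torch.optim.Adam(aligner.parameters(), lr=1e-3)

for epoch in range(30):  # matches --epochs=30
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(aligner(clf_emb), clip_emb)
    loss.backward()
    optimizer.step()

torch.save(aligner.state_dict(), "aligner_30.pth")
```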
Step 4: Discover error slices by retrieving the sentences most associated with the classifier's errors

```bash
python ./src/codebase/discover_error_slices.py \
--seed=0 \
--topKsent=200 \
--dataset="Waterbirds" \
--save_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32" \
--clf_results_csv="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32/test_additional_info.csv" \
--clf_image_emb_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32/test_classifier_embeddings.npy" \
--language_emb_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32/sent_emb_captions_gpt-4o.npy" \
--sent_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32/sentences_captions_gpt-4o.pkl" \
--aligner_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32/aligner_30.pth"
```
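The core idea of this step is to rank caption sentences against the direction in the aligned space that separates misclassified from correctly classified test images. The sketch below illustrates that ranking with placeholder files; it is not the script's exact logic.

```python
import numpy as np

# Placeholder inputs: classifier test embeddings already mapped through the
# aligner, caption sentence embeddings from step 2, and an error indicator
# derived from test_additional_info.csv.
img_emb = np.load("test_aligned_embeddings.npy")
sent_emb = np.load("sent_emb_captions_gpt-4o.npy")
is_error = np.load("test_is_error.npy").astype(bool)

# Direction pointing from correct predictions toward errors.
direction = img_emb[is_error].mean(axis=0) - img_emb[~is_error].mean(axis=0)
direction /= np.linalg.norm(direction)

# Cosine similarity of each sentence to the error direction; keep the top 200.
scores = (sent_emb @ direction) / np.linalg.norm(sent_emb, axis=1)
top_sentences_idx = np.argsort(-scores)[:200]  # matches --topKsent=200
```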
Step 5: Validate error slices and generate bias hypotheses with an LLM

```bash
python ./src/codebase/validate_error_slices_w_LLM.py \
--seed=0 \
--LLM="gpt-4o" \
--dataset="Waterbirds" \
--class_label="landbirds" \
--clip_vision_encoder="ViT-B/32" \
--key="<open-ai key>" \
--top50-err-text="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32/landbirds_error_top_200_sent_diff_emb.txt" \
--save_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32" \
--clf_results_csv="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32/{}_additional_info.csv" \
--clf_image_emb_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32/{}_classifier_embeddings.npy" \
--aligner_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32/aligner_30.pth"
```
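This step hands the top error-associated sentences to the LLM and asks for bias hypotheses; a stripped-down version of such a call with the OpenAI SDK might look like the sketch below (the real prompt in `validate_error_slices_w_LLM.py` is more structured).

```python
from openai import OpenAI

client = OpenAI(api_key="<open-ai key>")

# Output of step 4: sentences most associated with errors on the "landbirds" class.
with open("landbirds_error_top_200_sent_diff_emb.txt") as f:
    top_sentences = f.read()

prompt = (
    "A classifier for 'landbirds' makes systematic errors. Below are caption "
    "sentences most associated with its mistakes. Propose hypotheses about the "
    "visual attributes driving these errors.\n\n" + top_sentences
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```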
Step 6: Mitigate the discovered error slices

```bash
python ./src/codebase/mitigate_error_slices.py \
--seed=0 \
--epochs=9 \
--lr=0.001 \
--weight_decay=0.0001 \
--n=600 \
--mode="last_layer_finetune" \
--dataset="Waterbirds" \
--classifier="resnet_sup_in1k" \
--slice_names="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32/{}_prompt_dict.pkl" \
--classifier_check_pt="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/model.pkl" \
--save_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32" \
--clf_results_csv="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32/{}_{}_dataframe_mitigation.csv" \
--clf_image_emb_path="/restricted/projectnb/batmanlab/shawn24/PhD/Ladder/out/Waterbirds/resnet_sup_in1k_attrNo/Waterbirds_ERM_hparams0_seed{}/clip_img_encoder_ViT-B/32/{}_classifier_embeddings.npy"
```
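For intuition, `--mode="last_layer_finetune"` amounts to freezing the backbone and retraining only the classification head while emphasizing samples that match the discovered slices. The sketch below uses dummy data and an illustrative upweighting factor; the script handles the real checkpoint, slice prompts, and data loading.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Freeze the backbone (here an untrained model for illustration; load the ERM
# checkpoint in practice) and fine-tune only the final layer.
model = resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 2)  # Waterbirds: 2 classes
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss(reduction="none")

# Dummy batch; in_slice marks samples matching a discovered error slice.
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,))
in_slice = torch.randint(0, 2, (8,)).bool()

per_sample_loss = criterion(model(x), y)
weights = in_slice.float() + 1.0  # weight 2.0 for slice samples (illustrative factor)
loss = (per_sample_loss * weights).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```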
Refer to notebook1 and notebook2 for qualitative results of the error slices discovered by LADDER.
We provide runnable shell scripts to replicate the full LADDER pipeline across all datasets.
If you find this work useful, please cite our paper:
```bibtex
@article{ghosh2024ladder,
title={LADDER: Language Driven Slice Discovery and Error Rectification},
author={Ghosh, Shantanu and Syed, Rayan and Wang, Chenyu and Poynton, Clare B and Visweswaran, Shyam and Batmanghelich, Kayhan},
journal={arXiv preprint arXiv:2408.07832},
year={2024}
}
```
Licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Copyright © Batman Lab, 2025
For any queries, contact Shantanu Ghosh (email: [email protected])
Did you try some other classifier on a new dataset and want to report the results? Feel free to send a pull request.