Code and data for the paper "Weakly Supervised Shortcut Learning Mitigation Using Sparse AutoEncoders". Find the full paper + appendix here.
The repository is organized into model-specific directories.
Each folder (e.g., ResNet_on_ISIC/) contains:
- training scripts
- testing/evaluation scripts with sparse muting applied
- utility functions used in the experiments
We have also uploaded the metadata files used in our experiments, such as Split-Metadata_WB for the WaterBirds dataset and Split-Metadata_ISIC for the ISIC dataset.
These files provide the labels and additional information required to fully reproduce our results.
By projecting the model embedding into a sparse space, we can disentangle polysemantic neurons, separating spurious and core features into different dimensions. We then correlate the activations of each sparse-space neuron with shortcut presence to identify and mute the neurons that encode the shortcut. This mitigates the shortcut without the need for full group annotation or model retraining.
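The correlate-then-mute step can be illustrated with a minimal sketch. It is not the code from this repository: the function names, the toy activations, the use of absolute Pearson correlation, and the `threshold=0.8` cutoff are all assumptions made for illustration, and plain Python lists stand in for the SAE activations.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def find_shortcut_neurons(activations, shortcut_present, threshold=0.8):
    """Indices of sparse neurons whose |correlation| with the shortcut flag
    reaches the (assumed) threshold."""
    num_neurons = len(activations[0])
    selected = []
    for j in range(num_neurons):
        column = [row[j] for row in activations]
        if abs(pearson(column, shortcut_present)) >= threshold:
            selected.append(j)
    return selected


def mute(activations, neuron_indices):
    """Return a copy of the activations with the selected neurons set to zero."""
    muted = set(neuron_indices)
    return [[0.0 if j in muted else a for j, a in enumerate(row)]
            for row in activations]


# Toy sparse codes: 6 samples x 3 neurons (hypothetical values).
sparse_codes = [
    [0.9, 0.0, 0.2],
    [1.1, 0.0, 0.0],
    [1.0, 0.7, 0.3],
    [0.0, 0.8, 0.0],
    [0.0, 0.0, 0.1],
    [0.1, 0.6, 0.2],
]
# 1 if the shortcut is present in the sample, else 0 (the weak supervision signal).
shortcut_flags = [1, 1, 1, 0, 0, 0]

shortcut_neurons = find_shortcut_neurons(sparse_codes, shortcut_flags)
muted_codes = mute(sparse_codes, shortcut_neurons)
```

In this toy data only neuron 0 tracks the shortcut flag closely, so it is the only one zeroed; the remaining dimensions pass through unchanged and would be decoded back into the model's embedding space before classification.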
- Split_Metadata_ISIC and Split_Metadata_WB contain the exact training, validation, and test split used in our experiments.
- ResNet_on_WB, ResNet_on_ISIC, AlexNet_on_WB, and AlexNet_on_ISIC contain the code for training and testing the SAE + correlation based muting.
@InProceedings{WeaklySupervised_2026,
author = {Muhammad Ahsan and Despina Tawadros and Sari Sadiya and Phuong Quynh Le and Jorg Schlotterer and Christin Seifert and Gemma Roig},
title = {Weakly Supervised Shortcut Learning Mitigation Using Sparse Autoencoders},
booktitle = {ArXiv},
month = {January},
year = {2026}
}
