Weakly Supervised Shortcut Learning Mitigation Using Sparse AutoEncoders

Code and data for the paper "Weakly Supervised Shortcut Learning Mitigation Using Sparse AutoEncoders". Find the full paper + appendix here.

The repository is organized into model-specific directories.
Each folder (e.g., ResNet_on_ISIC/) contains:

training scripts
testing/evaluation scripts when sparse muting applied
utility functions used in the experiments

We have also uploaded the metadata files used in our experiments like Split-Metadata_WB for WaterBirds dataset and Split-Metadata_ISIC for ISIC dataset.
These files provide the labels and additional information required to fully reproduce our results.

Overview

By projecting the model embedding into a sparse space we can distintangle polysemantic neurons, seperating spurious and core features into different dimentions. We then correlate each sparse space neuron activations with shortcut presence to identify and mute the neurons that encode the shortcut. Resulting in shortcut mitigation without need for full group annotation or model retraining.

Running

Split_Metadata_ISIC and Split_Metadata_WB contain the exact training, validation, and test split used in our experiments.
ResNet_on_WB, ResNet_on_ISIC, AlexNet_on_WB, and AlexNet_on_ISIC contain the code for training and testing the SAE + correlation based muting

Citation

@InProceedings{WeaklySupervised_2026,
  author    = {Muhammad Ahsan, Despina Tawadros, Sari Sadiya, Phuong Quynh Le, Jorg Schlotterer, Christin Seifert, and Gemma Roig},
  title     = {Weakly Supervised Shortcut Learning Mitigation Using Sparse Autoencoders}, 
  booktitle = {ArXiv},
  month     = {January},
  year      = {2026}
}

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
AlexNet_on_ISIC		AlexNet_on_ISIC
AlexNet_on_WB		AlexNet_on_WB
ResNet_on_ISIC		ResNet_on_ISIC
ResNet_on_WB		ResNet_on_WB
Split-Metadata_WB/output_metadata		Split-Metadata_WB/output_metadata
Split_Metadata_ISIC		Split_Metadata_ISIC
Full_paper_SAE_shortcut_mitigation.pdf		Full_paper_SAE_shortcut_mitigation.pdf
README.md		README.md
SparseMuting_method.pdf		SparseMuting_method.pdf
SparseMuting_method.png		SparseMuting_method.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Weakly Supervised Shortcut Learning Mitigation Using Sparse AutoEncoders

Overview

Running

Citation

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

Arsu-Lab/Weakly-Supervised-Shortcut-Mitigation-Sparse-AutoEncoders

Folders and files

Latest commit

History

Repository files navigation

Weakly Supervised Shortcut Learning Mitigation Using Sparse AutoEncoders

Overview

Running

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages