Welcome to the XAI-Units package repository! This is a library to help benchmark and compare explainable AI feature attribution methods. It contains a collection of datasets and models with specific units of behaviour that are known to be challenging for feature attribution methods. The library also contains an end-to-end pipeline for applying feature attribution methods across a range of datasets/models, scoring them with metrics, then summarising the results.
Please also check out the associated paper "XAI-Units: Benchmarking Explainability Methods with Unit Tests" and visit our documentation page for additional information.
├── data/dinosaur_images
├── demo
│   ├── tutorials
│   └── Example
├── docs
└── src/xaiunits
    ├── datagenerator
    ├── methods
    ├── metrics
    ├── model
    ├── pipeline
    └── trainer
- Clone the repo.
- Create a virtual environment.
python -m venv ./venv
- Activate the virtual environment, then navigate to the root of the repo.
- Use the requirements.txt file to install the required packages with pip.
pip install -r requirements.txt
- You may wish to upgrade the installed version of PyTorch for GPU support (the official benchmark models were trained with pytorch-cuda=12.1).
- Install this library as a local editable package. Note that the full stop (.) at the end of the command refers to the current directory, i.e. the root of the repo.
pip install -e .
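As a quick sanity check that the editable install worked (an optional step, not part of the official guide), try importing the package from the activated environment:
python -c "import xaiunits; print('xaiunits imported successfully')"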
This folder contains real-world examples of experiments using our library that readers can reference. Follow the steps below:
- Step 1: Follow the installation guide above so you have an active venv with the required packages installed.
- Step 2: Change the working directory to the demo/example folder so that relative file paths remain intact.
cd ./demo/example
- Step 3: Select the appropriate Python script to reproduce the experiment of your choice (e.g. python3 tabular.py); a consolidated session is sketched after the list below.
- Tabular Experiment: tabular.py
- DeepLIFT Supplementary Experiment: deeplift_suppl_exp.py
- Image Experiment (CNN): image.py
- Image Experiment (ViT): image.py
- Image Experiment (Text): image.py
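Putting the steps together, a typical session might look like this (assuming a POSIX shell and the venv created during installation):
source ./venv/bin/activate
cd ./demo/example
python3 tabular.py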
Here we present a practical example of one of the library's main features: the Pipeline, which offers a simple, straightforward way to write end-to-end experiments with feature attribution explainability methods.
Necessary imports to run this code:
from xaiunits.model import ContinuousFeaturesNN
from xaiunits.datagenerator import WeightedFeaturesDataset
from captum.metrics import sensitivity_max, infidelity
from captum.attr import InputXGradient, IntegratedGradients, Lime
from xaiunits.pipeline import Pipeline
from xaiunits.metrics import perturb_standard_normal, wrap_metric
Select one of the multiple datasets in the library:
dataset = WeightedFeaturesDataset()
Select a model compatible with the dataset:
model = ContinuousFeaturesNN(n_features=dataset.n_features, weights=dataset.weights)
# alternatively use model = dataset.generate_model()
Add explainability methods of your choice to the list:
methods = [InputXGradient, IntegratedGradients, Lime]
Add the metrics you want to use to the list, using wrap_metric:
metrics = [
wrap_metric(sensitivity_max),
wrap_metric(infidelity, perturb_func=dataset.perturb_function(), normalize=True),
]
You can add as many models as you want to the list for the Pipeline to run:
models = [model]
Add as many datasets as you want to the list. Make sure models and datasets are compatible with each other:
datasets = [dataset]
Create the pipeline:
pipeline = Pipeline(models, datasets, methods, metrics, method_seeds=[10])
Use the features of the Pipeline:
results = pipeline.run() # apply the explanation methods and evaluate them
results.print_stats() # print results of the explainability methods and the metrics
df = results.data # access the full dataframe of results
For more on the usage of the models, datasets, methods, and metrics available, as well as other features of the library such as the AutoTrainer and the ExperimentPipeline, refer to the demo/tutorials folder (e.g. custom_methods_and_custom_datasets.ipynb) and to the documentation.
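Once the pipeline has run, the dataframe returned by results.data can also be post-processed with standard pandas calls; for example (an illustrative sketch, not part of the library's API, assuming results.data is a pandas DataFrame):
df = results.data
df.to_csv("pipeline_results.csv", index=False) # save the raw results for later analysis
print(df.head()) # inspect the first few rows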
The documentation uses Sphinx. For a local build of the documentation, ensure that requirements.txt has been installed (including Sphinx), then navigate to the docs folder and run the following command:
make html
Then access the documentation by opening the file docs/_build/html/index.html.
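Alternatively, if you prefer to browse the built documentation over a local web server rather than opening the file directly, Python's built-in server works (run from the repo root; the port is arbitrary), then visit http://localhost:8000 in a browser:
python -m http.server 8000 --directory docs/_build/html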
- WeightedFeaturesDataset: data_generation.py
- ConflictingDataset: conflicting.py
- PertinentNegativesDataset: pertinent_negatives.py
- ShatteredGradientsDataset: shattered_grad.py
- InteractingFeatureDataset: interacting_features.py
- UncertaintyAwareDataset: uncertainty_aware.py
- BooleanDataset: boolean.py
- BalancedImageDataset: image_generation.py
- ImbalancedImageDataset: image_generation.py
- TextDataset: text_dataset.py
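Each of the datasets above pairs with one of the models listed next. A minimal sketch of instantiating one (assuming the other tabular datasets follow the same default-constructor pattern as WeightedFeaturesDataset in the example above and also expose generate_model(); check each class for its actual parameters):
from xaiunits.datagenerator import ConflictingDataset

dataset = ConflictingDataset()  # assumed default constructor
model = dataset.generate_model()  # assumed to return the paired ConflictingFeaturesNN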
- DynamicNN: dynamic.py
- GenericNN: generic.py
- ContinuousFeaturesNN: continuous.py
- ConflictingFeaturesNN: conflicting.py
- PertinentNN: pertinent_negatives.py
- ShatteredGradientsNN: shattered_gradients.py
- InteractingFeaturesNN: interaction_features.py
- UncertaintyNN: uncertainty_model.py
- PropFormulaNN: boolean.py
- Pipeline: pipeline.py
- ExperimentPipeline: pipeline.py
- AutoTrainer: trainer.py
If you find our paper or code useful in your research, please consider citing the original work:
@inproceedings{10.1145/3715275.3732186,
author = {Lee, Jun Rui and Emami, Sadegh and Hollins, Michael David and Wong, Timothy C. H. and Villalobos S\'{a}nchez, Carlos Ignacio and Toni, Francesca and Zhang, Dekai and Dejl, Adam},
title = {XAI-Units: Benchmarking Explainability Methods with Unit Tests},
year = {2025},
isbn = {9798400714825},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3715275.3732186},
doi = {10.1145/3715275.3732186},
booktitle = {Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency},
pages = {2892–2905},
numpages = {14},
keywords = {explainable AI, feature attribution, neural networks, synthetic data, synthetic models, unit testing},
location = {Athens, Greece},
series = {FAccT '25}
}
