Welcome to the official repository for DINA (DEEL ImageNet Attributions) — a dataset and benchmark framework for evaluating explanation methods on ImageNet-trained models.
This repository provides:
- 📦 Code to generate attribution maps on ImageNet
- 🧪 Scripts to compute explanation metrics (Fidelity, Complexity, Randomization)
- 🧠 A curated dataset of precomputed attributions hosted in a public GCS bucket
📂 All attribution files are available at:
https://storage.cloud.google.com/xai-deel
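If the bucket mirrors the local output layout used by this repo (`{model_name}/{explainer_name}/explanations.tfrecord` — this is an assumption, check the bucket listing before relying on it), a small helper can build `gs://` URIs for the precomputed files:

```python
def dina_gcs_uri(model_name: str, explainer_name: str,
                 bucket: str = "xai-deel") -> str:
    """Build a gs:// URI for a precomputed attribution file.

    Assumes (unverified) that the bucket mirrors the local layout
    {model_name}/{explainer_name}/explanations.tfrecord.
    """
    return f"gs://{bucket}/{model_name}/{explainer_name}/explanations.tfrecord"

print(dina_gcs_uri("ResNet50", "Saliency"))
# gs://xai-deel/ResNet50/Saliency/explanations.tfrecord
```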
🧭 Attribution Coverage Table
This dataset includes attribution maps for the model and explainer combinations marked ✅ below:
| Model | Saliency | GradCAM | GradCAMPP | VarGrad | SmoothGrad | SquareGrad | IntegratedGradients | Rise | GradientInput | KernelShap | Occlusion | Lime | HsicAttributionMethod | SobolAttributionMethod |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BeitV2 | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ConvNeXtV2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DinoV2 | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| EfficientNetV2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| InceptionNeXt | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| MaxVIT | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| MLPMixer | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ResNest50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ResNet50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
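For programmatic checks against the coverage table above, the missing combinations can be encoded as a small lookup (names match the table; this helper is not part of the repository):

```python
# (model, explainer) pairs marked ❌ in the coverage table above.
MISSING = {
    ("BeitV2", "GradCAM"), ("BeitV2", "GradCAMPP"),
    ("DinoV2", "GradCAM"), ("DinoV2", "GradCAMPP"),
    ("MaxVIT", "GradCAM"), ("MaxVIT", "GradCAMPP"), ("MaxVIT", "Occlusion"),
    ("MLPMixer", "GradCAM"), ("MLPMixer", "GradCAMPP"),
}

def has_attributions(model: str, explainer: str) -> bool:
    """Return True if the dataset ships attributions for this pair."""
    return (model, explainer) not in MISSING

print(has_attributions("MaxVIT", "Occlusion"))  # False
print(has_attributions("ResNet50", "Saliency"))  # True
```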
```bash
git clone git@github.com:deel-ai/dina.git
cd dina
python3.11 -m venv dina_env
source dina_env/bin/activate
pip install -e .
```
- You need to download the ImageNet dataset.
- The validation labels must be downloaded from the TensorFlow research models repository.
- The validation labels must be placed in the `ILSVRC/Data/CLS-LOC` folder.
- Structure the validation set:
```bash
PYTHONPATH=. python scripts/structure_val_imagenet.py --raw_dir "path/to/imagenet/ILSVRC/Data/CLS-LOC"
```
Generate imagenet_data.json
We use Keras CV Attention Models (Kecam) to load vision models. You’ll need to create a metadata file:
```bash
PYTHONPATH=. python scripts/kecam_custom_dataset_script.py \
    --train_images path/to/imagenet/ILSVRC/Data/CLS-LOC/train \
    --test_images /path/to/imagenet/ILSVRC/Data/CLS-LOC/val \
    -s imagenet_data
```
Place the resulting `imagenet_data.json` in the root of the repo.
Example to run IntegratedGradients on DinoV2:
```bash
PYTHONPATH=. python scripts/compute_explanations.py \
    --model_name "DinoV2" \
    --explainer_name "IntegratedGradients" \
    --batch_size 32 \
    --imagenet_json imagenet_data.json \
    --output_dir /dir/to/bench_dir/
```
📍 See: deel/dina/utils/models.py
📍 See: deel/dina/utils/explainers.py
Attributions are saved at:
```
{output_dir}/{model_name}/{explainer_name}/explanations.tfrecord
```
A preprocessed dataset (~30 GB) will also be generated and saved to:
```
{output_dir}/{model_name}/preprocess_dataset.tfrecord
```
**Note:** If you run explanations with the same model but a different explainer, the existing preprocess_dataset is reloaded directly instead of being regenerated.
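The reuse behavior amounts to an existence check on the preprocessed record. A minimal sketch, using the layout above (the helper names below are hypothetical, not part of the repository):

```python
from pathlib import Path

def preprocess_record_path(output_dir: str, model_name: str) -> Path:
    """Location of the cached preprocessed dataset for a model."""
    return Path(output_dir) / model_name / "preprocess_dataset.tfrecord"

def needs_preprocessing(output_dir: str, model_name: str) -> bool:
    """True if the ~30 GB preprocessed record must be (re)generated."""
    return not preprocess_record_path(output_dir, model_name).exists()
```

Only the first run for a given model pays the preprocessing cost; every later explainer run on that model hits the cached record.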
```bash
PYTHONPATH=. python scripts/compute_fidelity.py \
    --model_name "DinoV2" \
    --explainer_name "IntegratedGradients" \
    --metrics "Insertion" "Deletion" \
    --batch_size 32 \
    --imagenet_json imagenet_data.json \
    --output_dir /dir/to/bench_dir/
```
📍 See: deel/dina/metrics/fidelity.py
```bash
PYTHONPATH=. python scripts/compute_complexity.py \
    --model_name "DinoV2" \
    --explainer_name "IntegratedGradients" \
    --metrics "Sparseness" "Complexity" \
    --batch_size 32 \
    --imagenet_json imagenet_data.json \
    --output_dir /dir/to/bench_dir/
```
📍 See: deel/dina/metrics/complexity.py
```bash
PYTHONPATH=. python scripts/compute_randomization.py \
    --model_name "DinoV2" \
    --explainer_name "IntegratedGradients" \
    --metric_name "ModelRandomizationMetric05" \
    --batch_size 32 \
    --imagenet_json imagenet_data.json \
    --output_dir /dir/to/bench_dir/
```
📍 See: deel/dina/metrics/randomization.py

`metric_name` options: `"ModelRandomizationMetric05"`, `"ModelRandomizationMetric01"`, `"RandomLogitMetric"`
**Note:** Because these metrics randomize the model's weights, only one randomization metric can be computed per run.
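Model randomization perturbs the network in place by re-initializing part of its layers, so each metric needs a fresh copy of the model. A toy sketch on plain weight lists, not the repository's implementation (the `05`/`01` suffixes plausibly denote the randomized fraction — an assumption):

```python
import random

def randomize_fraction(layer_weights, fraction, seed=0):
    """Replace the last `fraction` of layers with random weights.

    Toy illustration of top-down model randomization on nested lists
    of floats; real metrics operate on the network's weight tensors.
    """
    rng = random.Random(seed)
    n = len(layer_weights)
    n_random = round(n * fraction)
    out = [list(w) for w in layer_weights]
    for i in range(n - n_random, n):
        out[i] = [rng.gauss(0.0, 1.0) for _ in layer_weights[i]]
    return out

weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
half_random = randomize_fraction(weights, 0.5)
# The first half of the layers is untouched, the last half re-drawn.
```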
For attributions downloaded from our bucket (or computed locally), load them as follows:
```python
from deel.dina.utils import create_explanations_dataset_from_tfrecord

explanations_path = "/path/to/explanations.tfrecord"
explanations_ds = create_explanations_dataset_from_tfrecord(explanations_path)
```
To get the input-label-explanation triplet:
**Important:** This will create a tfrecord file at `output_dir` of around 30 GB.
```python
from deel.dina.utils import get_model, generate_or_load_preprocess_ds
import tensorflow as tf

model = get_model("BeitV2")
preprocess_dataset = generate_or_load_preprocess_ds(
    model=model,
    output_dir="path/to/preprocess_record_dir",
    batch_size=32,
    imagenet_json_path="imagenet_data.json",
)

combined_ds = tf.data.Dataset.zip((preprocess_dataset, explanations_ds))
combined_ds = combined_ds.map(lambda x, y: (x[0], x[1], y))

for pre_input, one_hot_pred, explanation in combined_ds.batch(8):
    # Your logic here
    break
```
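Before pointing this at the full ~30 GB record, the zip-and-map pattern can be sanity-checked on synthetic tensors (shapes below are purely illustrative):

```python
import tensorflow as tf

# Synthetic stand-ins: 4 "images", 10 classes, 4 attribution maps.
inputs = tf.random.uniform((4, 8, 8, 3))
one_hot = tf.one_hot(tf.random.uniform((4,), maxval=10, dtype=tf.int32), 10)
expls = tf.random.uniform((4, 8, 8))

pre_ds = tf.data.Dataset.from_tensor_slices((inputs, one_hot))
expl_ds = tf.data.Dataset.from_tensor_slices(expls)

# Same zip-and-map pattern as above, on toy data.
combined = tf.data.Dataset.zip((pre_ds, expl_ds)).map(lambda x, y: (x[0], x[1], y))
for pre_input, label, explanation in combined.batch(2):
    print(pre_input.shape, label.shape, explanation.shape)
    # (2, 8, 8, 3) (2, 10) (2, 8, 8)
    break
```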