This file maps paper-facing runs and figures to their expected output paths after running the pipeline.
These are the exact benchmark definitions used by the script runner:
- configs/benchmarks/cifar10_baseline.yaml
- configs/benchmarks/cifar10_aof.yaml
- configs/benchmarks/cifar10_tl.yaml
- configs/benchmarks/cifar100_baseline.yaml
- configs/benchmarks/cifar100_aof.yaml
- configs/benchmarks/cifar100_tl.yaml
- configs/benchmarks/gtsrb_baseline.yaml
- configs/benchmarks/gtsrb_tl.yaml
- configs/benchmarks/purchase100_baseline.yaml
- configs/benchmarks/purchase100_aof.yaml
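For orientation, a minimal sketch of driving the script runner over these configs; the entry-point name `scripts/run_benchmark.py` is a hypothetical placeholder, not the actual runner path.

```python
# Minimal sketch only: loop the script runner over every benchmark config.
# "scripts/run_benchmark.py" is an assumed entry-point name.
import subprocess
from pathlib import Path

for cfg in sorted(Path("configs/benchmarks").glob("*.yaml")):
    # each run is expected to leave its results under experiments/<dataset>/<arch>/<run>/
    subprocess.run(["python", "scripts/run_benchmark.py", "--config", str(cfg)], check=True)
```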
The paper used the following post-analysis runs:
- analysis_results/cifar10/resnet18/seed1
- analysis_results/cifar10/resnet18/seed2
- analysis_results/cifar10/resnet18/seed3
- analysis_results/cifar10/resnet18/seed4
- analysis_results/cifar10/resnet18/seed5
- analysis_results/cifar10/resnet18/seed6
- analysis_results/cifar10/resnet18/seed7
- analysis_results/cifar10/resnet18/seed8
- analysis_results/cifar10/resnet18/seed9
- analysis_results/cifar10/resnet18/seed10
- analysis_results/cifar10/resnet18/seed11
- analysis_results/cifar10/resnet18/seed12
- analysis_results/cifar10/resnet18/mixup_drp0.15 mapped as +1 different (MixUp, DRP=0.15)
- analysis_results/cifar10/resnet18/bs512_drp0.2 mapped as +1 different (BS=512, DRP=0.2)
- analysis_results/cifar10/resnet18/tl mapped as +1 different (TL)
- analysis_results/cifar10/wrn28-2/seed42 mapped as +1 different (Arch)
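The same run-to-label mapping written out as plain Python constants (e.g. for labeling plots); the paths and labels are copied verbatim from the list above.

```python
# Run-to-label mapping taken directly from the list above.
SEED_RUNS = [f"analysis_results/cifar10/resnet18/seed{i}" for i in range(1, 13)]
EXTRA_RUNS = {
    "analysis_results/cifar10/resnet18/mixup_drp0.15": "MixUp, DRP=0.15",
    "analysis_results/cifar10/resnet18/bs512_drp0.2": "BS=512, DRP=0.2",
    "analysis_results/cifar10/resnet18/tl": "TL",
    "analysis_results/cifar10/wrn28-2/seed42": "Arch",
}
```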
- script: comprehensive_analysis/threshold_distribution.py
- inputs:
- per_model_metrics_two_modes.csv from seed1 to seed12
- outputs:
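A sketch of gathering the per-seed metric tables listed above before running threshold_distribution.py; the combined filename and the added `seed` column are assumptions.

```python
# Sketch: collect per_model_metrics_two_modes.csv from seed1..seed12 into one table.
import pandas as pd
from pathlib import Path

BASE = Path("analysis_results/cifar10/resnet18")
frames = []
for i in range(1, 13):
    df = pd.read_csv(BASE / f"seed{i}" / "per_model_metrics_two_modes.csv")
    df["seed"] = i  # remember which run each row came from
    frames.append(df)

pd.concat(frames, ignore_index=True).to_csv(BASE / "per_model_metrics_all_seeds.csv", index=False)
```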
- script: comprehensive_analysis/reproducibility_analysis.py
- analysis-result inputs:
- seed1 to seed12
- mixup_drp0.15
- bs512_drp0.2
- tl
- wrn28-2/seed42
- experiment-array inputs for rank stability:
- experiments/cifar10/resnet18/seed1
- experiments/cifar10/resnet18/seed2
- experiments/cifar10/resnet18/seed3
- experiments/cifar10/resnet18/seed4
- experiments/cifar10/resnet18/seed5
- experiments/cifar10/resnet18/seed6
- experiments/cifar10/resnet18/seed7
- experiments/cifar10/resnet18/seed8
- experiments/cifar10/resnet18/seed9
- experiments/cifar10/resnet18/seed10
- experiments/cifar10/resnet18/seed11
- experiments/cifar10/resnet18/seed12
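A sketch of the rank-stability idea applied to the experiment arrays above: compare per-example rankings across the 12 seeds with pairwise Spearman correlation. The per-run filename (`per_example_scores.csv`) and its columns are assumptions, not the actual inputs of reproducibility_analysis.py.

```python
# Sketch: pairwise Spearman rank correlation of per-example scores across seeds.
from itertools import combinations
from pathlib import Path

import pandas as pd
from scipy.stats import spearmanr

BASE = Path("experiments/cifar10/resnet18")
scores = {}
for i in range(1, 13):
    df = pd.read_csv(BASE / f"seed{i}" / "per_example_scores.csv")  # assumed filename
    scores[f"seed{i}"] = df.set_index("example_id")["score"]        # assumed columns

rhos = []
for a, b in combinations(scores, 2):
    rho, _ = spearmanr(scores[a], scores[b].reindex(scores[a].index))
    rhos.append(rho)

print(f"mean pairwise Spearman rho over {len(rhos)} seed pairs: {sum(rhos) / len(rhos):.3f}")
```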
- per-run grids are generated by comprehensive_analysis/run_analysis.py
- collected into one folder by comprehensive_analysis/compose_top_vulnerable.py
- collected output:
- analysis_results/figures/topk_vulnerable_images/<dataset>/<arch>/<run>/
- paper panel assembled manually from the collected images
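A sketch of the collection step into the folder layout above; the per-run grid filename (`topk_grid.png`) is an assumption, and compose_top_vulnerable.py remains the authoritative implementation.

```python
# Sketch: gather per-run top-k grids under analysis_results/figures/topk_vulnerable_images/.
import shutil
from pathlib import Path

SRC = Path("analysis_results")
DEST = Path("analysis_results/figures/topk_vulnerable_images")

for grid in SRC.glob("*/*/*/topk_grid.png"):  # <dataset>/<arch>/<run>/topk_grid.png (assumed name)
    dataset, arch, run = grid.parts[-4], grid.parts[-3], grid.parts[-2]
    out_dir = DEST / dataset / arch / run
    out_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(grid, out_dir / grid.name)
```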
- script: comprehensive_analysis/loss_ratio_tpr.py
- input:
- analysis_results/loss_ratio.csv (assembled manually from per-run summaries)
- output:
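The loss_ratio.csv above was assembled by hand for the paper; a sketch of an equivalent scripted assembly from per-run summaries follows, with the per-run filename (`loss_ratio_summary.csv`) assumed.

```python
# Sketch: stack per-run loss-ratio summaries into analysis_results/loss_ratio.csv.
import pandas as pd
from pathlib import Path

rows = []
for summary in Path("analysis_results").glob("*/*/*/loss_ratio_summary.csv"):  # assumed name
    df = pd.read_csv(summary)
    df["run"] = "/".join(summary.parts[1:-1])  # e.g. cifar10/resnet18/seed1
    rows.append(df)

pd.concat(rows, ignore_index=True).to_csv("analysis_results/loss_ratio.csv", index=False)
```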
- script: comprehensive_analysis/plot_benchmark_distribution.py
- explicit panel manifests:
- output:
Figure 8 note:
- the exact source experiment directories are recorded in the panel manifests
- Figure 8 requires running all 10 benchmarks; a subset produces a partial panel
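A small pre-flight check reflecting the note above: confirm results exist for all 10 benchmark configs before plotting, since a subset only yields a partial panel. How benchmark config names map onto experiment directories is an assumption here.

```python
# Sketch: warn about missing benchmark runs before assembling the Figure 8 panel.
from pathlib import Path

benchmarks = sorted(p.stem for p in Path("configs/benchmarks").glob("*.yaml"))
# assumed convention: each benchmark leaves at least one directory whose name
# starts with the config stem somewhere under experiments/
missing = [b for b in benchmarks if not any(Path("experiments").rglob(f"{b}*"))]

if missing:
    print("Partial panel only; missing runs for:", ", ".join(missing))
else:
    print(f"All {len(benchmarks)} benchmarks present; full panel can be plotted.")
```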