This repository provides an implementation of Difficulty and Uncertainty-Aware Lightweight (DUAL) data pruning, along with other data pruning algorithms, all particularly suitable for the Intel Gaudi environment.
For more details, check out [our paper on arXiv](https://arxiv.org/abs/2502.06905).
Please refer to each folder for dataset-specific experiments:
- `exp_cifar` for CIFAR (10, 100) experiments
- `exp_imagenet` for ImageNet experiments
DUAL enables efficient dataset pruning: it achieves SOTA performance without requiring full training on the original dataset.
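As a rough illustration of the idea, here is a minimal, hypothetical Python sketch (not the repository's code; the exact DUAL score is defined in the paper). It assumes each example's true-class probability has been recorded at a few early-training checkpoints, combines a difficulty term (low mean probability) with an uncertainty term (high variability across checkpoints), and keeps the top-scoring examples. The names `dual_style_scores` and `prune_indices` are illustrative, not from the repository.

```python
# Illustrative sketch only -- not the repository's implementation.
import numpy as np

def dual_style_scores(probs: np.ndarray) -> np.ndarray:
    """probs: (num_checkpoints, num_examples) array of each example's
    predicted true-class probability, recorded during early training.
    Returns one score per example; higher = more informative to keep."""
    difficulty = 1.0 - probs.mean(axis=0)   # hard examples: low mean probability
    uncertainty = probs.std(axis=0)         # uncertain examples: predictions fluctuate
    return difficulty * uncertainty         # hypothetical combination; see the paper

def prune_indices(scores: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Indices of the top keep_ratio fraction of examples by score."""
    num_keep = int(round(len(scores) * keep_ratio))
    return np.argsort(scores)[::-1][:num_keep]

# Toy usage: 10 checkpoints of predictions for 1,000 examples.
rng = np.random.default_rng(0)
probs = rng.uniform(size=(10, 1000))
kept = prune_indices(dual_style_scores(probs), keep_ratio=0.5)
print(kept.shape)  # (500,)
```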
- Left: Test accuracy comparison on the CIFAR-10 dataset under different pruning ratios.
- Right: Test accuracy comparison on the CIFAR-100 dataset under different pruning ratios.
- The color represents total computation time, including the time each pruning method spends training on the original dataset to compute its scores. Blue indicates lower computation time; red indicates higher. Our method minimizes computation time while maintaining SOTA performance.
Pruning methods compared:
- [Forgetting](https://arxiv.org/abs/1812.05159)
- [EL2N](https://arxiv.org/abs/2107.07075)
- [AUM](https://arxiv.org/abs/2001.10528)
- [CCS](https://arxiv.org/abs/2210.15809)
- [Entropy](https://arxiv.org/abs/1906.11829)
- [Dyn-Unc](https://arxiv.org/abs/2306.05175)
- [TDDS](https://arxiv.org/abs/2311.13613)
- [DUAL (ours)](https://arxiv.org/abs/2502.06905)
| | NVIDIA A6000 | Intel Gaudi-v2 (Lazy) |
|---|---|---|
| CIFAR (Full) | 37m 14s | 32m 14s |
| ImageNet (Full) | 35h 20m 38s | 19h 54m 1s |
If you find this work useful, please cite:

```bibtex
@article{cho2025lightweight,
  title={Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty},
  author={Cho, Yeseul and Shin, Baekrok and Kang, Changmin and Yun, Chulhee},
  journal={arXiv preprint arXiv:2502.06905},
  year={2025}
}
```