This repository contains the source code associated with "Coresets from Trajectories: Selecting Data via Correlation of Loss Differences", TMLR 2025. The code was most recently tested with Python 3.12 and PyTorch 2.3.1.
Deep learning models achieve state-of-the-art performance across domains but face scalability challenges in real-time or resource-constrained scenarios.
To address this, we propose Correlation of Loss Differences (CLD).
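As a rough illustration of what a correlation of loss differences could look like in code, here is a minimal sketch. It is not the repository's implementation. It assumes that a sample's score is the Pearson correlation between its epoch-to-epoch training-loss differences and the differences of the mean validation loss. The function names `loss_differences`, `pearson`, and `cld_scores`, and the nested-list input layout, are hypothetical.

```python
import math
from typing import List, Sequence


def loss_differences(losses: Sequence[float]) -> List[float]:
    """Epoch-to-epoch change in loss: d[t] = loss[t + 1] - loss[t]."""
    return [later - earlier for earlier, later in zip(losses, losses[1:])]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0


def cld_scores(
    train_losses: Sequence[Sequence[float]],
    val_losses: Sequence[Sequence[float]],
) -> List[float]:
    """Score each training sample against the mean validation trajectory.

    ASSUMPTION: train_losses[i][t] is the loss of training sample i at
    epoch t, and val_losses[j][t] is the same for validation sample j.
    """
    n_epochs = len(val_losses[0])
    mean_val = [
        sum(sample[t] for sample in val_losses) / len(val_losses)
        for t in range(n_epochs)
    ]
    val_diff = loss_differences(mean_val)
    return [pearson(loss_differences(sample), val_diff) for sample in train_losses]
```

Under these assumptions, a training sample whose loss moves in step with the validation loss scores close to 1, and one whose loss never changes scores 0.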
Clone this repository using: git clone https://github.com/manishnagaraj/CDL_correlation_of_loss_differences.git
Create a conda environment using the environment.yml file: conda env create -f environment.yml
Activate the conda environment: conda activate cld
Alternatively, create an environment manually and make sure the following packages are installed:
- python (3.12)
- pytorch (2.3)
- fire
- numpy
- pandas
- torchvision
- tqdm
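For a manually created environment, a quick check like the following may help confirm that the packages listed above are present. This is a minimal sketch that uses only the standard library. It assumes the usual PyPI distribution names, so PyTorch is looked up as `torch`. The helper `installed_versions` is hypothetical and not part of this repository.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# ASSUMPTION: distribution names as published on PyPI; PyTorch installs as "torch".
REQUIRED = ("torch", "torchvision", "fire", "numpy", "pandas", "tqdm")


def installed_versions(names: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any entry reported as missing can then be installed before running the scripts.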
Default behavior (post-epoch eval on train):
python get_loss_values.py --data_path <DATA_DIR> \
    --dataset CIFAR100 --model_arch resnet18

python Compute_CLD_scores.py --loss_path <PATH_TO_/Scores/..._losses.pickle>

python train_on_coresets.py --score_path <PATH_TO_CDL_PICKLE> --samples_per_class <k>

# Get_loss_values.py defaults (loss collection)
data_path: str = './Data',
dataset: str = 'CIFAR100',
model_arch: str = 'resnet18',
workers: int = 4,
epochs: int = 164,
start_epoch: int = 0,
batch_size: int = 128,
test_batch_size: int = 256,
val_split_ratio: float = 0.1,
learning_rate: float = 0.1,
momentum: float = 0.9,
weight_decay: float = 5e-4,
disable_nesterov: bool = False,
schedule: List[int] = [81, 121],
gamma: float = 0.1,
checkpoint_path: str = './Data/Checkpoint',
logpath: str = './Logs',
resume_path: str = '',
manual_seed: int = 1234,
evaluate_only: bool = False

For baselines, we used and followed the DeepCore repository.
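To make the learning-rate defaults listed above concrete, the sketch below computes the rate at a given epoch. It assumes that `schedule` and `gamma` act as step-decay milestones, so the rate is multiplied by `gamma` each time an epoch in `schedule` is reached, in the style of PyTorch's `MultiStepLR`. The helper `step_lr` is hypothetical, and the training script may apply these values differently.

```python
from bisect import bisect_right
from typing import Sequence


def step_lr(
    epoch: int,
    base_lr: float = 0.1,                 # learning_rate default
    milestones: Sequence[int] = (81, 121),  # schedule default
    gamma: float = 0.1,                   # gamma default
) -> float:
    """Learning rate at `epoch` under step decay (ASSUMED interpretation)."""
    # Count the milestones that have been reached at this epoch.
    reached = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** reached


if __name__ == "__main__":
    for epoch in (0, 80, 81, 120, 121, 163):
        print(f"epoch {epoch:3d}: lr = {step_lr(epoch):.4f}")
```

Under this assumption, the default 164-epoch run would train at 0.1 for epochs 0 to 80, at 0.01 for epochs 81 to 120, and at 0.001 from epoch 121 onward.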
If you find this code useful in your research, please consider citing our main paper: Nagaraj, Manish, Deepak Ravikumar, and Kaushik Roy. "Coresets from Trajectories: Selecting Data via Correlation of Loss Differences." Transactions on Machine Learning Research (2025).
@article{
nagaraj2025coresets,
title={Coresets from Trajectories: Selecting Data via Correlation of Loss Differences},
author={Manish Nagaraj and Deepak Ravikumar and Kaushik Roy},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2025},
url={https://openreview.net/forum?id=QY0pbZTWJ9},
note={}
}
Manish Nagaraj, Deepak Ravikumar, Kaushik Roy
All authors are with Purdue University, West Lafayette, IN, USA
This work was supported in part by the Center for the Co-Design of Cognitive Systems (CoCoSys), a DARPA-sponsored JUMP 2.0 center, the Semiconductor Research Corporation (SRC), the National Science Foundation, and Collins Aerospace. We are also thankful to Efstathia Soufleri, Utkarsh Saxena, Amitangshu Mukherjee, and Sakshi Choudhary for their helpful discussions and feedback.