This repository is an official implementation of "SPRINT: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers".
TL;DR: We introduce SPRINT, a simple and general framework that trains diffusion transformers with aggressive token dropping (up to 75%) and minimal architectural modification, while preserving representation quality. Notably, on ImageNet-1K 256x256, SPRINT achieves up to 9.8× training savings with comparable or superior FID/FDD. Furthermore, during inference, our Path-Drop Guidance (PDG) nearly halves inference FLOPs compared to standard CFG sampling while improving quality.
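As a rough back-of-the-envelope estimate of where the savings come from (our arithmetic, not a figure from the paper): if the sparse middle blocks process only the kept fraction $(1-\rho)$ of $N$ tokens at mask ratio $\rho$, their per-layer costs scale roughly as

```math
\mathrm{FLOPs}_{\mathrm{MLP}} \propto (1-\rho)\,N, \qquad \mathrm{FLOPs}_{\mathrm{attn}} \propto (1-\rho)^2\,N^2,
```

so at $\rho = 0.75$ the middle layers' MLP and attention costs shrink by roughly 4× and 16×, respectively.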
- Release training code.
- Release inference (sampling) code.
- Release the pre-trained models (coming soon!).
| Model | Res. | Epochs | FDD (PDG) | FID (PDG) | FDD (CFG) | FID (CFG) |
|---|---|---|---|---|---|---|
| SiT-XL/2 + SPRINT | 256 | 400 | 58.4 | 1.62 | 75.4 | 1.96 |
| SiT-XL/2 + SPRINT + REPA | 256 | 400 | 54.7 | 1.59 | 75.6 | 1.87 |
| SiT-XL/2 + SPRINT | 512 | 400 | 46.9 | 1.96 | 53.6 | 2.23 |
To install the requirements, run:

```bash
git clone https://github.com/snap-research/Sprint.git
cd Sprint
conda create -n sprint python==3.12
conda activate sprint
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 xformers --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```

We provide experiments for ImageNet (download it from here). We follow the preprocessing guide from here.
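For reference, here is a minimal sketch of the usual SiT/REPA-style latent caching. This is our assumption based on that lineage, not this repo's script: the exact VAE checkpoint, crop, and 0.18215 scale should be checked against the preprocessing guide.

```python
import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL

device = "cuda" if torch.cuda.is_available() else "cpu"
# VAE checkpoint assumed from the SiT/REPA lineage; verify against the preprocessing guide.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),  # map pixels to [-1, 1]
])

@torch.no_grad()
def encode_to_latent(path: str) -> torch.Tensor:
    """Encode one image to a 4x32x32 SD-VAE latent for 256x256 training."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    latent = vae.encode(x).latent_dist.sample()
    return latent * 0.18215  # standard SD-VAE scaling factor
```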
You can modify the training configuration files in `configs/train` (a sketch of how these fields fit together follows the list):

- `encoder_depth`: depth of the dense shallow layers
- `middle_depth`: depth of the sparse deep layers
- `decoder_depth`: depth of the final decoder layers
- `residual_type`: `concat_linear`
- `mask_ratio`: any ratio between 0 and 1
- `mask_type`: `random` or `structured_with_random_offset`
- `representation_align`: set to `true` to enable the DINOv2 alignment loss (e.g., REPA)
- `representation_depth`: any value between 1 and the depth of the model
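Below is a minimal, self-contained sketch of how we read these fields. It is an illustration under our assumptions, not the repository's actual model code; the class name `SprintBlocksSketch` and all internals are hypothetical. Dense shallow blocks run on every token, a random subset survives the mask for the sparse middle blocks, and their outputs are scattered back and fused with the dense residual via concat + linear before the decoder blocks.

```python
import torch
import torch.nn as nn

class SprintBlocksSketch(nn.Module):
    """Hypothetical illustration of sparse-dense residual fusion, not the repo's model code."""

    def __init__(self, dim=384, encoder_depth=2, middle_depth=8, decoder_depth=2, mask_ratio=0.75):
        super().__init__()
        block = lambda: nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True)
        self.encoder = nn.ModuleList(block() for _ in range(encoder_depth))  # dense shallow layers
        self.middle = nn.ModuleList(block() for _ in range(middle_depth))    # sparse deep layers
        self.decoder = nn.ModuleList(block() for _ in range(decoder_depth))  # final decoder layers
        self.fuse = nn.Linear(2 * dim, dim)  # residual_type: concat_linear
        self.mask_ratio = mask_ratio         # mask_ratio: fraction of tokens dropped

    def forward(self, x):  # x: (B, N, dim) token sequence
        for blk in self.encoder:  # dense path sees every token
            x = blk(x)
        dense = x

        B, N, D = x.shape
        n_keep = max(1, int(N * (1 - self.mask_ratio)))
        # mask_type: random -- keep a uniformly random subset of token positions
        keep = torch.rand(B, N, device=x.device).argsort(dim=1)[:, :n_keep]
        idx = keep.unsqueeze(-1).expand(-1, -1, D)
        sparse = torch.gather(x, 1, idx)

        for blk in self.middle:  # sparse path sees only the kept tokens
            sparse = blk(sparse)

        # Scatter sparse outputs back; dropped positions keep their dense features.
        full = dense.clone().scatter(1, idx, sparse)
        x = self.fuse(torch.cat([dense, full], dim=-1))  # fuse the two paths

        for blk in self.decoder:
            x = blk(x)
        return x

out = SprintBlocksSketch()(torch.randn(2, 256, 384))  # 256 tokens, e.g., a 16x16 latent grid
print(out.shape)  # torch.Size([2, 256, 384])
```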
Intermediate checkpoints and configuration files will be saved in the `exps` folder by default.
```bash
accelerate launch --multi_gpu --num_processes=8 train.py --config configs/train/SIT_XL_SPRINT_256.yaml
accelerate launch --multi_gpu --num_processes=8 train.py --config configs/train/SIT_XL_SPRINT_256_ft.yaml
```

You can modify the inference configuration files in `configs/eval`:
- Update the `ckpt_path` field to point to your trained model or one of the provided checkpoints.
- Generated samples will be saved to the `samples` folder by default.
- You can also enable our Path-Drop Guidance (PDG) by setting `path_drop_guidance` to `true` in the config file. PDG generates samples nearly 2× faster than vanilla CFG sampling, while also improving sample quality (see the sketch after this list).
- Feel free to tune `cfg_scale` as desired.
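For intuition, here is a hedged sketch of the guidance arithmetic. It is our assumption about how a path-drop style guidance could nearly halve the cost, not the repository's actual implementation; `model_full` and `model_dropped` are hypothetical callables. Standard CFG pays for two full forward passes per step, whereas the weak branch below comes from the cheap token-dropped path.

```python
import torch

def cfg_step(model_full, x, t, y, y_null, scale):
    """Standard CFG: two full-cost forward passes per sampling step."""
    cond = model_full(x, t, y)         # conditional pass, full cost
    uncond = model_full(x, t, y_null)  # unconditional pass, full cost
    return uncond + scale * (cond - uncond)

def pdg_step(model_full, model_dropped, x, t, y, scale):
    """Assumed path-drop variant: the weak branch reuses the token-dropped
    (sparse) path, so the second evaluation is far cheaper than a full pass."""
    strong = model_full(x, t, y)   # dense + sparse path, full cost
    weak = model_dropped(x, t, y)  # token-dropped path only, cheap
    return weak + scale * (strong - weak)
```

Both functions return predictions of the same shape; under this reading, the speedup comes entirely from the second evaluation touching far fewer tokens.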
```bash
accelerate launch --multi_gpu --num_processes=8 sample_ddp.py --config configs/eval/SiT_XL_SPRINT.yaml
```

This repo is built upon SiT and REPA.
If you find our work interesting, please consider giving a ⭐ and a citation.
```bibtex
@article{park2025sprint,
  title={Sprint: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers},
  author={Park, Dogyun and Haji-Ali, Moayed and Li, Yanyu and Menapace, Willi and Tulyakov, Sergey and Kim, Hyunwoo J and Siarohin, Aliaksandr and Kag, Anil},
  journal={arXiv preprint arXiv:2510.21986},
  year={2025}
}
```