Images generated by SR-DiT-B/1, a 140M-parameter diffusion model, after 400K training steps

SpeedrunDiT (SR-DiT): Speedrunning ImageNet Diffusion

This repository contains the reference implementation of SR-DiT (Speedrun Diffusion Transformer), a framework that combines REG-style representation alignment, SPRINT token routing, architectural improvements, and training modifications on top of a SiT-B/1 backbone with the INVAE tokenizer.

Highlights

  • ImageNet-256 (400K iters, no CFG): FID 3.49, KDD 0.319, 140M params, sampling at NFE=250
  • ImageNet-512 (400K iters, no CFG): FID 4.23, KDD 0.306, sampling at NFE=250

SR-DiT builds on top of a strong baseline (REG + INVAE) and then progressively adds:

  • Semantic latent space via E2E-INVAE
  • SPRINT token routing
  • RMSNorm, RoPE, QK normalization, value residual learning
  • Contrastive Flow Matching (CFM)
  • Time shifting and balanced label sampling (for evaluation)
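Time shifting remaps the uniformly sampled flow-matching time toward noisier timesteps, which becomes more important at higher resolution. A minimal sketch of the commonly used shifting function (the form popularized by Stable Diffusion 3; whether `--time-shifting`/`--shift-base` use exactly this form is an assumption):

```python
def time_shift(t: float, shift: float = 3.0) -> float:
    """Map t in [0, 1] toward later (noisier) timesteps.

    shift > 1 concentrates samples near t = 1; shift = 1 is the identity.
    Illustrative only -- the repository's flags may parameterize this differently.
    """
    return shift * t / (1.0 + (shift - 1.0) * t)
```

The endpoints are preserved (`time_shift(0) == 0`, `time_shift(1) == 1`), so only the interior density of sampled timesteps changes.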

Repository layout

  • train.py: training loop (Accelerate)
  • generate.py: multi-GPU sampling to .png and .npz
  • evaluations/evaluator.py: computes FID/sFID/IS/Precision/Recall from .npz
  • preprocessing/dataset_tools.py: ImageNet preprocessing + INVAE encoding
  • train.sh, eval.sh: example scripts used for our runs

Setup

Create a Python 3.11 environment and install the dependencies:

pip install -r requirements.txt

Dataset

Training expects a directory (passed via --data-dir) containing:

dataset/
  images/            # preprocessed ImageNet images (256x256 or 512x512)
  vae-in/            # INVAE latents (.npy) + dataset.json labels
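A quick way to sanity-check this layout before launching a run (`missing_dataset_dirs` is an illustrative helper, not part of the repository):

```python
from pathlib import Path


def missing_dataset_dirs(data_dir: str) -> list[str]:
    """Return the names of expected subdirectories absent from data_dir."""
    root = Path(data_dir)
    return [name for name in ("images", "vae-in") if not (root / name).is_dir()]
```

An empty return value means both `images/` and `vae-in/` are present under `--data-dir`.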

Follow the preprocessing guide in preprocessing/README.md. The minimal flow is:

# 1) Convert raw ImageNet to resized/cropped PNG dataset
python preprocessing/dataset_tools.py convert --source /path/to/imagenet/train \
  --dest dataset/images --resolution=256x256 --transform=center-crop-dhariwal

# 2) Encode images to INVAE latents
python preprocessing/dataset_tools.py encode --source dataset/images \
  --dest dataset/vae-in

A preprocessed dataset is also available on Hugging Face:

https://huggingface.co/datasets/SwayStar123/repa-imagenet-256/blob/main/dataset.zip
https://huggingface.co/datasets/SwayStar123/repa-imagenet-256/blob/main/vae-in.zip

First unzip dataset.zip, then unzip vae-in.zip inside the newly created dataset/ folder.

Training

An example command is provided in train.sh:

bash train.sh

Key arguments:

  • --model: use SiT-B/1 for the SR-DiT-B/1 configuration
  • --data-dir: directory containing images/ and vae-in/
  • --qk-norm: enables QK normalization
  • --cfm-coeff, --cfm-weighting: CFM settings
  • --time-shifting, --shift-base: time shifting for training
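Contrastive Flow Matching augments the standard flow-matching regression with a repulsive term that pushes the predicted velocity away from velocities toward mismatched targets. A hedged sketch of such a loss on per-sample scalars (`cfm_loss` and the rolled-batch negatives are illustrative; the repository's `--cfm-coeff`/`--cfm-weighting` implementation may differ):

```python
def cfm_loss(v_pred: list[float], x1: list[float], x0: list[float],
             cfm_coeff: float = 0.05) -> float:
    """Flow-matching loss minus a contrastive term toward mismatched pairs.

    v_pred: predicted velocities, x1: data samples, x0: noise samples.
    Negatives are formed by rolling the batch so each x0 is paired with
    a different x1.
    """
    target = [a - b for a, b in zip(x1, x0)]
    neg_target = [a - b for a, b in zip(x1[1:] + x1[:1], x0)]  # rolled batch
    pos = sum((v - t) ** 2 for v, t in zip(v_pred, target)) / len(v_pred)
    neg = sum((v - t) ** 2 for v, t in zip(v_pred, neg_target)) / len(v_pred)
    return pos - cfm_coeff * neg
```

When the prediction matches the paired target exactly, the loss is negative: the model is rewarded for being far from the mismatched velocities.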

Checkpoints are written to:

exps/<exp-name>/checkpoints/<step>.pt
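For scripting around checkpoints, the documented layout can be reconstructed like this (`checkpoint_path` is a hypothetical helper; the actual step formatting, e.g. zero padding, is an assumption):

```python
from pathlib import Path


def checkpoint_path(exp_name: str, step: int) -> Path:
    """Build the checkpoint path matching the documented layout."""
    return Path("exps") / exp_name / "checkpoints" / f"{step}.pt"
```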

Sampling and evaluation

eval.sh runs sampling (generate.py) and then computes metrics (evaluations/evaluator.py).

bash eval.sh

Notes:

  • generate.py currently supports --mode sde (the ode branch is not implemented).
  • For metric computation, download the matching reference batch listed in evaluations/README.md.
  • Balanced label sampling can be enabled via --balanced-sampling when generating samples.
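Balanced label sampling draws an equal number of samples per class, which removes class-frequency noise from the FID estimate. A sketch of one plausible implementation (`balanced_labels` is illustrative; `generate.py` may shuffle or shard labels differently):

```python
def balanced_labels(num_samples: int, num_classes: int = 1000) -> list[int]:
    """Return num_samples class labels with each class repeated equally.

    Any remainder (when num_samples is not divisible by num_classes) is
    distributed over the first classes.
    """
    per_class = num_samples // num_classes
    labels = [c for c in range(num_classes) for _ in range(per_class)]
    labels.extend(range(num_samples - len(labels)))
    return labels
```

For the standard 50K-sample ImageNet evaluation this yields exactly 50 samples per class.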

Citation

If you use this repository, please cite SR-DiT:

@misc{bhanded2025speedrundit,
  title         = {Speedrunning ImageNet Diffusion},
  author        = {Bhanded, Swayam},
  year          = {2025},
  eprint        = {2512.12386},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2512.12386},
}

Contact

Please open a GitHub issue for any questions or issues.

Acknowledgements

This codebase builds upon:

  • REG / REPA
  • SiT
  • DINOv2
  • ADM evaluations
  • NVLabs edm2 preprocessing utilities

We gratefully acknowledge support from WayfarerLabs (Open World Labs) for sponsoring compute resources used in this work.
