Peng Sun1,2 · Yi Jiang2 · Tao Lin1
1Westlake University 2Zhejiang University
Official PyTorch implementation of the UCGM Trainer and Sampler (UCGM-{T,S}): 🏆 A unified framework for training, sampling, and understanding continuous generative models (including diffusion, flow-matching, and consistency models).


Generated samples from two 675M diffusion transformers trained with UCGM on ImageNet-1K 512×512.
Left: A multi-step model (Steps=NFE=40, FID=1.48) | Right: A few-step model (Steps=NFE=2, FID=1.75)
Samples generated without classifier-free guidance or other guidance techniques.
🚀 Plug-and-Play Acceleration: UCGM-S boosts various pre-trained multi-step continuous models for free. For example, given a model from REPA-E (on ImageNet 256×256):
- ✅ Cuts 84% of sampling steps (Steps=250 → Steps=40) while improving FID (1.26 → 1.06)
- ✅ Training-free, with no additional cost introduced
- 📊 Extended results for more accelerated models are available here.
⚡ Lightning-Fast Model Tuning: UCGM-T transforms any pre-trained multi-step continuous model (e.g., REPA-E with FID=1.54 at NFE=80) into a high-performance few-step generator with record efficiency:
- ✅ FID=1.39 @ Steps=NFE=2 (ImageNet-1K 256×256)
- ✅ Tuned in just 8 minutes on 8 GPUs
- 📊 Extended results for additional tuned models are available here.
🔥 Efficient Unified Framework: Train and sample diffusion, flow-matching, and consistency models in one system, outperforming peers at low step counts:
- ✅ FID=1.21 @ Steps=NFE=30 (ImageNet 256×256); FID=1.48 @ Steps=NFE=40 (ImageNet 512×512)
- ✅ Just 2 steps? Still strong: FID=1.42 on 256×256, FID=1.75 on 512×512
- ✅ No classifier-free guidance or other guidance techniques, making sampling simpler and faster
- ✅ Compatible with diverse datasets (ImageNet, CIFAR, etc.) and architectures (CNNs, Transformers) for high flexibility
- 📊 Extended results for additional trained models are available here.
📖 Check our paper for more detailed features!
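For intuition, the model families that UCGM unifies can all be described over a single continuous-time corruption process. This is the standard formulation of these models; the notation below is ours, not necessarily the paper's:

$$
x_t = \alpha_t\, x_0 + \sigma_t\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \quad t \in [0, 1],
$$

where diffusion models correspond to particular (e.g., variance-preserving) choices of $(\alpha_t, \sigma_t)$, flow matching to the linear schedule $\alpha_t = 1 - t$, $\sigma_t = t$, and consistency models to networks trained to map any $x_t$ (nearly) directly back to $x_0$, which is what enables 1–2 step sampling.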
1. Download the necessary files from Hugging Face, including:
   - Checkpoints of various VAEs
   - Statistics files for datasets
   - Reference files for FID calculation
2. Place the downloaded `outputs` and `buffers` folders at the same directory level as this `README.md` (see the layout sketch after this list).
3. For dataset preparation (skip this if you are not training models), run:
   ```bash
   bash scripts/data/in1k256.sh
   ```
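After these steps, the repository root should look roughly like the sketch below (which files land in `outputs/` versus `buffers/` depends on the downloads, so treat the inline comments as assumptions):

```
.
├── README.md
├── configs/
├── scripts/
├── outputs/    # e.g., VAE checkpoints
└── buffers/    # e.g., dataset statistics and FID reference files
```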
Accelerate any continuous generative model (diffusion, flow-matching, etc.) with UCGM-S. Results marked with 🚀 denote UCGM-S acceleration.
NFE = Number of Function Evaluations (the compute cost of sampling). An NFE written as 250×2 means 250 sampling steps with two function evaluations per step, as happens when classifier-free guidance is used.
Method | Model Size | Dataset | Resolution | NFE | FID | NFE (🚀) | FID (🚀) | Model |
---|---|---|---|---|---|---|---|---|
REPA-E | 675M | ImageNet | 256×256 | 250×2 | 1.26 | 40×2 | 1.06 | Link |
Lightning-DiT | 675M | ImageNet | 256×256 | 250×2 | 1.35 | 50×2 | 1.21 | Link |
DDT | 675M | ImageNet | 256×256 | 250×2 | 1.26 | 50×2 | 1.27 | Link |
EDM2-S | 280M | ImageNet | 512×512 | 63 | 2.56 | 40 | 2.53 | Link |
EDM2-L | 778M | ImageNet | 512×512 | 63 | 2.06 | 50 | 2.04 | Link |
EDM2-XXL | 1.5B | ImageNet | 512×512 | 63 | 1.91 | 40 | 1.88 | Link |
DDT | 675M | ImageNet | 512×512 | 250×2 | 1.28 | 150×2 | 1.18 | Link |
💻 Usage Examples: Generate images and evaluate FID using a REPA-E-trained model:
```bash
# Generate samples using a public pre-trained multi-step model
bash scripts/run_eval.sh ./configs/sampling_multi_steps/in1k256_sit_xl_repae_linear.yaml
```
UCGM-T transforms multi-step generative models (including diffusion and flow-matching models) into high-performance few-step versions with ultra-efficient tuning. Results marked with ⚡ indicate UCGM-T-tuned models.
Pre-trained Model | Model Size | Dataset | Resolution | Tuning Efficiency | NFE (⚡) | FID (⚡) | Tuned Model |
---|---|---|---|---|---|---|---|
Lightning-DiT | 675M | ImageNet | 256×256 | 0.64 epoch (10 mins) | 2 | 2.06 | Link |
REPA | 675M | ImageNet | 256×256 | 0.64 epoch (13 mins) | 2 | 1.95 | Link |
REPA-E | 675M | ImageNet | 256×256 | 0.40 epoch (8 mins) | 2 | 1.39 | Link |
DDT | 675M | ImageNet | 256×256 | 0.32 epoch (11 mins) | 2 | 1.90 | Link |
(Tuning times above were measured on 8 H800 GPUs.)
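For a sense of scale: ImageNet-1K has roughly 1.28M training images, so tuning REPA-E for 0.40 epoch means the model sees only about 0.4 × 1.28M ≈ 0.5M images before reaching FID=1.39 at two steps.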
💻 Usage Examples
Generate Images:
```bash
# Generate samples using our tuned few-step model
bash scripts/run_eval.sh ./configs/tuning_few_steps/in1k256_sit_xl_repae_linear.yaml
```
Tune Models:
```bash
# Tune a multi-step model into a few-step version
bash scripts/run_train.sh ./configs/tuning_few_steps/in1k256_sit_xl_repae_linear.yaml
```
Train multi-step and few-step models (diffusion, flow-matching, consistency) with UCGM-T. All models sample efficiently using UCGM-S without guidance.
Encoders | Model Size | Resolution | Dataset | NFE | FID | Model |
---|---|---|---|---|---|---|
VA-VAE | 675M | 256×256 | ImageNet | 30 | 1.21 | Link |
VA-VAE | 675M | 256×256 | ImageNet | 2 | 1.42 | Link |
DC-AE | 675M | 512×512 | ImageNet | 40 | 1.48 | Link |
DC-AE | 675M | 512×512 | ImageNet | 2 | 1.75 | Link |
💻 Usage Examples
Generate Images:
```bash
# Generate samples using our pre-trained few-step model
bash scripts/run_eval.sh ./configs/training_few_steps/in1k256_tit_xl_vavae.yaml
```
Train Models:
```bash
# Train a new multi-step model (full training)
bash scripts/run_train.sh ./configs/training_multi_steps/in1k512_tit_xl_dcae.yaml

# Convert it into a few-step model (requires a pretrained multi-step checkpoint)
bash scripts/run_train.sh ./configs/training_few_steps/in1k512_tit_xl_dcae.yaml
```
❗ Note for few-step training:
- Requires initialization from a multi-step checkpoint
- Prepare your checkpoint file with both `model` and `ema` keys: `{ "model": multi_step_ckpt["ema"], "ema": multi_step_ckpt["ema"] }`
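A minimal sketch of this preparation, assuming the checkpoint is a standard PyTorch file (the file names below are placeholders):

```python
import torch

# Load the multi-step checkpoint (path is a placeholder).
multi_step_ckpt = torch.load("multi_step.pt", map_location="cpu")

# Initialize both "model" and "ema" from the EMA weights, as required
# for few-step training, and save the result as the starting checkpoint.
few_step_init = {"model": multi_step_ckpt["ema"], "ema": multi_step_ckpt["ema"]}
torch.save(few_step_init, "few_step_init.pt")
```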
If you find this repository helpful for your project, please consider citing our work:
```bibtex
@article{sun2025unified,
  title         = {Unified continuous generative models},
  author        = {Sun, Peng and Jiang, Yi and Lin, Tao},
  journal       = {arXiv preprint arXiv:2505.07447},
  year          = {2025},
  url           = {https://arxiv.org/abs/2505.07447},
  archiveprefix = {arXiv},
  eprint        = {2505.07447},
  primaryclass  = {cs.LG}
}
```
Apache License 2.0 - See LICENSE for details.