Peng Sun1,2 · Yi Jiang2 · Tao Lin1
1Westlake University 2Zhejiang University
Official PyTorch implementation of UCGM: A unified framework for training, sampling, and understanding continuous generative models (diffusion, flow-matching, consistency models).
📆 [2025/5/23] 🚀 Ultra-Fast Model Tuning Achieved!
- ✨ UCGM introduces lightning-fast model adaptation, transforming any pre-trained diffusion model (e.g., REPA-E with FID=1.54 at NFE=80) into a high-performance few-step generator with record efficiency:
  - ✅ Steps=NFE=2, FID=1.39 (ImageNet-1K 256×256)
  - ✅ Tuned in only 8 minutes on just 8 GPUs
- 📊 Extended results for additional tuned diffusion models are available here.
- 🔜 Stay tuned! The tuning code will be released soon.


Generated samples from two 675M diffusion transformers trained with UCGM on ImageNet-1K 512×512.
Left: A multi-step model (Steps=40, FID=1.48) | Right: A few-step model (Steps=2, FID=1.75)
Samples generated without classifier-free guidance or other guidance techniques.
- 🚀 Unified Framework: Train/sample diffusion, flow-matching, and consistency models in one system
- 🔌 Plug-and-Play Acceleration: UCGM-S boosts pre-trained models; e.g., applied to a model from REPA-E (on ImageNet 256×256), it cuts sampling steps by 84% (NFE=500 → NFE=80) while improving FID (1.26 → 1.06)
- 🥇 SOTA Performance: UCGM-T-trained models outperform peers at low steps (1.21 FID @ 30 steps on ImageNet 256×256, 1.48 FID @ 40 steps on 512×512)
- ⚡ Few-Step Mastery: Just 2 steps? Still strong (1.42 FID on 256×256, 1.75 FID on 512×512)
- 🚫 Guidance-Free: No classifier-free guidance for UCGM-T-trained models, simpler and faster
- 🏗️ Architecture & Dataset Flexibility: Compatible with diverse datasets (ImageNet, CIFAR, etc.) and VAEs/neural architectures (CNNs, Transformers)
- 📖 Check more features in our paper!
- Download the necessary files from Huggingface, including:
  - Checkpoints of various VAEs
  - Statistics files for datasets
  - Reference files for FID calculation
- Place the downloaded `outputs` and `buffers` folders at the same directory level as this `README.md` (a quick layout check follows this list)
- For dataset preparation (skip if not training models), run:
  bash scripts/data/in1k256.sh
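As a quick sanity check (a minimal sketch, not part of the official scripts), you can verify the expected layout before running anything:

```python
# Minimal layout check (sketch; assumes it is run from the repository root,
# i.e., the directory containing README.md).
from pathlib import Path

for folder in ("outputs", "buffers"):
    status = "found" if Path(folder).is_dir() else "MISSING"
    print(f"{folder}/ ... {status}")
```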
Accelerate any continuous generative model (diffusion, flow-matching, etc.) with UCGM-S. Results marked with ⚡ denote UCGM-S acceleration.
NFE = Number of Function Evaluations (sampling computation cost)
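In the table below, entries such as 250×2 give steps × model evaluations per step; the ×2 presumably comes from classifier-free guidance in the baselines, which runs a conditional and an unconditional forward pass per step (UCGM-trained models are guidance-free, so their NFE equals the step count). A tiny illustrative sketch of this accounting:

```python
# Illustrative accounting only (assumption: the "×2" in the tables reflects
# classifier-free guidance, which needs two forward passes per sampling step).
def total_nfe(steps: int, uses_cfg: bool) -> int:
    return steps * (2 if uses_cfg else 1)

print(total_nfe(250, uses_cfg=True))  # 500 -> the REPA-E baseline column
print(total_nfe(40, uses_cfg=True))   # 80  -> the UCGM-S accelerated column
```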
Method | Model Size | Dataset | Resolution | NFE | FID | NFE (⚡) | FID (⚡) | Model |
---|---|---|---|---|---|---|---|---|
REPA-E | 675M | ImageNet | 256×256 | 250×2 | 1.26 | 40×2 | 1.06 | Link |
Lightning-DiT | 675M | ImageNet | 256×256 | 250×2 | 1.35 | 50×2 | 1.21 | Link |
DDT | 675M | ImageNet | 256×256 | 250×2 | 1.26 | 50×2 | 1.27 | Link |
EDM2-S | 280M | ImageNet | 512×512 | 63 | 2.56 | 40 | 2.53 | Link |
EDM2-L | 778M | ImageNet | 512×512 | 63 | 2.06 | 50 | 2.04 | Link |
EDM2-XXL | 1.5B | ImageNet | 512×512 | 63 | 1.91 | 40 | 1.88 | Link |
DDT | 675M | ImageNet | 512×512 | 250×2 | 1.28 | 150×2 | 1.24 | Link |
💻 Usage Examples: Generate images and evaluate FID using a REPA-E trained model:
bash scripts/run_eval.sh ./configs/sampling_multi_steps/in1k256_sit_xl_repae_linear.yaml
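For reference, FID compares Gaussian statistics (mean and covariance) of Inception features between generated and reference images; the reference files downloaded earlier store such statistics. Below is a conceptual sketch of the distance itself, not the repository's evaluation code:

```python
# Conceptual FID sketch (not the repo's evaluation pipeline): Frechet distance
# between two Gaussians N(mu1, sigma1) and N(mu2, sigma2).
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2).real  # drop tiny imaginary parts from numerical error
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```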
UCGM-T converts multi-step generative models (including diffusion and flow-matching models) into high-performance few-step versions at minimal tuning cost. Results marked with ⚡ indicate UCGM-T-tuned models.
Pre-trained Model | Model Size | Dataset | Resolution | Tuning Efficiency | NFE (⚡) | FID (⚡) | Tuned Model |
---|---|---|---|---|---|---|---|
Lightning-DiT | 675M | ImageNet | 256×256 | 0.64 epoch (10 mins) | 2 | 2.06 | Link |
REPA | 675M | ImageNet | 256×256 | 0.64 epoch (13 mins) | 2 | 1.95 | Link |
REPA-E | 675M | ImageNet | 256×256 | 0.40 epoch (8 mins) | 2 | 1.39 | Link |
DDT | 675M | ImageNet | 256×256 | 0.32 epoch (11 mins) | 2 | 1.90 | Link |
Note: the tuning times above were measured on 8 H800 GPUs.
💻 Usage Examples
Generate Images:
# Generate samples using our tuned few-step model
bash scripts/run_eval.sh ./configs/tuning_few_steps/in1k256_sit_xl_repae_linear.yaml
Train multi-step and few-step models (diffusion, flow-matching, consistency) with UCGM-T. All models sample efficiently without guidance.
Encoders | Model Size | Resolution | Dataset | NFE | FID | Model |
---|---|---|---|---|---|---|
VA-VAE | 675M | 256×256 | ImageNet | 30 | 1.21 | Link |
VA-VAE | 675M | 256×256 | ImageNet | 2 | 1.42 | Link |
DC-AE | 675M | 512×512 | ImageNet | 40 | 1.48 | Link |
DC-AE | 675M | 512×512 | ImageNet | 2 | 1.75 | Link |
💻 Usage Examples
Generate Images:
# Generate samples using our pretrained few-step model
bash scripts/run_eval.sh ./configs/training_few_steps/in1k256_tit_xl_vavae.yaml
Train Models:
# Train a new multi-step model (full training)
bash scripts/run_train.sh ./configs/training_multi_steps/in1k512_tit_xl_dcae.yaml
# Convert to few-step model (requires pretrained multi-step checkpoint)
bash scripts/run_train.sh ./configs/training_few_steps/in1k512_tit_xl_dcae.yaml
❗ Note for few-step training:
- Requires initialization from a multi-step checkpoint
- Prepare your checkpoint file with both `model` and `ema` keys: `{ "model": multi_step_ckpt["ema"], "ema": multi_step_ckpt["ema"] }` (a conversion sketch follows)
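A minimal conversion script could look like the following (a sketch; the filenames are placeholders, and the only assumption is that checkpoints are ordinary `torch.save` dictionaries containing an `ema` entry):

```python
# Sketch: build the initialization checkpoint for few-step training by copying
# the EMA weights of a multi-step checkpoint into both required keys.
# Filenames below are placeholders, not paths shipped with the repository.
import torch

multi_step_ckpt = torch.load("multi_step.pt", map_location="cpu")
few_step_init = {
    "model": multi_step_ckpt["ema"],
    "ema": multi_step_ckpt["ema"],
}
torch.save(few_step_init, "few_step_init.pt")
```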
If you find this repository helpful for your project, please consider citing our work:
@article{sun2025unified,
title = {Unified continuous generative models},
author = {Sun, Peng and Jiang, Yi and Lin, Tao},
journal = {arXiv preprint arXiv:2505.07447},
year = {2025},
url = {https://arxiv.org/abs/2505.07447},
archiveprefix = {arXiv},
eprint = {2505.07447},
primaryclass = {cs.LG}
}
Apache License 2.0 - See LICENSE for details.