
UCGM: Unified Continuous Generative Models

Peng Sun¹,² · Yi Jiang² · Tao Lin¹

¹Westlake University · ²Zhejiang University

🤖 Models · 📄 Paper · 🏷️ BibTeX


Official PyTorch implementation of UCGM: A unified framework for training, sampling, and understanding continuous generative models (diffusion, flow-matching, consistency models).

🔥 News

📆 [2025/5/23] 🚀 Ultra-Fast Model Tuning Achieved!

  • ✨ UCGM enables lightning-fast model adaptation, turning any pre-trained diffusion model (e.g., REPA-E with FID=1.54 at NFE=80) into a high-performance few-step generator:
    Steps=NFE=2, FID=1.39 (ImageNet-1K 256×256)
    Tuned in only 8 minutes on just 8 GPUs

  • 📊 Extended results for additional tuned diffusion models are available here.

  • 🔜 Stay tuned! The tuning code will be released soon.

🏆 Key Results

Generated samples from two 675M diffusion transformers trained with UCGM on ImageNet-1K 512×512.
Left: A multi-step model (Steps=40, FID=1.48) | Right: A few-step model (Steps=2, FID=1.75)
Samples generated without classifier-free guidance or other guidance techniques.

✨ Features

  • 🚀 Unified Framework: Train/sample diffusion, flow-matching, and consistency models in one system
  • 🔌 Plug-and-Play Acceleration: UCGM-S boosts pre-trained models; e.g., for a model from REPA-E (on ImageNet 256×256), it cuts sampling steps by 84% (NFE=500 → NFE=80) while improving FID (1.26 → 1.06)
  • 🥇 SOTA Performance: UCGM-T-trained models outperform peers at low steps (1.21 FID @ 30 steps on ImageNet 256×256, 1.48 FID @ 40 steps on 512×512)
  • ⚡ Few-Step Mastery: Just 2 steps? Still strong (1.42 FID on 256×256, 1.75 FID on 512×512)
  • 🚫 Guidance-Free: UCGM-T-trained models need no classifier-free guidance, keeping sampling simpler and faster
  • 🏗️ Architecture & Dataset Flexibility: Compatible with diverse datasets (ImageNet, CIFAR, etc.) and VAEs/neural architectures (CNNs, Transformers)
  • 📖 Check more features in our paper!

🔧 Preparation

  1. Download the necessary files from Hugging Face (see the sketch after this list), including:

    • Checkpoints of various VAEs
    • Statistic files for datasets
    • Reference files for FID calculation
  2. Place the downloaded outputs and buffers folders at the same directory level as this README.md

  3. For dataset preparation (skip if not training models), run:

bash scripts/data/in1k256.sh
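
A minimal sketch of steps 1-2, assuming the files sit in a single Hugging Face repository that can be fetched with huggingface_hub; the repo id below is a placeholder taken from this repository's name, not confirmed by the download page linked above:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="LINs-lab/UCGM",                    # hypothetical repo id; use the one linked above
    allow_patterns=["outputs/*", "buffers/*"],  # VAE checkpoints, dataset statistics, FID reference files
    local_dir=".",                              # places outputs/ and buffers/ next to this README.md
)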

⏩ UCGM-S: Plug-and-Play Acceleration

Accelerate any continuous generative model (diffusion, flow-matching, etc.) with UCGM-S. Results marked with ⚡ denote UCGM-S acceleration.
NFE = Number of Function Evaluations (sampling computation cost)

| Method | Model Size | Dataset | Resolution | NFE | FID | NFE (⚡) | FID (⚡) | Model |
|---|---|---|---|---|---|---|---|---|
| REPA-E | 675M | ImageNet | 256×256 | 250×2 | 1.26 | 40×2 | 1.06 | Link |
| Lightning-DiT | 675M | ImageNet | 256×256 | 250×2 | 1.35 | 50×2 | 1.21 | Link |
| DDT | 675M | ImageNet | 256×256 | 250×2 | 1.26 | 50×2 | 1.27 | Link |
| EDM2-S | 280M | ImageNet | 512×512 | 63 | 2.56 | 40 | 2.53 | Link |
| EDM2-L | 778M | ImageNet | 512×512 | 63 | 2.06 | 50 | 2.04 | Link |
| EDM2-XXL | 1.5B | ImageNet | 512×512 | 63 | 1.91 | 40 | 1.88 | Link |
| DDT | 675M | ImageNet | 512×512 | 250×2 | 1.28 | 150×2 | 1.24 | Link |
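
A quick sanity check of the savings reported above, using only arithmetic on the table entries (NFE is read as steps × evaluations per step, so 250×2 = 500, matching the 84% figure quoted in the Features list for REPA-E):

# Recompute the step reduction and FID change for a few rows of the UCGM-S table.
rows = {
    # method: ((baseline NFE, baseline FID), (accelerated NFE, accelerated FID))
    "REPA-E 256x256":        ((250 * 2, 1.26), (40 * 2, 1.06)),
    "Lightning-DiT 256x256": ((250 * 2, 1.35), (50 * 2, 1.21)),
    "EDM2-XXL 512x512":      ((63, 1.91), (40, 1.88)),
}
for name, ((nfe0, fid0), (nfe1, fid1)) in rows.items():
    saving = 1 - nfe1 / nfe0  # e.g., REPA-E: 1 - 80/500 = 0.84
    print(f"{name}: NFE {nfe0} -> {nfe1} ({saving:.0%} fewer evaluations), FID {fid0} -> {fid1}")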

💻 Usage Examples: Generate images and evaluate FID using a REPA-E trained model:

bash scripts/run_eval.sh ./configs/sampling_multi_steps/in1k256_sit_xl_repae_linear.yaml

⚡ UCGM-T: Ultra-Efficient Tuning System

UCGM-T converts multi-step generative models (including diffusion and flow-matching models) into high-performance few-step versions with ultra-efficient tuning. Results marked with ⚡ indicate UCGM-T-tuned models.

| Pre-trained Model | Model Size | Dataset | Resolution | Tuning Efficiency | NFE (⚡) | FID (⚡) | Tuned Model |
|---|---|---|---|---|---|---|---|
| Lightning-DiT | 675M | ImageNet | 256×256 | 0.64 epoch (10 mins) | 2 | 2.06 | Link |
| REPA | 675M | ImageNet | 256×256 | 0.64 epoch (13 mins) | 2 | 1.95 | Link |
| REPA-E | 675M | ImageNet | 256×256 | 0.40 epoch (8 mins) | 2 | 1.39 | Link |
| DDT | 675M | ImageNet | 256×256 | 0.32 epoch (11 mins) | 2 | 1.90 | Link |

(Tuning times above were measured on 8 H800 GPUs.)

💻 Usage Examples

Generate Images:

# Generate samples using our tuned few-step model
bash scripts/run_eval.sh ./configs/tuning_few_steps/in1k256_sit_xl_repae_linear.yaml

⚙️ UCGM-T: Unified Training Framework

Train multi-step and few-step models (diffusion, flow-matching, consistency) with UCGM-T. All models sample efficiently without guidance.

| Encoders | Model Size | Resolution | Dataset | NFE | FID | Model |
|---|---|---|---|---|---|---|
| VA-VAE | 675M | 256×256 | ImageNet | 30 | 1.21 | Link |
| VA-VAE | 675M | 256×256 | ImageNet | 2 | 1.42 | Link |
| DC-AE | 675M | 512×512 | ImageNet | 40 | 1.48 | Link |
| DC-AE | 675M | 512×512 | ImageNet | 2 | 1.75 | Link |

💻 Usage Examples

Generate Images:

# Generate samples using our pretrained few-step model
bash scripts/run_eval.sh ./configs/training_few_steps/in1k256_tit_xl_vavae.yaml

Train Models:

# Train a new multi-step model (full training)
bash scripts/run_train.sh ./configs/training_multi_steps/in1k512_tit_xl_dcae.yaml

# Convert to few-step model (requires pretrained multi-step checkpoint)
bash scripts/run_train.sh ./configs/training_few_steps/in1k512_tit_xl_dcae.yaml

Note for few-step training:

  1. Requires initialization from a multi-step checkpoint
  2. Prepare your checkpoint file with both model and ema keys (a runnable sketch follows this note):
    {
      "model": multi_step_ckpt["ema"], 
      "ema": multi_step_ckpt["ema"]
    }
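
A minimal sketch of this checkpoint preparation, assuming standard PyTorch checkpoints; the file paths are placeholders for illustration, not paths used by this repository's scripts:

import torch

# Load the multi-step checkpoint and reuse its EMA weights for both keys.
multi_step_ckpt = torch.load("multi_step_model.pt", map_location="cpu")

few_step_init = {
    "model": multi_step_ckpt["ema"],  # initialize the online model from the EMA weights
    "ema": multi_step_ckpt["ema"],    # keep an identical EMA copy
}
torch.save(few_step_init, "few_step_init.pt")  # use as the initialization for few-step training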

🏷️ Bibliography

If you find this repository helpful for your project, please consider citing our work:

@article{sun2025unified,
  title = {Unified continuous generative models},
  author = {Sun, Peng and Jiang, Yi and Lin, Tao},
  journal = {arXiv preprint arXiv:2505.07447},
  year = {2025},
  url = {https://arxiv.org/abs/2505.07447},
  archiveprefix = {arXiv},
  eprint = {2505.07447},
  primaryclass = {cs.LG}
}

📄 License

Apache License 2.0 - See LICENSE for details.
