In this work, we introduce a unified bilevel optimization framework for hyperparameter optimization in diffusion models. By formulating diffusion sampling as the lower-level problem and treating hyperparameters (e.g., entropy strength or noise schedule) as upper-level variables, our approach enables principled, gradient-based optimization of hyperparameters during both pre-training and fine-tuning. Specifically:
- Reward Fine-Tuning (Section 3.1): We propose a bilevel method that tunes the entropy-regularization strength using CLIP scores, preventing over-optimization of the reward (aesthetic score) during fine-tuning.
- Noise Scheduling in Pre-training (Section 3.2): We propose a bilevel method that automatically learns the noise schedule of a diffusion model to improve generated image quality.
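Schematically, the bilevel structure behind both tasks reads (notation ours, for illustration only):

```latex
\min_{\phi}\; F\bigl(\phi,\,\theta^{*}(\phi)\bigr)
\qquad \text{s.t.} \qquad
\theta^{*}(\phi) \in \arg\min_{\theta}\; G(\theta;\,\phi)
```

where φ collects the hyperparameters (entropy strength λ or noise-schedule parameters), G is the lower-level sampling/training objective, and F is the upper-level metric (e.g., CLIP score or image quality).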
This repository provides modular implementations for both tasks and includes the code to reproduce all experiments from our ICML 2025 paper:
A First-order Generative Bilevel Optimization Framework for Diffusion Models
Quan Xiao, Hui Yuan, A. F. M. Saif, Gaowen Liu, Ramana Kompella, Mengdi Wang, Tianyi Chen
```
├── noise_schedule/               # Section 3.2: Noise scheduling experiments on MNIST
│   ├── train_mnist.py
│   ├── train_mnist_bayesian.py
│   └── train_mnist_bilevel_ZO.py
├── entropy_fine-tuning/          # Section 3.1: Reward fine-tuning modules
│   ├── main.py                   # Training for bilevel reward fine-tuning
│   ├── reward_config.yaml        # Default hyperparameters for Section 3.1
│   ├── scorer_bi.py              # Reward functions
│   └── gradguided_sdpipeline.py  # Gradient-guided sampling pipeline
├── requirements.txt              # Python dependencies
└── README.md                     # This overview
```
Ensure you have Python 3.8+ and install the required packages:

```
pip install -r requirements.txt
```

Dependencies include:
- PyTorch (>=1.10)
- diffusers
- transformers
- torchvision
- tqdm
- pyyaml
- CLIP (for reward models)
Reproduce the bilevel reward fine-tuning from Section 3.1:
```
cd entropy_fine-tuning
python scripts/reward_finetune.py \
    --config configs/reward_config.yaml \
    --pretrained_model_path "runwayml/stable-diffusion-v1-5" \
    --lower_reward_model models/resnet18_reward.pth \
    --upper_reward clip \
    --num_samples 256 \
    --lambda_init 0.01 \
    --gamma 1e3 \
    --steps 500 \
    --output_dir outputs/reward_finetune
```

Outputs are stored under outputs/reward_finetune:
- `lambda_final.txt`: learned entropy strength λ
- `samples/`: generated images under the fine-tuned policy
- `logs/`: training curves and gradient trajectories
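As a toy illustration of the entropy-regularized objective tuned in Section 3.1 (the numbers and the function name are hypothetical; the repository's code operates on the diffusion policy's actual rewards and log-probabilities):

```python
def entropy_regularized_reward(rewards, log_probs, lam):
    """Mean reward plus lam times a Monte Carlo estimate of policy entropy.

    rewards:   per-sample reward values (e.g., aesthetic scores)
    log_probs: per-sample log-probabilities under the current policy
    lam:       entropy strength -- the upper-level variable lambda
    """
    mean_reward = sum(rewards) / len(rewards)
    entropy = -sum(log_probs) / len(log_probs)  # H = -E[log p]
    return mean_reward + lam * entropy

rewards = [0.8, 0.6, 0.9]        # toy reward values
log_probs = [-1.2, -0.7, -1.5]   # toy log-probabilities
obj = entropy_regularized_reward(rewards, log_probs, lam=0.01)
```

Raising λ weights the entropy term more heavily, which keeps the fine-tuned policy closer to its pre-trained diversity and counteracts reward over-optimization.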
Run the bilevel noise schedule optimization for Section 3.2 on the MNIST dataset:
```
cd noise_schedule
python train_mnist_bilevel_ZO.py \
    --inner_loop 10 \
    --inner_loop_z 1 \
    --gamma 1 \
    --gamma_end 1 \
    --lr_beta 0.05 0.05 1 0
```

Outputs:
- optimized noise schedule parameters
- sampled trajectories and denoised images
- training logs for upper- and lower-level objectives
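For intuition, a parameterized noise schedule of the kind the upper level can tune might look like the linear beta schedule below (a common DDPM baseline; the exact parameterization in `train_mnist_bilevel_ZO.py` may differ):

```python
def linear_beta_schedule(beta_start, beta_end, num_steps):
    """Linearly spaced per-step noise levels beta_t."""
    step = (beta_end - beta_start) / max(num_steps - 1, 1)
    return [beta_start + step * t for t in range(num_steps)]

def alpha_bar(betas):
    """Cumulative signal-retention product: prod over t of (1 - beta_t)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

betas = linear_beta_schedule(1e-4, 0.02, 1000)
abar = alpha_bar(betas)
# abar decays monotonically from near 1 toward 0 as noise accumulates
```

The upper level then treats the schedule's endpoints (or a richer parameterization) as decision variables and scores the resulting samples.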
Both modules implement our generative bilevel framework:
- Lower-level: SDE-based sampling adjusted by a surrogate loss (entropy-regularized reward or score-matching loss).
- Upper-level: first-order update of the regularization strength (λ) or schedule parameters via zeroth-order gradient estimates.

Refer to Algorithms 2 and 6 in the paper for detailed pseudo-code and mathematical derivations.
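A minimal sketch of a two-point zeroth-order gradient estimate of the kind used for the upper-level update (toy scalar objective; see Algorithm 6 in the paper for the actual estimator):

```python
import random

def zo_gradient(f, x, mu=1e-3, num_samples=1000, seed=0):
    """Two-point zeroth-order estimate of f'(x) for a scalar x.

    Perturbs x along random Gaussian directions and averages the
    symmetric finite-difference quotients weighted by the direction.
    """
    rng = random.Random(seed)
    est = 0.0
    for _ in range(num_samples):
        u = rng.gauss(0.0, 1.0)  # random perturbation direction
        est += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return est / num_samples

# Toy upper-level objective: f(x) = (x - 2)^2; the true gradient at x = 3 is 2.
g = zo_gradient(lambda x: (x - 2.0) ** 2, 3.0)
```

Because only function evaluations of the upper-level objective are needed, this avoids differentiating through the entire lower-level sampling chain.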
If you find our work interesting, please consider citing this paper:
```
@inproceedings{xiao2025first,
  title={A First-order Generative Bilevel Optimization Framework for Diffusion Models},
  author={Xiao, Quan and Yuan, Hui and Saif, AFM and Liu, Gaowen and Kompella, Ramana and Wang, Mengdi and Chen, Tianyi},
  booktitle={Proceedings of the 42nd International Conference on Machine Learning},
  year={2025}
}
```

