Armando Fortes Tianyi Wei Shangchen Zhou Xingang Pan
S-Lab, Nanyang Technological University
Bokeh Diffusion enables precise, scene-consistent bokeh transitions in text-to-image diffusion models
🎥 For more visual results, check out our project page
🚀✨🚧 We are working hard on releasing the code... 🔧🛠️💻 Stay tuned! 🚧✨🚀
- [2025.03] Repository created.
- Release Dataset
- Release Model Weights
- Release Inference Code
- Release Training Code
- Release FLUX Version
- Release HuggingFace Demo
Bokeh Diffusion combines three key components to produce lens-like bokeh without altering scene structure:
(1) Hybrid Dataset Pipeline: We merge real in-the-wild images (for realistic bokeh and diversity) with synthetic blur augmentations (for contrastive pairs). This anchors defocus realism while providing robust training pairs; a crude stand-in for the augmentation step is sketched after this list.
(2) Defocus Blur Conditioning: We inject a physically interpretable blur parameter via decoupled cross-attention (see the conditioning sketch below).
(3) Grounded Self-Attention: We designate a “pivot” image to anchor the scene layout, ensuring consistent object placement across different blur levels; this prevents unintended content shifts when adjusting defocus (see the attention sketch below).
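The sketch below illustrates the synthetic-augmentation idea only: a crude depth-layered Gaussian blur stands in for BokehMe's physically based renderer, and the function names, layering scheme, and parameters are illustrative assumptions, not the released pipeline.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(radius: int, sigma: float) -> torch.Tensor:
    """2D Gaussian kernel normalized to sum to 1."""
    coords = torch.arange(-radius, radius + 1, dtype=torch.float32)
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return k / k.sum()

def synthetic_defocus(image: torch.Tensor, depth: torch.Tensor,
                      focus_depth: float, blur_strength: float,
                      num_layers: int = 8) -> torch.Tensor:
    """Crude depth-layered defocus: blur each depth slice in proportion to its
    distance from the focal plane, then composite the slices back together.
    image: (3, H, W) in [0, 1]; depth: (H, W) metric depth (e.g. from Depth-Pro)."""
    out = torch.zeros_like(image)
    norm = torch.zeros_like(depth)
    edges = torch.linspace(float(depth.min()), float(depth.max()), num_layers + 1).tolist()
    for i in range(num_layers):
        mask = ((depth >= edges[i]) & (depth <= edges[i + 1])).float()
        layer_depth = 0.5 * (edges[i] + edges[i + 1])
        # Blur radius grows with distance from the focal plane.
        sigma = blur_strength * abs(layer_depth - focus_depth) + 1e-3
        radius = max(1, int(3 * sigma))
        k = gaussian_kernel(radius, sigma).to(image)[None, None]  # (1, 1, kH, kW)
        blurred = F.conv2d(image[None], k.repeat(3, 1, 1, 1),
                           padding=radius, groups=3)[0]
        soft_mask = F.conv2d(mask[None, None], k, padding=radius)[0, 0]
        out += blurred * soft_mask
        norm += soft_mask
    return out / norm.clamp_min(1e-6)

# A contrastive pair shares the same scene and differs only in blur level:
# sharp  = synthetic_defocus(image, depth, focus_depth=2.0, blur_strength=0.2)
# blurry = synthetic_defocus(image, depth, focus_depth=2.0, blur_strength=4.0)
```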
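The conditioning sketch below shows one way a scalar defocus parameter could be injected through IP-Adapter-style decoupled cross-attention, i.e. a second key/value branch alongside the text branch. Since the code is not yet released, the class name, token count, and layer sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DefocusCrossAttention(nn.Module):
    """Decoupled cross-attention sketch: the text branch is kept as-is, while a
    parallel branch attends to tokens derived from a scalar defocus parameter.
    Shapes and sizes here are illustrative, not the released architecture."""

    def __init__(self, dim: int, text_dim: int, num_tokens: int = 4, heads: int = 8):
        super().__init__()
        self.heads = heads
        self.num_tokens = num_tokens
        # Map the scalar blur parameter to a few learnable "defocus tokens".
        self.blur_embed = nn.Sequential(
            nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, num_tokens * dim)
        )
        self.to_q = nn.Linear(dim, dim, bias=False)
        # Separate K/V projections for text and defocus tokens (decoupled).
        self.to_k_text = nn.Linear(text_dim, dim, bias=False)
        self.to_v_text = nn.Linear(text_dim, dim, bias=False)
        self.to_k_blur = nn.Linear(dim, dim, bias=False)
        self.to_v_blur = nn.Linear(dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def _attn(self, q, k, v):
        b, n, d = q.shape
        h = self.heads
        q, k, v = (t.reshape(b, -1, h, d // h).transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)
        return out.transpose(1, 2).reshape(b, n, d)

    def forward(self, hidden_states, text_embeds, blur_value, blur_scale: float = 1.0):
        # hidden_states: (B, N, dim); text_embeds: (B, T, text_dim)
        # blur_value: (B, 1) physically interpretable defocus parameter
        q = self.to_q(hidden_states)
        text_out = self._attn(q, self.to_k_text(text_embeds), self.to_v_text(text_embeds))
        blur_tokens = self.blur_embed(blur_value).reshape(-1, self.num_tokens, q.shape[-1])
        blur_out = self._attn(q, self.to_k_blur(blur_tokens), self.to_v_blur(blur_tokens))
        # Decoupled branches are summed; blur_scale controls conditioning strength.
        return self.to_out(text_out + blur_scale * blur_out)
```

In this style of design, such a module would typically replace the cross-attention layers of the denoising U-Net, with the text branch kept frozen and only the defocus branch trained.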
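The attention sketch below illustrates the grounded self-attention idea: the target (re-blurred) image also attends to keys/values taken from the pivot image, which anchors the scene layout. Function and argument names are assumptions; see the paper for the actual formulation.

```python
import torch
import torch.nn.functional as F

def grounded_self_attention(q_tgt: torch.Tensor,
                            k_tgt: torch.Tensor, v_tgt: torch.Tensor,
                            k_pivot: torch.Tensor, v_pivot: torch.Tensor,
                            heads: int = 8) -> torch.Tensor:
    """Self-attention for the target image whose keys/values are extended with
    those of the pivot image, so the generated layout stays anchored to the
    pivot. All tensors are (B, N, C); N may differ between target and pivot."""
    b, n, c = q_tgt.shape
    k = torch.cat([k_tgt, k_pivot], dim=1)
    v = torch.cat([v_tgt, v_pivot], dim=1)

    def split(t: torch.Tensor) -> torch.Tensor:
        return t.reshape(b, -1, heads, c // heads).transpose(1, 2)

    out = F.scaled_dot_product_attention(split(q_tgt), split(k), split(v))
    return out.transpose(1, 2).reshape(b, n, c)
```

In attention-sharing approaches of this kind, the pivot's keys/values are typically cached from a pass at the reference blur level and reused while the defocus parameter is varied.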
If you find our work useful, please cite the following paper:
@article{fortes2025bokeh,
title = {Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models},
author = {Fortes, Armando and Wei, Tianyi and Zhou, Shangchen and Pan, Xingang},
journal = {arXiv preprint arXiv:2503.08434},
year = {2025},
}
This project is licensed under the NTU S-Lab License 1.0. Redistribution and use should follow the terms of this license.
We would like to thank the following projects that made this work possible:
- Megalith-10M is used as the base dataset for collecting real in-the-wild photographs.
- BokehMe provides the synthetic blur rendering engine for generating defocus augmentations.
- Depth-Pro is used to estimate metric depth maps.
- RMBG v2.0 is used to generate foreground masks.
- Realistic-Vision & Cyber-Realistic are used as the base models for generating the samples in the paper.