The official implementation of "Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models"


Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models

Armando Fortes · Tianyi Wei · Shangchen Zhou · Xingang Pan

S-lab, Nanyang Technological University

Bokeh Diffusion enables precise, scene-consistent bokeh transitions in text-to-image diffusion models

teaser

🎥 For more visual results, check out our project page

🚀✨🚧 We are working hard on releasing the code... 🔧🛠️💻 Stay tuned! 🚧✨🚀

📮 Update

  • [2025.03] This repo is created.

🚧 TODO

  • Release Dataset
  • Release Model Weights
  • Release Inference Code
  • Release Training Code
  • Release FLUX Version
  • Release HuggingFace Demo

🔎 Overview

Bokeh Diffusion combines three key components to produce lens-like bokeh without altering scene structure:

(1) Hybrid Dataset Pipeline: We merge real in-the-wild images (for realistic bokeh and diversity) with synthetic blur augmentations (for contrastive pairs). This approach anchors defocus realism while providing robust examples for training.

dataset
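To make the contrastive-pair idea concrete, here is a minimal sketch of how a synthetic (sharp, defocused) pair can be built from a single real image and its foreground mask. Note this is an illustrative simplification: the actual pipeline uses BokehMe's physically based rendering with Depth-Pro metric depth, whereas this toy uses a plain Gaussian blur composited behind an RMBG-style foreground mask. The function name and signature are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synthetic_bokeh_pair(image, fg_mask, blur_sigma=4.0):
    """Build a (sharp, defocused) contrastive pair from one real image.

    image:   float array (H, W, 3) in [0, 1]
    fg_mask: float array (H, W) in [0, 1]; 1 = in-focus foreground

    NOTE: Gaussian blur is a stand-in for BokehMe's disc-kernel,
    depth-aware rendering used in the real pipeline.
    """
    # Blur each channel independently to simulate background defocus.
    blurred = np.stack(
        [gaussian_filter(image[..., c], blur_sigma) for c in range(3)],
        axis=-1,
    )
    # Composite: keep the masked foreground sharp, defocus the rest.
    mask = fg_mask[..., None]
    defocused = mask * image + (1.0 - mask) * blurred
    return image, defocused
```

Pixels where the mask is exactly 1 are copied through unchanged, so the pair differs only in background defocus, which is what makes it usable as a contrastive training example.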

(2) Defocus Blur Conditioning: We inject a physically interpretable blur parameter via decoupled cross-attention.
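The decoupled design can be sketched as two cross-attention branches that share the same queries: one attends to text tokens as usual, the other to a token derived from the scalar blur parameter, with the results summed. This is a toy single-head numpy sketch under assumed details (the sinusoidal encoding, the summation with a scale factor, and all names are illustrative, not the released implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def blur_embedding(blur_level, dim=64):
    """Hypothetical sinusoidal encoding of the scalar defocus parameter."""
    freqs = np.exp(np.linspace(0.0, 4.0, dim // 2))
    return np.concatenate([np.sin(blur_level * freqs),
                           np.cos(blur_level * freqs)])

def decoupled_cross_attention(q, text_kv, blur_kv, scale_blur=1.0):
    """Two cross-attention branches over the same queries, summed:
    one over text tokens, one over the blur token(s)."""
    d = q.shape[-1]

    def attend(kv):
        k, v = kv
        attn = softmax(q @ k.T / np.sqrt(d))
        return attn @ v

    return attend(text_kv) + scale_blur * attend(blur_kv)
```

Setting `scale_blur=0` recovers plain text cross-attention, which is why this kind of decoupling leaves the base model's text conditioning intact while adding a separate, interpretable control channel.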

(3) Grounded Self-Attention: We designate a “pivot” image to anchor scene layout, ensuring consistent object placement across different blur levels. This prevents unintended content shifts when adjusting defocus.

method
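The grounding mechanism can be illustrated as self-attention whose keys and values are extended with those of the pivot image, so every generated image can also attend to the pivot's features and inherit its layout. A minimal single-head numpy sketch, with all shapes and names assumed for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grounded_self_attention(q_gen, k_gen, v_gen, k_pivot, v_pivot):
    """Self-attention for a generated image whose key/value sets are
    concatenated with those of a pivot image, anchoring scene layout
    across blur levels (toy single-head sketch)."""
    # Extend keys/values with the pivot's features along the token axis.
    k = np.concatenate([k_gen, k_pivot], axis=0)
    v = np.concatenate([v_gen, v_pivot], axis=0)
    attn = softmax(q_gen @ k.T / np.sqrt(q_gen.shape[-1]))
    return attn @ v
```

When the pivot's features coincide with the generated image's own, the extended attention reduces to ordinary self-attention (duplicated keys just split the same attention mass), so the mechanism only exerts influence where the two images would otherwise diverge.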

📑 Citation

If you find our work useful, please cite the following paper:

@article{fortes2025bokeh,
    title     = {Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models},
    author    = {Fortes, Armando and Wei, Tianyi and Zhou, Shangchen and Pan, Xingang},
    journal   = {arXiv preprint arXiv:2503.08434},
    year      = {2025},
}

©️ License

This project is licensed under NTU S-Lab License 1.0. Redistribution and use should follow this license.

🤝 Acknowledgements

We would like to thank the following projects that made this work possible:

  • Megalith-10M is used as the base dataset for collecting real in-the-wild photographs.
  • BokehMe provides the synthetic blur rendering engine for generating defocus augmentations.
  • Depth-Pro is used to estimate metric depth maps.
  • RMBG v2.0 is used to generate foreground masks.
  • Realistic-Vision & Cyber-Realistic are used as the base models for generating the samples in the paper.
