Sashuai Zhou1,2*, Qiang Zhou2*, Jijin Hu2*, Hanqing Yang2*, Yue Cao3, Junpeng Ma4,
Yinchao Ma2, Jun Song2†, Tiezheng Ge2, Cheng Yu2, Bo Zheng2, Zhou Zhao1†
1Zhejiang University 2Alibaba Group 3Nanjing University 4Fudan University
* Equal contribution † Corresponding authors
Unified Thinker is a task-agnostic reasoning core for general image generation. It decouples a trainable Thinker (MLLM) from an image Generator (e.g., diffusion models), enabling executable planning that bridges the persistent reasoning–execution gap in reasoning-driven image generation and editing.
- 🎉 Paper, code, and HieraReason-40K are now available!
- 🏆 Unified Thinker has been accepted to ACL 2026!
- ⏳ Checkpoint is now available!🚀
- Decoupled Thinker–Generator design: upgrade reasoning without retraining the entire generator.
- Unified planning format across T2I (creation) and I2I (edit-only modification).
- HieraReason-40K: hierarchical reasoning traces + executable enhanced prompts for cold start.
- Dual-phase RL with generator-in-the-loop to align plans with actual visual outcomes.
- Cross-generator transfer: Thinker can be plugged into different diffusion backbones.
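The decoupled design in the list above can be pictured as two narrow interfaces joined by a plan. The following is a minimal sketch under stated assumptions, not the repository's actual API: the names `Plan`, `Thinker`, `Generator`, and `run` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol


@dataclass
class Plan:
    """Hypothetical container for the Thinker's output."""
    reasoning: str        # the Thinker's reasoning trace
    enhanced_prompt: str  # executable prompt handed to the Generator


class Thinker(Protocol):
    """Hypothetical interface: turns an instruction into a plan."""
    def plan(self, instruction: str, image: Optional[Any] = None) -> Plan: ...


class Generator(Protocol):
    """Hypothetical interface: renders an image from a prompt."""
    def generate(self, prompt: str, image: Optional[Any] = None) -> Any: ...


def run(thinker: Thinker, generator: Generator,
        instruction: str, image: Optional[Any] = None) -> Any:
    """Plan with the Thinker, then execute the plan with the Generator.

    image=None corresponds to T2I (creation); passing an image
    corresponds to I2I (editing).
    """
    plan = thinker.plan(instruction, image)
    return generator.generate(plan.enhanced_prompt, image)
```

Because `run` depends only on the two interfaces, any object with a matching `generate` method could stand in for the Generator without changing the Thinker, which is the intuition behind cross-generator transfer.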
- Dataset Structure: Create local directories and symlink or download the datasets as follows:
  - UniREdit-Data-100K: data/UniREdit-Data-100K/uniredit-data/original_images/
  - Banana-400K: data/Banana-400K/source_images/
  - HieraReason-40K: Download und.jsonl and gen.jsonl to data/.
- Pre-trained Weights: Download and organize the models in the model/ directory:
  - model/Qwen-Image-Edit-2509 (The Image Editor)
  - model/Qwen2.5-VL-7B-Instruct (The Reasoning Core)
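Before training, it may help to confirm that the layout described above is in place. The script below is a minimal sketch, not part of the repository: it assumes the paths are relative to the repository root, and the helper name `missing_paths` is hypothetical.

```python
from pathlib import Path

# Paths taken from the setup notes above, relative to the repository root.
EXPECTED_DIRS = [
    "data/UniREdit-Data-100K/uniredit-data/original_images",
    "data/Banana-400K/source_images",
    "model/Qwen-Image-Edit-2509",
    "model/Qwen2.5-VL-7B-Instruct",
]
EXPECTED_FILES = ["data/und.jsonl", "data/gen.jsonl"]


def missing_paths(root="."):
    """Return the expected paths that do not exist under `root`.

    Symlinked directories count as present, since is_dir() follows symlinks.
    """
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    for path in missing_paths():
        print(f"missing: {path}")
```

Run from the repository root, the script prints one line per missing directory or file and prints nothing when everything is in place.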
- Installation:
    pip install -U pip
    pip install -r requirements.txt
- Training:
    bash scripts/thinker_editor/train.sh
- Single Image Inference (CLI):
    bash /root/UnifiedThinker/inference/infer_single.sh
- Interactive Demo (Gradio): If you prefer a web interface for a more intuitive experience, run:
    bash /root/UnifiedThinker/inference/infer_gradio.sh

This repository currently serves as the project homepage.
- Training & inference code
- Model checkpoints (Thinker / Generator adapters)
- HieraReason-40K data & processing scripts
- Reproduction scripts for benchmarks
📖 If you find this work useful, please cite:
@misc{zhou2026unifiedthinker,
title={Unified Thinker: A General Reasoning Modular Core for Image Generation},
author={Sashuai Zhou and Qiang Zhou and Jijin Hu and Hanqing Yang and Yue Cao and Junpeng Ma and Yinchao Ma and Jun Song and Tiezheng Ge and Cheng Yu and Bo Zheng and Zhou Zhao},
year={2026},
eprint={2601.03127},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2601.03127},
}
