LivingFutureLab/UnifiedThinker

Unified Thinker: A General Reasoning Modular Core for Image Generation

Sashuai Zhou1,2*, Qiang Zhou2*, Jijin Hu2*, Hanqing Yang2*, Yue Cao3, Junpeng Ma4,
Yinchao Ma2, Jun Song2†, Tiezheng Ge2, Cheng Yu2, Bo Zheng2, Zhou Zhao1†

1Zhejiang University    2Alibaba Group    3Nanjing University    4Fudan University
* Equal contribution   † Corresponding authors

Unified Thinker is a task-agnostic reasoning core for general image generation. It decouples a trainable Thinker (MLLM) from an image Generator (e.g., diffusion models), enabling executable planning that bridges the persistent reasoning–execution gap in reasoning-driven image generation and editing.

[Figure: Unified Thinker pipeline]

📢 News

  • 🎉 The paper, code, and HieraReason-40K are now available!
  • 🏆 Unified Thinker has been accepted to ACL 2026!
  • 🚀 The checkpoint is now available!

Highlights

  • Decoupled Thinker–Generator design: upgrade reasoning without retraining the entire generator.
  • Unified planning format across text-to-image (T2I, creation) and image-to-image (I2I, edit-only modification).
  • HieraReason-40K: hierarchical reasoning traces + executable enhanced prompts for cold start.
  • Dual-phase RL with generator-in-the-loop to align plans with actual visual outcomes.
  • Cross-generator transfer: Thinker can be plugged into different diffusion backbones.
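
The decoupled design above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen for illustration only: the `Plan` fields, the `Thinker` and `Generator` protocols, `run_pipeline`, and the toy classes are not the repository's actual API. It only shows the idea that the Thinker emits an executable prompt and any generator that accepts such a prompt can consume it.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Plan:
    """Hypothetical plan container; field names are assumptions."""
    reasoning: str        # reasoning trace produced by the Thinker
    enhanced_prompt: str  # executable prompt handed to the Generator


class Thinker(Protocol):
    def plan(self, instruction: str, image: Optional[bytes] = None) -> Plan: ...


class Generator(Protocol):
    def generate(self, prompt: str, image: Optional[bytes] = None) -> bytes: ...


def run_pipeline(thinker: Thinker, generator: Generator,
                 instruction: str, image: Optional[bytes] = None) -> bytes:
    """Plan first, then let the generator execute the enhanced prompt.

    With image=None this corresponds to T2I; with an input image, to I2I.
    """
    plan = thinker.plan(instruction, image)
    return generator.generate(plan.enhanced_prompt, image)


# Toy stand-ins so the sketch runs without any model weights.
class ToyThinker:
    def plan(self, instruction, image=None):
        return Plan(reasoning="toy reasoning",
                    enhanced_prompt=instruction + ", studio lighting")


class ToyGeneratorA:
    def generate(self, prompt, image=None):
        return ("A:" + prompt).encode("utf-8")


class ToyGeneratorB:
    def generate(self, prompt, image=None):
        return ("B:" + prompt).encode("utf-8")


if __name__ == "__main__":
    thinker = ToyThinker()
    # The same Thinker is reused with two different generators.
    print(run_pipeline(thinker, ToyGeneratorA(), "a red cube"))
    print(run_pipeline(thinker, ToyGeneratorB(), "a red cube"))
```

In this sketch, swapping the generator requires no change to the Thinker, which mirrors the cross-generator transfer highlight.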

🎬 Demo Video

[Video: Unified Thinker demo]

🛠 Preparation

Data & Model Setup

  1. Dataset Structure: Create local directories and symlink or download the datasets as follows:

    • UniREdit-Data-100K: data/UniREdit-Data-100K/uniredit-data/original_images/
    • Banana-400K: data/Banana-400K/source_images/
    • HieraReason-40K: Download und.jsonl and gen.jsonl to data/.
  2. Pre-trained Weights: Download and organize the models in the model/ directory:

    • model/Qwen-Image-Edit-2509 (The Image Editor)
    • model/Qwen2.5-VL-7B-Instruct (The Reasoning Core)
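
Before training, it can help to confirm that the layout listed above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` and the idea of running it from the repository root are assumptions, while the paths themselves are copied from the list above.

```python
from pathlib import Path

# Paths copied from the setup list above, relative to the repository root.
EXPECTED_PATHS = [
    "data/UniREdit-Data-100K/uniredit-data/original_images",
    "data/Banana-400K/source_images",
    "data/und.jsonl",
    "data/gen.jsonl",
    "model/Qwen-Image-Edit-2509",
    "model/Qwen2.5-VL-7B-Instruct",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under `root`.

    Path.exists() follows symlinks, so a symlinked dataset counts as
    present and a dangling symlink counts as missing.
    """
    base = Path(root)
    return [p for p in EXPECTED_PATHS if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing paths:")
        for p in missing:
            print("  " + p)
    else:
        print("All expected data and model paths are present.")
```

Saved under a name of your choice (for example, a hypothetical `check_layout.py`) and run from the repository root, it prints whichever entries still need to be downloaded or symlinked.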

Setup

pip install -U pip
pip install -r requirements.txt

Training

bash scripts/thinker_editor/train.sh

Inference

The commands below assume the repository is cloned at /root/UnifiedThinker; adjust the path if it lives elsewhere.

  • Single Image Inference (CLI):
bash /root/UnifiedThinker/inference/infer_single.sh
  • Interactive Demo (Gradio): For a web interface, run:
bash /root/UnifiedThinker/inference/infer_gradio.sh

Project Status

This repository currently serves as the project homepage.

  • Training & inference code
  • Model checkpoints (Thinker / Generator adapters)
  • HieraReason-40K data & processing scripts
  • Reproduction scripts for benchmarks

Citation

📖 If you find this work useful, please cite:

@misc{zhou2026unifiedthinker,
      title={Unified Thinker: A General Reasoning Modular Core for Image Generation}, 
      author={Sashuai Zhou and Qiang Zhou and Jijin Hu and Hanqing Yang and Yue Cao and Junpeng Ma and Yinchao Ma and Jun Song and Tiezheng Ge and Cheng Yu and Bo Zheng and Zhou Zhao},
      year={2026},
      eprint={2601.03127},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.03127}, 
}
