Vision-Language Guided Closed-Loop Refinement for Generative Image Synthesis
Generative models can now produce photorealistic imagery, yet they still struggle with the long, multi-goal prompts that professional designers issue. VisionDirector is a training-free, vision-language supervisor that:
- Extracts structured goals from long instructions
- Dynamically decides between one-shot generation and staged edits
- Runs micro-grid sampling with semantic verification/rollback after every edit
- Logs goal-level rewards for transparent evaluation
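The verify-and-rollback loop described above can be sketched as follows. This is an illustrative sketch only: the function names (`generate_candidates`, `vlm_score`) and the accept-if-score-improves policy are assumptions for exposition, not the released agent code.

```python
# Hypothetical sketch of VisionDirector's closed-loop refinement:
# sample a small grid of candidate edits, verify them against the
# extracted goals with a VLM scorer, and keep only improvements.
def refine(prompt, goals, generate_candidates, vlm_score, max_steps=5):
    """Return the best-scoring image; edits that hurt goal coverage are rolled back."""
    best_img, best_score = None, -1.0
    for _ in range(max_steps):
        # Micro-grid sampling: a small batch of candidate edits per step.
        for img in generate_candidates(prompt, best_img):
            score = vlm_score(img, goals)  # e.g. fraction of goals satisfied
            if score > best_score:         # accept only improvements (rollback otherwise)
                best_img, best_score = img, score
        if best_score >= 1.0:              # all goals met: stop early
            break
    return best_img, best_score
```

With stub generators and scorers this runs standalone; in practice both callbacks would wrap the image model and the vision-language verifier.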
We further fine-tune the planner with Group Relative Policy Optimization (GRPO), yielding shorter edit trajectories (3.1 vs 4.2 steps) and stronger alignment.
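GRPO computes advantages relative to a group of sampled trajectories rather than a learned value baseline. The snippet below shows the standard group-relative normalization; it is the textbook formulation, not the paper's training code.

```python
import statistics

# Group-relative advantage, as used in GRPO-style policy optimization:
# each trajectory's reward is normalized against its sampling group.
def grpo_advantages(rewards):
    """A_i = (r_i - mean(r)) / std(r) over one group of rollouts."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]
```

Normalizing within the group removes the need for a separate critic, which is what makes GRPO attractive for tuning a planner on goal-level rewards.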
To expose the gap between current models and real-world design requirements, we introduce LGBench, a 2,000-task benchmark:
| | T2I | I2I | Total |
|---|---|---|---|
| Tasks | 1,000 | 1,000 | 2,000 |
| Total Goals | 18,035 | 11,217 | 29,252 |
| Avg Goals/Task | 18.0 | 11.2 | — |
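The per-task averages follow directly from the totals in the table:

```python
# Sanity-check the Avg Goals/Task row against the Tasks and Total Goals rows.
def avg_goals_per_task(total_goals, num_tasks):
    return round(total_goals / num_tasks, 1)

print(avg_goals_per_task(18_035, 1_000))  # 18.0 (T2I)
print(avg_goals_per_task(11_217, 1_000))  # 11.2 (I2I)
```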
📦 Dataset: huggingface.co/datasets/TruemanV5/LGBench
VisionDirector achieves new state-of-the-art on:
- GenEval: +0.07 overall improvement (0.87 → 0.94)
- ImgEdit: +0.07 absolute improvement
| Model | GenEval Overall |
|---|---|
| Qwen-Image | 0.87 |
| GPT Image 1 | 0.84 |
| VisionDirector | 0.94 |
```
VisionDirector/
├── bench/                   # LGBench benchmark data
│   ├── t2i_1000.json        # 1,000 T2I tasks (18k goals)
│   ├── i2i_1000.json        # 1,000 I2I tasks (11k goals)
│   └── runners/             # Model inference scripts
│
├── evaluation/              # VLM-based goal verification
│   ├── goal_verify.py       # Main evaluation script
│   ├── parallel_evaluate.py # Multi-GPU parallel evaluation
│   └── evaluate_sources.py  # Source image quality check
│
├── agent/                   # 🚧 Coming Soon
└── training/                # 🚧 Coming Soon
```
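The benchmark task files are plain JSON and can be iterated without any framework. The field names below (`task_id`, `prompt`, `goals`) are an assumed schema for illustration; check the dataset card for the actual keys.

```python
import json

# Assumed LGBench task schema for illustration only; the released
# t2i_1000.json / i2i_1000.json files may use different field names.
def load_tasks(text):
    """Parse an LGBench-style task list into (id, prompt, goals) tuples."""
    return [(t["task_id"], t["prompt"], t["goals"]) for t in json.loads(text)]
```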
| Component | Status |
|---|---|
| LGBench Data | ✅ Available |
| Evaluation Scripts | ✅ Available |
| VisionDirector Agent | 🚧 Coming Soon |
| GRPO Fine-tuning | 🚧 Coming Soon |
```bibtex
@article{chu2025visiondirector,
  title={VisionDirector: Vision-Language Guided Closed-Loop Refinement for Generative Image Synthesis},
  author={Chu, Meng and Yang, Senqiao and Che, Haoxuan and Zhang, Suiyun and Zhang, Xichen and Yu, Shaozuo and Gui, Haokun and Rao, Zhefan and Tu, Dandan and Liu, Rui and Jia, Jiaya},
  journal={arXiv preprint arXiv:2512.19243},
  year={2025}
}
```

- Meng Chu (HKUST) — Project Lead
- Senqiao Yang (CUHK)
- Haoxuan Che (Huawei Research)
- Suiyun Zhang, Xichen Zhang, Shaozuo Yu, Haokun Gui, Zhefan Rao
- Dandan Tu, Rui Liu (Huawei Research)
- Jiaya Jia (HKUST)
This project is released under the Apache 2.0 License.

