
🖼️ ImagenWorld

Preprint

ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks

ImagenWorld is a large-scale, human-centric benchmark designed to stress-test image generation models in real-world scenarios.

  • Broad coverage across 6 domains: Artworks, Photorealistic Images, Information Graphics, Textual Graphics, Computer Graphics, and Screenshots.
  • Rich supervision: ~3.6K condition sets and ~20K fine-grained human annotations enable comprehensive, reproducible evaluation.
  • Explainable evaluation pipeline: We decompose generated outputs via object/segment extraction to identify entities (objects, fine-grained regions), supporting both scalar ratings and object-/segment-level failure tags.
  • Diverse model suite: We evaluate 14 models in total — 4 unified (GPT-Image-1, Gemini 2.0 Flash, BAGEL, OmniGen2) and 10 task-specific baselines (SDXL, Flux.1-Krea-dev, Flux.1-Kontext-dev, Qwen-Image, Infinity, Janus Pro, UNO, Step1X-Edit, IC-Edit, InstructPix2Pix).

📰 News

📖 Introduction

This repository contains the code for the paper ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks. In this paper, we introduce ImagenWorld, a large-scale, human-centric benchmark designed to stress-test image generation models in real-world scenarios. Unlike prior evaluations that focus on isolated tasks or narrow domains, ImagenWorld is organized into six domains: Artworks, Photorealistic Images, Information Graphics, Textual Graphics, Computer Graphics, and Screenshots, and six tasks: Text-to-Image Generation (TIG), Single-Reference Image Generation (SRIG), Multi-Reference Image Generation (MRIG), Text-to-Image Editing (TIE), Single-Reference Image Editing (SRIE), and Multi-Reference Image Editing (MRIE). The benchmark includes 3.6K condition sets and 20K fine-grained human annotations, providing a comprehensive testbed for generative models. To support explainable evaluation, ImagenWorld applies object- and segment-level extraction to generated outputs, identifying entities such as objects and fine-grained regions. This structured decomposition enables human annotators to provide not only scalar ratings but also detailed tags of object-level and segment-level failures.

Teaser

🚀 Quick Start — Inference

Tasks: TIG (Text→Image Generation), TIE (Text→Image Editing), SRIG, SRIE, MRIG, MRIE
Datasets: assumes ImagenWorld/<TASK>/... layout (adjust --task_path as needed)
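
Concretely, --task_path should point at one of the per-task folders under the dataset root. Under the assumed layout, the root looks roughly like this (only the task-level folders are shown; inner contents are elided):

ImagenWorld/
├── TIG/
├── TIE/
├── SRIG/
├── SRIE/
├── MRIG/
└── MRIE/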


Open-Source Models

Directory: inference/open-source/
Entrypoint: main.py
Model registry: inference/open-source/config.py
Batch helper: open_models.sh

All open-source and closed-source runners follow a unified CLI:

python main.py \
  --task <TASK> \
  --model <MODEL> \
  --task_path <DATASET_PATH> \
  --limit <N> \
  --verbose

🔹 Example: TIG (Text→Image Generation) with UNO

cd inference/open-source

python main.py \
  --task TIG \
  --model UNO \
  --task_path /path/to/ImagenWorld/TIG \
  --limit 5 \
  --verbose

Explanation

  • Loads the UNO open-source generator from the registry (config.py); see the registry sketch after this list
  • Runs the TIG (Text→Image Generation) task using samples from /path/to/ImagenWorld/TIG
  • Saves results to model_outputs/model_name.png
  • Prints per-sample logs if --verbose is enabled
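
For orientation, the sketch below shows one common shape for such a registry: a mapping from the --model name to a loader that returns an object with a uniform generate() interface. The names here (MODEL_REGISTRY, GeneratorWrapper, load_uno, get_model) are illustrative assumptions, not the repository's actual config.py.

# Hypothetical sketch of a model registry in the spirit of inference/open-source/config.py.
# All names below are illustrative; the real config.py may differ.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class GeneratorWrapper:
    """Uniform wrapper so every model exposes the same generate() call."""
    name: str
    generate: Callable[..., object]  # e.g. (prompt, reference_images=None) -> image


def load_uno() -> GeneratorWrapper:
    # The real loader would import and initialize the UNO pipeline here.
    raise NotImplementedError("placeholder loader for illustration only")


MODEL_REGISTRY: Dict[str, Callable[[], GeneratorWrapper]] = {
    "UNO": load_uno,
    # "SDXL": load_sdxl, "Qwen-Image": load_qwen_image, ...
}


def get_model(name: str) -> GeneratorWrapper:
    """Resolve the --model CLI argument to a lazily loaded generator."""
    if name not in MODEL_REGISTRY:
        raise KeyError(f"Unknown model {name!r}; available: {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name]()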

Closed-Source Models

Directory: inference/closed-source/
Entrypoint: main.py
Model registry: inference/closed-source/config.py
Batch helper: closed_models.sh

Available closed-source APIs and outputs:

  • GPT-Image-1 → saves gpt-image-1.png
  • Gemini2Flash → saves gemini.png

🔧 Setup Environment

Set your API keys before running:

export OPENAI_API_KEY="sk-..."     # for GPT-Image-1
export GEMINI_API_KEY="..."        # for Gemini 2.5 Flash Image Preview

🔹 Example: TIE (Text→Image Editing) with Gemini 2.5 Flash

cd inference/closed-source

python main.py \
  --task TIE \
  --model Gemini2Flash \
  --task_path /path/to/ImagenWorld/TIE \
  --limit 5 \
  --verbose

Explanation

  • Loads the selected closed-source API model (via OpenAI or Gemini); a minimal standalone API sketch follows this list
  • Runs the specified task on samples from /path/to/ImagenWorld/<TASK>
  • Stores generated images (e.g., gpt-image-1.png, gemini.png)
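
As a point of reference, a minimal standalone call to GPT-Image-1 with the official openai Python SDK looks roughly like the sketch below. This is not the repository's runner; the prompt and output filename are placeholders, and it assumes OPENAI_API_KEY is already exported.

# Minimal standalone GPT-Image-1 call (illustration only, not the repo's runner).
# Requires `pip install openai` and OPENAI_API_KEY in the environment.
import base64
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

result = client.images.generate(
    model="gpt-image-1",
    prompt="A flat-style infographic of the water cycle",  # placeholder prompt
    size="1024x1024",
)

# GPT-Image-1 returns base64-encoded image data.
with open("gpt-image-1.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))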

Batch Execution (Optional)

Each inference type includes a shell helper for multi-task runs:

# open-source batch
cd inference/open-source
bash open_models.sh

# closed-source batch
cd inference/closed-source
bash closed_models.sh

In both scripts:

  • Set BASE_PATH → dataset root (e.g., /path/to/ImagenWorld)
  • Define TASK_MODELS to map each task to a model (a Python sketch of this loop follows the list)
  • Set API keys for closed-source models
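
For readers who prefer Python to shell, the sketch below captures what the batch helpers conceptually do: walk a task-to-model mapping and call main.py once per pair. BASE_PATH and the TASK_MODELS entries are placeholders, and the flags match the unified CLI shown above.

# Hypothetical Python equivalent of open_models.sh / closed_models.sh (illustration only).
# Run from inside inference/open-source/ (or inference/closed-source/).
import os
import subprocess

BASE_PATH = "/path/to/ImagenWorld"   # dataset root (placeholder)
TASK_MODELS = {                      # task -> model mapping (placeholder)
    "TIG": "UNO",
    # add further task: model pairs as needed
}

for task, model in TASK_MODELS.items():
    subprocess.run(
        [
            "python", "main.py",
            "--task", task,
            "--model", model,
            "--task_path", os.path.join(BASE_PATH, task),
            "--verbose",
        ],
        check=True,
    )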

Citation

If you find our work useful for your research, please consider citing our paper:

@misc{imagenworld2025,
  title        = {ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks},
  author       = {Samin Mahdizadeh Sani and Max Ku and Nima Jamali and Matina Mahdizadeh Sani and Paria Khoshtab and Wei-Chieh Sun and Parnian Fazel and Zhi Rui Tam and Thomas Chong and Edisy Kin Wai Chan and Donald Wai Tong Tsang and Chiao-Wei Hsu and Ting Wai Lam and Ho Yin Sam Ng and Chiafeng Chu and Chak-Wing Mak and Keming Wu and Hiu Tung Wong and Yik Chun Ho and Chi Ruan and Zhuofeng Li and I-Sheng Fang and Shih-Ying Yeh and Ho Kei Cheng and Ping Nie and Wenhu Chen},
  year         = {2025},
  doi          = {10.5281/zenodo.17344183},
  url          = {https://zenodo.org/records/17344183},
  projectpage  = {https://tiger-ai-lab.github.io/ImagenWorld/},
  blogpost     = {https://blog.comfy.org/p/introducing-imagenworld},
  note         = {Community-driven dataset and benchmark release; temporarily archived on Zenodo while the arXiv submission is under moderation review.},
}
