This is the official model implementation and benchmark evaluation repository of **AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea**.
- Clone this repo

```shell
git clone https://github.com/weichow23/AnySD
```

- Environment setup

```shell
conda create -n anyedit python=3.9.2
conda activate anyedit
pip install -r requirements.txt
pip install --upgrade torch diffusers xformers triton pydantic deepspeed
pip install git+https://github.com/openai/CLIP torchmetrics==0.5
```
- For AnyBench, additionally run

```shell
bash anybench/setup.sh  # open the script and check it carefully to ensure the correct dependencies are installed
```
This is the guide for the AnyBench evaluation tool. The relevant files are located in the `anybench` directory.

We have integrated the evaluations for AnyBench, Emu-Edit, and MagicBrush into the same codebase, which supports the following models: Null-Text, Uni-ControlNet, InstructPix2Pix, MagicBrush, HIVE, and UltraEdit (SD3).

The evaluation metrics are CLIPim↑, CLIPout↑, L1↓, L2↓, and DINO↑.
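The pixel metrics are straightforward to reproduce. Below is a minimal sketch (not the repository's exact evaluation code) of the L1/L2 distances and the embedding cosine similarity that underlies CLIPim and DINO; in practice the embeddings would come from a CLIP or DINO image encoder:

```python
import torch

def l1_distance(pred: torch.Tensor, gt: torch.Tensor) -> float:
    """Mean absolute pixel difference between edited and reference images; lower is better."""
    return (pred - gt).abs().mean().item()

def l2_distance(pred: torch.Tensor, gt: torch.Tensor) -> float:
    """Mean squared pixel difference; lower is better."""
    return ((pred - gt) ** 2).mean().item()

def embedding_similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    """Cosine similarity between two image embeddings
    (e.g. from a CLIP or DINO encoder); higher is better."""
    a = a / a.norm(dim=-1, keepdim=True)
    b = b / b.norm(dim=-1, keepdim=True)
    return (a * b).sum(dim=-1).mean().item()

if __name__ == "__main__":
    img = torch.rand(3, 256, 256)       # toy stand-in for a decoded image
    print(l1_distance(img, img))        # 0.0 for identical images
```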
Emu-Edit

```shell
CUDA_VISIBLE_DEVICES=7 PYTHONPATH='./' python3 anybench/eval/emu_gen_eval.py
```

It is worth noting that the Emu-Edit test actually uses the validation set from the Hugging Face repository facebook/emu_edit_test_set_generations; this point has been discussed in previous work.
MagicBrush

Download the test set from MagicBrush, unzip it into `anybench/dataset/magicbrush`, and then run:

```shell
CUDA_VISIBLE_DEVICES=7 PYTHONPATH='./' python3 anybench/eval/magicbrush_gen_eval.py
```
AnyBench

- Download the AnyBench test set

```shell
cd anybench/dataset/
gdown 1V-Z4agWoTMzAYkRJQ1BNz0-i79eAVWt4
sudo apt install unzip
unzip AnyEdit-Test.zip
```

- Run the evaluation

```shell
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='./' python3 anybench/eval/anybench_gen_eval.py
```
⚠ Notice: AnySD may output completely black images for certain sensitive instructions; this is expected behavior.
⚠ Notice: During evaluation, the final scores may vary with the inference hyperparameters, random seeds, and batch size.

We cleaned and re-organized the AnyEdit data for the public release and retrained the model on the reorganized data, so the results will differ slightly from those in the paper, though the overall trends are the same. Hyperparameters also have a considerable impact on the results.
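Because the final scores depend on the random seed, pinning the seed of the initial noise makes runs comparable across machines. A minimal sketch with PyTorch generators (the actual sampling setup in `anysd/infer.py` may differ):

```python
import torch

def seeded_noise(seed: int, shape=(1, 4, 64, 64)) -> torch.Tensor:
    """Draw an initial diffusion latent from a fixed-seed generator,
    so repeated runs start from identical noise."""
    gen = torch.Generator(device="cpu").manual_seed(seed)
    return torch.randn(shape, generator=gen)

# Two runs with the same seed start from the same latent ...
a = seeded_noise(42)
b = seeded_noise(42)
assert torch.equal(a, b)

# ... while a different seed changes the starting point, which is one
# reason final scores can drift between evaluation runs.
c = seeded_noise(43)
assert not torch.equal(a, c)
```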
```shell
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='./' python3 anysd/infer.py
```
Prepare Data

```shell
huggingface-cli download Bin1117/anyedit-split --repo-type dataset
```
- Stage I

```shell
bash train_stage1.sh
```

- Stage II

```shell
# before training, download AnyBench-test, which serves as the validation set
cd anybench/dataset/
gdown 1w_QsjDvNp-c9R1gaT5lex0esQAPRE1AQ
sudo apt install unzip
unzip AnyEdit-Test.zip
```
The experts included in AnySD are as follows:

```shell
# TYPE = ['visual_ref', 'visual_ske', 'visual_scr', 'visual_bbox', 'visual_mat', 'visual_seg', 'visual_dep', 'viewpoint', 'global']
bash train_stage2.sh
```
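Conceptually, each editing task is handled by one of the experts listed above, with a general expert covering the rest. The sketch below is purely illustrative: the task labels and dispatch logic are hypothetical, and AnySD's real routing is learned rather than a lookup table.

```python
# Expert names mirror the TYPE list above.
EXPERT_TYPES = [
    'visual_ref', 'visual_ske', 'visual_scr', 'visual_bbox',
    'visual_mat', 'visual_seg', 'visual_dep', 'viewpoint', 'global',
]

# Hypothetical mapping from an edit-task label to its expert.
TASK_TO_EXPERT = {
    'visual_reference': 'visual_ref',   # edit guided by a reference image
    'visual_sketch': 'visual_ske',      # edit guided by a sketch
    'visual_segment': 'visual_seg',     # edit guided by a segmentation map
    'visual_depth': 'visual_dep',       # edit guided by a depth map
    'viewpoint': 'viewpoint',           # camera/viewpoint change
}

def route(task: str) -> str:
    """Pick the expert for a task, falling back to the general expert."""
    expert = TASK_TO_EXPERT.get(task, 'global')
    assert expert in EXPERT_TYPES
    return expert

print(route('visual_sketch'))   # visual_ske
print(route('remove_object'))   # global (no dedicated expert)
```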
Since AnyEdit contains a wide range of editing instructions across various domains, it holds promising potential for developing a powerful editing model for high-quality editing tasks. However, training such a model poses three additional challenges: (a) aligning the semantics of various multi-modal inputs; (b) identifying the semantic edits within each domain to control the granularity and scope of the edits; (c) coordinating the complexity of various editing tasks to prevent catastrophic forgetting. To this end, we propose AnyEdit Stable Diffusion (🎨AnySD), a novel approach to cope with various editing tasks in the real world.
Architecture of 🎨AnySD. 🎨AnySD is a novel architecture that supports three conditions (original image, editing instruction, visual prompt) for various editing tasks.
💖 Our model is based on the awesome SD 1.5
```bibtex
@article{yu2024anyedit,
  title={AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea},
  author={Yu, Qifan and Chow, Wei and Yue, Zhongqi and Pan, Kaihang and Wu, Yang and Wan, Xiaoyang and Li, Juncheng and Tang, Siliang and Zhang, Hanwang and Zhuang, Yueting},
  journal={arXiv preprint arXiv:2411.15738},
  year={2024}
}
```