Yu-Ju Tsai,
Brian Price,
Qing Liu,
Luis Figueroa,
Daniil Pakhomov,
Zhihong Ding,
Scott Cohen,
Ming-Hsuan Yang
University of California, Merced - Adobe Research
- [2025-06-25] Our paper has been accepted by ICCV 2025.
Abstract: Recent methods for human image completion can generate plausible body shapes, but without a reference image they lack the information needed to recover unique elements such as specific clothing items or accessories. Even state-of-the-art reference-based inpainting models cannot effectively preserve fine details from the reference image. To address this, we propose CompleteMe, a model that accurately reconstructs missing body parts using information from a reference image. By leveraging a diffusion prior and a dual U-Net architecture, our method extracts fine-grained details from reference images to guide the completion process, significantly improving the quality of completed images over existing approaches. We also propose a challenging benchmark for reference-based human image completion to systematically evaluate model capability. Experimental results demonstrate that our method achieves superior visual fidelity and semantic consistency.
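As a rough mental model of the dual U-Net design: a second (reference) U-Net encodes the reference image, and its intermediate features are injected into the completion U-Net through attention. The sketch below is conceptual only, with illustrative names; it is not the released implementation:

import torch
import torch.nn as nn

class ReferenceAttention(nn.Module):
    # Injects reference U-Net features into a completion U-Net block.
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, ref_feats):
        # x: (B, N, C) tokens from the completion U-Net block
        # ref_feats: (B, M, C) tokens from the matching reference U-Net block
        out, _ = self.attn(query=x, key=ref_feats, value=ref_feats)
        return x + out  # residual injection of fine reference detail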
Clone our repo and install the packages in requirements.txt. We test our model on an 80GB A100 GPU with CUDA 11.8 and PyTorch 2.0.1, but inference on smaller GPUs is possible.
conda create --name completeme python=3.9 -y
conda activate completeme
pip install -r requirements.txt
pip install accelerate==0.26.0
pip install huggingface-hub==0.25.2
# initialize the accelerate environment
accelerate config default
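To verify that the environment matches the versions we tested (newer versions may also work), you can run a quick check:

import torch
print(torch.__version__)          # tested with 2.0.1
print(torch.version.cuda)         # tested with 11.8
print(torch.cuda.is_available())  # should be True on a GPU machine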
# if you want to train from scratch, download pretrained models like SD, SD-inpainting, etc
python download_pretrained.py

Download our checkpoint from HuggingFace:
mkdir checkpoints
# Put the zip file under checkpoints folder
unzip ./checkpoints/completeme_pipeline.zip -d ./checkpoints/
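If you prefer to script this step, here is a minimal sketch using huggingface_hub; the repo id below is a hypothetical placeholder, so substitute the actual checkpoint repo:

import zipfile
from huggingface_hub import hf_hub_download

# "user/CompleteMe" is a placeholder repo id, not the real one
zip_path = hf_hub_download(repo_id="user/CompleteMe",
                           filename="completeme_pipeline.zip")
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall("./checkpoints")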
We provide a script for a Gradio demo. You can launch it with the following command:

python run_gradio3_demo.py

Check inference_single.sh. Modify the checkpoint path and inputs as needed, then run:
bash inference_single.sh

This inference script generates three results:
# direct model output
output.jpg
# blend with input image
blend.jpg
# all input and output
visualize.jpg
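blend.jpg is presumably the model output composited with the input so that pixels outside the inpainting mask stay untouched; a minimal sketch of that blending, assuming white marks the region to inpaint (the actual script may differ):

import numpy as np
from PIL import Image

inp = np.asarray(Image.open("input.jpg")).astype(np.float32)
out = np.asarray(Image.open("output.jpg")).astype(np.float32)
mask = np.asarray(Image.open("mask.jpg").convert("L")).astype(np.float32) / 255.0
mask = mask[..., None]  # broadcast the single channel over RGB

blend = inp * (1.0 - mask) + out * mask  # keep original pixels outside the hole
Image.fromarray(blend.astype(np.uint8)).save("blend_manual.jpg")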
Here is an example input in inference_arg.py:

### Define input data ###
height, width = 512, 512
prompt = "a person, high quality, realistic." # input prompt
input_dict = {
    "image": image_path,
    "mask_image": mask_path,
    "appearance": {
        "whole body clothes": reference_path,
    },
    "mask_dict": {
        "whole body clothes": reference_mask_path,
    },
}

⭐️⭐️⭐️ Notably, input_dict must contain the keys appearance and mask_dict. These specify the appearance of individual body parts via one or more reference images and their masks.
⭐️⭐️⭐️ The keys in appearance and mask_dict must match, and each key must be one of "upper body clothes", "lower body clothes", "whole body clothes", "hair or headwear", "face", "shoes".
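For example, to take the upper-body clothing and the face from separate reference images (the file paths below are hypothetical placeholders), the input could look like:

input_dict = {
    "image": "./examples/person.jpg",            # image to complete
    "mask_image": "./examples/person_mask.jpg",  # region to inpaint
    "appearance": {
        "upper body clothes": "./examples/ref_clothes.jpg",
        "face": "./examples/ref_face.jpg",
    },
    "mask_dict": {  # same keys as appearance, one mask per reference
        "upper body clothes": "./examples/ref_clothes_mask.jpg",
        "face": "./examples/ref_face_mask.jpg",
    },
}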
We establish the benchmark to meet the following criteria: 1) the same person in the same clothing, 2) a significantly different pose, 3) unique patterns such as special clothing, accessories, or tattoos, and 4) different background conditions. We collect 417 image groups, each consisting of a source image, an inpainting mask, and a reference image.
We also provide benchmark results for other reference-based inpainting methods on HuggingFace, including Paint-by-Example, AnyDoor, LeftRefill, MimicBrush, and our CompleteMe.
Please download our benchmark from HuggingFace and extract the benchmark zip files into the benchmark folder.
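As a scripted alternative, here is a minimal sketch using huggingface_hub; the repo id below is a hypothetical placeholder, so substitute the actual benchmark repo:

from huggingface_hub import snapshot_download

# "user/CompleteMe-Benchmark" is a placeholder repo id, not the real one
snapshot_download(repo_id="user/CompleteMe-Benchmark",
                  repo_type="dataset",
                  local_dir="./benchmark")
# then extract any zip archives inside ./benchmark as described above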
Please set up the related paths as follows:
# Benchmark settings
# for input image
IMAGE_PATH="./benchmark/input_all"
# for inpainting mask
MASK_PATH="./benchmark/inpainting_mask_all"
# for reference
REFERENCE_PATH="./benchmark/reference_all"
# for reference mask
REF_MASK_PATH="./benchmark/reference_mask_all"
# for prompt
PROMPT_PATH="./benchmark/all_prompt_pure"

We provide a script to run inference over the whole benchmark folder:
bash inference_folder.sh

The dataset has been released on HuggingFace. We provide a download-and-unzip script in download_dataset.py; please use the following command:
python download_dataset.py

It will prepare the dataset in the folder data/DeepFashion-MultiModal-Parts2Whole, so that you can run our config to train the model or run our dataset file completeme/data/ref_trg.py to inspect the dataset.
We also convert the densepose maps into inpainting masks for training. Please use the following command:
python convert_pose_to_mask.py

It will generate human-shaped masks in the folder data/DeepFashion-MultiModal-Parts2Whole/binary_masks.
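The core idea of this conversion is presumably to binarize the body region of the densepose map and pad it slightly; a minimal sketch under that assumption (the actual script may differ; requires numpy, pillow, and scipy):

import numpy as np
from PIL import Image
from scipy.ndimage import binary_dilation

dp = np.asarray(Image.open("example_densepose.png"))  # hypothetical input path
body = (dp.sum(axis=-1) > 0) if dp.ndim == 3 else (dp > 0)  # nonzero = body pixel
mask = binary_dilation(body, iterations=10)  # dilate so the mask covers the outline
Image.fromarray((mask * 255).astype(np.uint8)).save("example_mask.png")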
Make sure that the dataset is organized as follows:
DeepFashion-MultiModal-Parts2Whole
# Inpainting masks
|-- binary_masks
# Structure signals
|-- densepose
|-- openpose
# Appearance conditions
|-- face
|-- hair_headwear
|-- lower_body_clothes
|-- upper_body_clothes
|-- whole_body_clothes
|-- shoes
# Target images
|-- images
# Caption file
|-- train.jsonl
`-- test.jsonl
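After running the preparation scripts, you can sanity-check the layout against the tree above; a small sketch:

import os

root = "data/DeepFashion-MultiModal-Parts2Whole"
expected = ["binary_masks", "densepose", "openpose", "face", "hair_headwear",
            "lower_body_clothes", "upper_body_clothes", "whole_body_clothes",
            "shoes", "images", "train.jsonl", "test.jsonl"]
for name in expected:
    status = "ok" if os.path.exists(os.path.join(root, name)) else "MISSING"
    print(f"{status:8s}{name}")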
Here is one example inside train.jsonl:

{
# input
"target_id": "MEN-Jackets_Vests-id_00000084-04_1_front",
# reference
"reference_id": "MEN-Jackets_Vests-id_00000084-04_4_full",
# input image
"target": "images/MEN-Jackets_Vests-id_00000084-04_1_front.jpg",
"caption": "The gentleman wears a long-sleeve shirt with solid color patterns. The shirt is with cotton fabric and its neckline is crew. This person also wears an outer clothing, with cotton fabric and color block patterns.",
# reference images
"appearance": {
"upper body clothes": "upper_body_clothes/MEN-Jackets_Vests-id_00000084-04_4_full_rgb.jpg",
"lower body clothes": "lower_body_clothes/MEN-Jackets_Vests-id_00000084-04_4_full_rgb.jpg",
"whole body clothes": "whole_body_clothes/MEN-Jackets_Vests-id_00000084-04_4_full_rgb.jpg",
"hair or headwear": "hair_headwear/MEN-Jackets_Vests-id_00000084-04_4_full_rgb.jpg",
"face": "face/MEN-Jackets_Vests-id_00000084-04_4_full_rgb.jpg",
"shoes": "shoes/MEN-Jackets_Vests-id_00000084-04_4_full_rgb.jpg"},
# reference masks
"mask": {
"upper body clothes": "upper_body_clothes/MEN-Jackets_Vests-id_00000084-04_4_full_mask.jpg",
"lower body clothes": "lower_body_clothes/MEN-Jackets_Vests-id_00000084-04_4_full_mask.jpg",
"whole body clothes": "whole_body_clothes/MEN-Jackets_Vests-id_00000084-04_4_full_mask.jpg",
"hair or headwear": "hair_headwear/MEN-Jackets_Vests-id_00000084-04_4_full_mask.jpg",
"face": "face/MEN-Jackets_Vests-id_00000084-04_4_full_mask.jpg",
"shoes": "shoes/MEN-Jackets_Vests-id_00000084-04_4_full_mask.jpg"},
# You can ignore this part.
"structure": {
"densepose": "densepose/MEN-Jackets_Vests-id_00000084-04_1_front_densepose.png",
"openpose": "openpose/MEN-Jackets_Vests-id_00000084-04_1_front.png"}}This human image dataset comprising about 41,500 reference-target pairs. Each pair in this dataset includes multiple reference images, including pose maps, various aspects of human appearance (e.g., hair, face, clothes, shoes), and a target image featuring the same individual (ID), along with textual captions. Details about the dataset refer to Dataset repo.
Our dataset is post-processed from the DeepFashion-MultiModal dataset.
To train CompleteMe on a single device, use the following command:
python train.py --config configs/train-completeme-sd15.yaml

To train in a DDP environment (assuming 8 devices here), run:
accelerate launch \
--mixed_precision=fp16 \
--num_processes=8 \
--num_machines=1 \
--multi_gpu \
train.py --config configs/train-completeme-sd15.yaml

or run:
bash train_completeme.sh

In our config file, the batch size per device is set to 8, which is recommended for a device with 80GB of memory. If you train on a device with less memory, reduce it accordingly.
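For reference, the global batch size under the 8-GPU command above (a quick calculation, assuming no gradient accumulation) is:

per_device_batch_size = 8  # from the config, recommended for 80GB GPUs
num_processes = 8          # from --num_processes in the accelerate command
print(per_device_batch_size * num_processes)  # 64 samples per optimizer step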
We appreciate the open-source contributions of the following projects:
diffusers magic-animate Moore-AnimateAnyone DeepFashion-MultiModal Parts2Whole MimicBrush
If you find our work useful for your research, please consider citing our paper:
@inproceedings{tsai2025completeme,
title={CompleteMe: Reference-based Human Image Completion},
author={Tsai, Yu-Ju and Price, Brian and Liu, Qing and Figueroa, Luis and Pakhomov, Daniil and Ding, Zhihong and Cohen, Scott and Yang, Ming-Hsuan},
booktitle={ICCV},
year={2025}
}

🌟 If you find this project helpful, please give it a star! 🌟



