[ICIP 2025] Official implementation of "IN2OUT: FINE-TUNING VIDEO INPAINTING MODEL FOR VIDEO OUTPAINTING USING HIERARCHICAL DISCRIMINATOR"
This repository contains the official implementation of our ICIP 2025 paper "IN2OUT: FINE-TUNING VIDEO INPAINTING MODEL FOR VIDEO OUTPAINTING USING HIERARCHICAL DISCRIMINATOR". We present a method for fine-tuning a video inpainting model specifically for video outpainting, enabling seamless extension of video content beyond the original frame boundaries.
- 2025.05.20: Paper accepted to ICIP 2025! 🎉
- 2025.07.06: Code and pretrained models released
This project is tested with CUDA 11.7 and Python 3.7. Create the conda environment with the command below.
conda env create -f e2fgvi.yaml
If you encounter an error while running the command above, install the mmcv dependency manually via the commands below.
conda activate e2fgvi
pip install mmcv==2.0.0rc4 -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13/index.html
pip install -U openmim
mim install mmcv-full
- Download the pretrained E2FGVI (HQ) model from E2FGVI
- Download our fine-tuned outpainting model from Google Drive
# Prepare your video and generate masks
python utils/generate_mask.py -v your_video_folder -k 4 --max_frames 512
# Run outpainting inference
python infer_example.py -v your_video_folder -m mask_1_4 -c release_model/in2out_e2fgvi.pth
- Download Youtube-VOS from Official Link (download `train_all_frames.zip` and `test_all_frames.zip`)
- Unzip and merge the JPEGImages directories under `youtube-vos/`:
mv train_all_frames/JPEGImages/* /datas/youtube-vos/JPEGOriginal/
mv test_all_frames/JPEGImages/* /datas/youtube-vos/JPEGOriginal/
Then download `train.json` and `test.json` from the E2FGVI GitHub, resulting in the following structure:
|- datas
|- youtube-vos
train.json
test.json
|- JPEGOriginal
|- <video_id>
|- <frame_id>.jpg
|- <frame_id>.jpg
|- <video_id>
|- <frame_id>.jpg
|- <frame_id>.jpg
- Run `utils/zip_files.py` and remove the original directory, resulting in:
|- datas
|- youtube-vos
|- JPEGImages
|- <video_id>.zip
|- <video_id>.zip
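For intuition, the zipping step amounts to packing each `<video_id>` frame folder into its own archive. Below is a minimal sketch of that operation (not the repository's actual `utils/zip_files.py`; the function name and flat-archive layout are assumptions):

```python
import os
import zipfile

def zip_video_folders(src_root, dst_root):
    """Pack every <video_id> directory under src_root into <video_id>.zip in dst_root."""
    os.makedirs(dst_root, exist_ok=True)
    for video_id in sorted(os.listdir(src_root)):
        video_dir = os.path.join(src_root, video_id)
        if not os.path.isdir(video_dir):
            continue
        zip_path = os.path.join(dst_root, video_id + ".zip")
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as zf:
            for frame in sorted(os.listdir(video_dir)):
                # Store frames flat inside the archive, e.g. 00000.jpg
                zf.write(os.path.join(video_dir, frame), arcname=frame)

# Example (paths from the layout above):
# zip_video_folders("/datas/youtube-vos/JPEGOriginal", "/datas/youtube-vos/JPEGImages")
```

`ZIP_STORED` (no compression) is a common choice for JPEG frames, since they are already compressed.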
- Set the `data_root` attribute of `configs/hierarchical.json` to the absolute path of your dataset root (`/datas` in the example above)
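For reference, the relevant part of the config would look like the fragment below. Only `data_root` is the key named in this README; all other keys in `configs/hierarchical.json` are omitted here.

```json
{
  "data_root": "/datas"
}
```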
python train.py
Our fine-tuning code logs the training process with wandb by default. You can disable logging with the `--no_log` flag.
python evaluate.py --dataset youtube-vos --data_root $DATA_ROOT$ --model e2fgvi_hq --ckpt $CKPT$ --result_path results_youtube --save_results
The evaluation log will be saved under `result_path`. The `--save_results` flag saves all inferred videos as PNG files. You may use `utils/pngs_to_video.py` to convert the saved images into a video.
To outpaint your video(s), prepare your directory as follows.
|- <dataset_name>
|- video
|- <video1_name>.mp4
|- <video2_name>.mp4
Your video should be padded with the desired outpainted region. For example, if you are outpainting a 4:3 video to 16:9, your input should already be a 16:9 video with the padding in place. The code computes evaluation metrics by default, so ignore the reported PSNR/SSIM when outpainting your own padded videos, since the padded region has no ground truth.
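The padding arithmetic for the 4:3-to-16:9 example above can be sketched as follows. This helper is purely illustrative (the repository does not ship it; you can apply the computed padding with ffmpeg or any video tool):

```python
def outpaint_padding(src_w, src_h, dst_ar_w, dst_ar_h):
    """Return (left, right) horizontal padding that brings a src_w x src_h frame
    to the dst_ar_w:dst_ar_h aspect ratio while keeping the original height."""
    target_w = dst_ar_w * src_h // dst_ar_h  # width the padded frame should have
    pad_total = max(target_w - src_w, 0)
    left = pad_total // 2
    return left, pad_total - left

# e.g. a 640x480 (4:3) clip padded to 16:9 at the same height:
print(outpaint_padding(640, 480, 16, 9))  # -> (106, 107)
```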
Run `utils/generate_mask.py`. `-k` should be an integer; we use `k=4`. `--max_frames` should be larger than the maximum number of frames in your videos.
python utils/generate_mask.py -v <dataset_name> -k 4 --max_frames 512
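The exact mask layout is defined by `utils/generate_mask.py`. Purely for intuition, an outpainting mask of this kind marks the padded side strips as the region to synthesize; the sketch below is illustrative only (the per-side strip width of `w // (2*k)` is an assumption, not the script's guaranteed convention):

```python
import numpy as np

def side_strip_mask(h, w, k=4):
    """Illustrative outpainting mask: 255 marks the outer strip on each side
    (here w // (2*k) pixels per side) that the model should synthesize."""
    mask = np.zeros((h, w), dtype=np.uint8)
    strip = w // (2 * k)
    mask[:, :strip] = 255
    mask[:, w - strip:] = 255
    return mask
```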
Run inference. You may change the argument values or the `model_specs` variable. `<mask_name>` is the folder containing the masks, `mask_1_k` by default.
python infer_example.py -v <dataset_name> -m <mask_name> -c $CKPT$
| Method | PSNR ↑ | SSIM ↑ |
|---|---|---|
| E2FGVI | 23.81 | 0.9378 |
| Ours | 25.71 | 0.9464 |
Qualitative comparisons of discriminator designs on the 480p DAVIS dataset. Our method produces more temporally consistent and visually plausible outpainted regions.
We use the YouTube-VOS dataset for training and evaluation. Please follow the data preparation steps in the Fine-tune E2FGVI to Outpainting section.
To reproduce our results:
# Fine-tune E2FGVI for outpainting
python train.py --config configs/final.json
# Monitor training with wandb (optional)
# Set your wandb project name in the config

Evaluate on standard datasets:
# Evaluate on YouTube-VOS
python evaluate.py --dataset youtube-vos --data_root $DATA_ROOT$ --model e2fgvi_hq --ckpt $CKPT$ --result_path results_youtube --save_results
# Convert results to videos
python utils/pngs_to_video.py --input_dir results_youtube --output_dir videos_output
- This code is based on E2FGVI. We thank the authors of E2FGVI for their excellent work and open-source implementation.
- This work was supported by SKT AI Fellowship.
Licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), for non-commercial use only. Any commercial use requires formal permission first.
For questions and issues, please:
- Open an issue in this repository
- Contact: [email protected]
