
IN2OUT

[ICIP 2025] Official implementation of "IN2OUT: FINE-TUNING VIDEO INPAINTING MODEL FOR VIDEO OUTPAINTING USING HIERARCHICAL DISCRIMINATOR"

Paper

Abstract

This repository contains the official implementation of our ICIP 2025 paper "IN2OUT: FINE-TUNING VIDEO INPAINTING MODEL FOR VIDEO OUTPAINTING USING HIERARCHICAL DISCRIMINATOR". We present a method for fine-tuning a video inpainting model specifically for video outpainting, enabling seamless extension of video content beyond the original frame boundaries.

News

  • 2025.05.20: Paper accepted to ICIP 2025! 🎉
  • 2025.07.06: Code and pretrained models released

Installation

This project is tested with CUDA 11.7 and Python 3.7. Create the conda environment with the command below.

conda env create -f e2fgvi.yaml

If you encounter an error while running the command above, install the mmcv dependency with the commands below.

MMCV dependency

conda activate e2fgvi
pip install mmcv==2.0.0rc4 -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13/index.html
pip install -U openmim
mim install mmcv-full
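
After installation, you can sanity-check the environment with a short Python snippet (a minimal sketch; the expected versions follow the install commands above):

# Environment sanity check (minimal sketch; package names follow the
# install commands above).
import torch
import mmcv

print("torch:", torch.__version__)            # expected 1.13.x per the cu117/torch1.13 index above
print("CUDA available:", torch.cuda.is_available())
print("mmcv:", mmcv.__version__)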

Pretrained models

  • Download the pretrained E2FGVI (HQ) model from the E2FGVI repository
  • Download our fine-tuned outpainting model from Google Drive

Quick Start

Download Pretrained Model

Run Inference on Your Video

# Prepare your video and generate masks
python utils/generate_mask.py -v your_video_folder -k 4 --max_frames 512

# Run outpainting inference
python infer_example.py -v your_video_folder -m mask_1_4 -c release_model/in2out_e2fgvi.pth

Fine-tune E2FGVI to Outpainting

Prepare data

  1. Download YouTube-VOS from the Official Link (download train_all_frames.zip and test_all_frames.zip)
  2. Unzip and merge the JPEGImages directories under youtube-vos/:
mv train_all_frames/JPEGImages/* /datas/youtube-vos/JPEGOriginal/
mv test_all_frames/JPEGImages/* /datas/youtube-vos/JPEGOriginal/

Then download train.json and test.json from the E2FGVI GitHub repository, resulting in:

|- datas
    |- youtube-vos
        train.json
        test.json
        |- JPEGOriginal
            |- <video_id>
                |- <frame_id>.jpg
                |- <frame_id>.jpg
            |- <video_id>
                |- <frame_id>.jpg
                |- <frame_id>.jpg
  3. Run utils/zip_files.py (see the sketch after this list) and remove the original directory, resulting in:
|- datas
    |- youtube-vos
        |- JPEGImages
            |- <video_id>.zip
            |- <video_id>.zip
  4. Set the data_root attribute of configs/hierarchical.json to the absolute path of your dataset root (/datas in the example above).
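
For reference, the sketch below shows what the zipping step is expected to do (a hypothetical re-implementation, not utils/zip_files.py itself; check the actual script for the exact archive layout). Each <video_id> frame directory becomes one <video_id>.zip.

# Hypothetical re-implementation of the zipping step; see utils/zip_files.py
# for the actual behavior.
import os
import zipfile

src_root = "/datas/youtube-vos/JPEGOriginal"   # one frame directory per video
dst_root = "/datas/youtube-vos/JPEGImages"     # output archives
os.makedirs(dst_root, exist_ok=True)

for video_id in sorted(os.listdir(src_root)):
    video_dir = os.path.join(src_root, video_id)
    if not os.path.isdir(video_dir):
        continue
    with zipfile.ZipFile(os.path.join(dst_root, video_id + ".zip"), "w") as zf:
        for frame in sorted(os.listdir(video_dir)):
            # store each frame at the archive root, e.g. 00001.jpg
            zf.write(os.path.join(video_dir, frame), arcname=frame)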

Run fine-tuning

python train.py 

Our fine-tuning code logs the training process with wandb by default. You can disable logging with the --no_log flag.
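
If logging stays enabled, training metrics go to wandb. The snippet below only illustrates how such logging is typically wired up (not the repository's actual code; the project name and metric are assumptions):

# Illustration of typical wandb logging, not the repository's code.
import wandb

run = wandb.init(project="in2out")   # hypothetical project name
for step in range(10):
    loss = 1.0 / (step + 1)          # placeholder for the real training loss
    wandb.log({"loss": loss}, step=step)
run.finish()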

Evaluate

python evaluate.py --dataset youtube-vos --data_root $DATA_ROOT$ --model e2fgvi_hq --ckpt $CKPT$ --result_path results_youtube --save_results

The evaluation log will be saved under result_path. The --save_results flag saves all inferenced videos as PNG files. You may use utils/pngs_to_video.py to turn the saved images into a video.
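
If utils/pngs_to_video.py does not fit your pipeline, the conversion can also be done with OpenCV as sketched below (an assumed stand-in, not the bundled script; paths and fps are placeholders):

# OpenCV stand-in for assembling saved PNG frames into an .mp4
# (not the bundled utils/pngs_to_video.py; paths and fps are placeholders).
import glob
import cv2

frames = sorted(glob.glob("results_youtube/some_video/*.png"))  # adjust path
h, w = cv2.imread(frames[0]).shape[:2]

writer = cv2.VideoWriter("some_video.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), 24, (w, h))
for path in frames:
    writer.write(cv2.imread(path))
writer.release()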

Outpaint your video / Evaluate in your video

To outpaint your video(s), prepare your directory as follows.

|- <dataset_name>
    |- video
        |- <video1_name>.mp4
        |- <video2_name>.mp4

Your video should already be padded to cover the desired outpainted region. For example, if you are outpainting a 4:3 video to 16:9, your input should be a 16:9 video with the padding already in place. The code runs evaluation by default, so ignore the PSNR/SSIM numbers when outpainting your own padded video.
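
If your source is not padded yet, black borders can be added with OpenCV as in the sketch below (an illustrative helper for the 4:3 to 16:9 case; the repository expects you to supply the padded video yourself):

# Pad a 4:3 video to 16:9 with black borders (illustrative helper;
# input/output paths are placeholders).
import cv2

cap = cv2.VideoCapture("input_4x3.mp4")
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)

target_w = int(round(h * 16 / 9))          # keep height, widen to 16:9
left = (target_w - w) // 2
right = target_w - w - left

writer = cv2.VideoWriter("input_16x9.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), fps, (target_w, h))
while True:
    ok, frame = cap.read()
    if not ok:
        break
    writer.write(cv2.copyMakeBorder(frame, 0, 0, left, right,
                                    cv2.BORDER_CONSTANT, value=(0, 0, 0)))
cap.release()
writer.release()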

Run utils/generate_mask.py. k should be the integer value of $\left(1-\dfrac{\text{original width}}{\text{padded width}}\right)^{-1}$. For example, if you are outpainting a 4:3 video to 16:9, k=4 (a worked check follows the command below). --max_frames should be larger than the maximum number of frames across your videos.

python utils/generate_mask.py -v <dataset_name> -k 4 --max_frames 512
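
As a worked check of the formula above (heights equal, widths taken relative to unit height):

# Worked check of k for outpainting 4:3 -> 16:9.
original_width = 4 / 3   # relative width of the 4:3 source at unit height
padded_width = 16 / 9    # relative width of the 16:9 target at unit height
k = 1 / (1 - original_width / padded_width)
print(k)                 # 4.0 -> pass -k 4 to utils/generate_mask.py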

Run inference. You may change the argument values or the model_specs variable. <mask_name> is the folder containing the masks, which is mask_1_<k> by default.

python infer_example.py -v <dataset_name> -m <mask_name> -c $CKPT$

Evaluation Results

Quantitative Results on YouTube-VOS

| Method | PSNR ↑ | SSIM ↑ |
| --- | --- | --- |
| E2FGVI | 23.81 | 0.9378 |
| Ours | 25.71 | 0.9464 |

Qualitative comparisons of discriminator designs


Qualitative comparisons of discriminator designs on 480p DAVIS dataset. Our method produces more temporally consistent and visually plausible outpainted regions.

Dataset

We use the YouTube-VOS dataset for training and evaluation. Please follow the data preparation steps in the Fine-tune E2FGVI to Outpainting section.

Training

To reproduce our results:

# Fine-tune E2FGVI for outpainting
python train.py --config configs/final.json

# Monitor training with wandb (optional)
# Set your wandb project name in the config

Evaluation

Evaluate on standard datasets:

# Evaluate on YouTube-VOS
python evaluate.py --dataset youtube-vos --data_root $DATA_ROOT$ --model e2fgvi_hq --ckpt $CKPT$ --result_path results_youtube --save_results

# Convert results to videos
python utils/pngs_to_video.py --input_dir results_youtube --output_dir videos_output

Acknowledgments

  • This code is based on E2FGVI. We thank the authors of E2FGVI for their excellent work and open-source implementation.
  • This work was supported by SKT AI Fellowship.

License

Licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0); non-commercial use only. Any commercial use requires formal permission first.

Contact

For questions and issues, please open an issue on this repository.
