
IN2OUT

[ICIP 2025] Official implementation of "IN2OUT: FINE-TUNING VIDEO INPAINTING MODEL FOR VIDEO OUTPAINTING USING HIERARCHICAL DISCRIMINATOR"

Paper

Abstract

This repository contains the official implementation of our ICIP 2025 paper "IN2OUT: FINE-TUNING VIDEO INPAINTING MODEL FOR VIDEO OUTPAINTING USING HIERARCHICAL DISCRIMINATOR". We present a method for fine-tuning a video inpainting model specifically for video outpainting, enabling seamless extension of video content beyond the original frame boundaries.

News

  • 2025.05.20: Paper accepted to ICIP 2025! 🎉
  • 2025.07.06: Code and pretrained models released

Installation

This project is tested with CUDA 11.7 and Python 3.7. Create the conda environment with the command below.

conda env create -f e2fgvi.yaml

If you encounter an error while running the command above, install the mmcv dependency with the commands below.

MMCV dependency

conda activate e2fgvi
pip install mmcv==2.0.0rc4 -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13/index.html
pip install -U openmim
mim install mmcv-full
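
After installation, you can sanity-check the environment with a short Python snippet (a minimal sketch; the expected versions follow the install commands above):

# Environment sanity check (minimal sketch; package names follow the
# install commands above).
import torch
import mmcv

print("torch:", torch.__version__)            # expected 1.13.x per the cu117/torch1.13 index above
print("CUDA available:", torch.cuda.is_available())
print("mmcv:", mmcv.__version__)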

Pretrained models

  • Download the pretrained E2FGVI (HQ) model from the E2FGVI repository
  • Download our fine-tuned outpainting model from Google Drive

Quick Start

Download Pretrained Model

Run Inference on Your Video

# Prepare your video and generate masks
python utils/generate_mask.py -v your_video_folder -k 4 --max_frames 512

# Run outpainting inference
python infer_example.py -v your_video_folder -m mask_1_4 -c release_model/in2out_e2fgvi.pth

Fine-tune E2FGVI to Outpainting

Prepare data

  1. Download YouTube-VOS from the Official Link (download train_all_frames.zip and test_all_frames.zip)
  2. Unzip and merge the JPEGImages directories under youtube-vos/:
mv train_all_frames/JPEGImages/* /datas/youtube-vos/JPEGOriginal/
mv test_all_frames/JPEGImages/* /datas/youtube-vos/JPEGOriginal/

Then download train.json and test.json from the E2FGVI GitHub repository, resulting in:

|- datas
    |- youtube-vos
        train.json
        test.json
        |- JPEGOriginal
            |- <video_id>
                |- <frame_id>.jpg
                |- <frame_id>.jpg
            |- <video_id>
                |- <frame_id>.jpg
                |- <frame_id>.jpg
  3. Run utils/zip_files.py (see the sketch after this list) and remove the original directory, resulting in:
|- datas
    |- youtube-vos
        |- JPEGImages
            |- <video_id>.zip
            |- <video_id>.zip
  4. Set the data_root attribute of configs/hierarchical.json to the absolute path of your dataset root (/datas in the example above).
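
For reference, the sketch below shows what the zipping step is expected to do (a hypothetical re-implementation, not utils/zip_files.py itself; check the actual script for the exact archive layout). Each <video_id> frame directory becomes one <video_id>.zip.

# Hypothetical re-implementation of the zipping step; see utils/zip_files.py
# for the actual behavior.
import os
import zipfile

src_root = "/datas/youtube-vos/JPEGOriginal"   # one frame directory per video
dst_root = "/datas/youtube-vos/JPEGImages"     # output archives
os.makedirs(dst_root, exist_ok=True)

for video_id in sorted(os.listdir(src_root)):
    video_dir = os.path.join(src_root, video_id)
    if not os.path.isdir(video_dir):
        continue
    with zipfile.ZipFile(os.path.join(dst_root, video_id + ".zip"), "w") as zf:
        for frame in sorted(os.listdir(video_dir)):
            # store each frame at the archive root, e.g. 00001.jpg
            zf.write(os.path.join(video_dir, frame), arcname=frame)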

Run fine-tuning

python train.py 

Our fine-tuning code logs the training process with wandb by default. You can disable logging with the --no_log flag.
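
If logging stays enabled, training metrics go to wandb. The snippet below only illustrates how such logging is typically wired up (not the repository's actual code; the project name and metric are assumptions):

# Illustration of typical wandb logging, not the repository's code.
import wandb

run = wandb.init(project="in2out")   # hypothetical project name
for step in range(10):
    loss = 1.0 / (step + 1)          # placeholder for the real training loss
    wandb.log({"loss": loss}, step=step)
run.finish()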

Evaluate

python evaluate.py --dataset youtube-vos --data_root $DATA_ROOT$ --model e2fgvi_hq --ckpt $CKPT$ --result_path results_youtube --save_results

The evaluation log will be saved under result_path. The --save_results flag saves all inferenced videos as PNG files. You may use utils/pngs_to_video.py to turn the saved images into a video.
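
If utils/pngs_to_video.py does not fit your pipeline, the conversion can also be done with OpenCV as sketched below (an assumed stand-in, not the bundled script; paths and fps are placeholders):

# OpenCV stand-in for assembling saved PNG frames into an .mp4
# (not the bundled utils/pngs_to_video.py; paths and fps are placeholders).
import glob
import cv2

frames = sorted(glob.glob("results_youtube/some_video/*.png"))  # adjust path
h, w = cv2.imread(frames[0]).shape[:2]

writer = cv2.VideoWriter("some_video.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), 24, (w, h))
for path in frames:
    writer.write(cv2.imread(path))
writer.release()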

Outpaint your video / Evaluate in your video

To outpaint your video(s), prepare your directory as follows.

|- <dataset_name>
    |- video
        |- <video1_name>.mp4
        |- <video2_name>.mp4

Your video should already be padded to cover the desired outpainted region. For example, if you are outpainting a 4:3 video to 16:9, your input should be a 16:9 video with the padding already in place. The code runs evaluation by default, so ignore the PSNR/SSIM numbers when outpainting your own padded video.
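
If your source is not padded yet, black borders can be added with OpenCV as in the sketch below (an illustrative helper for the 4:3 to 16:9 case; the repository expects you to supply the padded video yourself):

# Pad a 4:3 video to 16:9 with black borders (illustrative helper;
# input/output paths are placeholders).
import cv2

cap = cv2.VideoCapture("input_4x3.mp4")
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)

target_w = int(round(h * 16 / 9))          # keep height, widen to 16:9
left = (target_w - w) // 2
right = target_w - w - left

writer = cv2.VideoWriter("input_16x9.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), fps, (target_w, h))
while True:
    ok, frame = cap.read()
    if not ok:
        break
    writer.write(cv2.copyMakeBorder(frame, 0, 0, left, right,
                                    cv2.BORDER_CONSTANT, value=(0, 0, 0)))
cap.release()
writer.release()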

Run utils/generate_mask.py. k should be the integer value of $\left(1-\dfrac{\text{original width}}{\text{padded width}}\right)^{-1}$. For example, if you are outpainting a 4:3 video to 16:9, k=4 (a worked check follows the command below). --max_frames should be larger than the maximum number of frames across your videos.

python utils/generate_mask.py -v <dataset_name> -k 4 --max_frames 512
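
As a worked check of the formula above (heights equal, widths taken relative to unit height):

# Worked check of k for outpainting 4:3 -> 16:9.
original_width = 4 / 3   # relative width of the 4:3 source at unit height
padded_width = 16 / 9    # relative width of the 16:9 target at unit height
k = 1 / (1 - original_width / padded_width)
print(k)                 # 4.0 -> pass -k 4 to utils/generate_mask.py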

Run inference. You may change the argument values or the model_specs variable. <mask_name> is the folder containing the masks, which is mask_1_<k> by default.

python infer_example.py -v <dataset_name> -m <mask_name> -c $CKPT$

Evaluation Results

Quantitative Results on YouTube-VOS

| Method | PSNR ↑ | SSIM ↑ |
| --- | --- | --- |
| E2FGVI | 23.81 | 0.9378 |
| Ours | 25.71 | 0.9464 |

Qualitative comparisons of discriminator designs


Qualitative comparisons of discriminator designs on 480p DAVIS dataset. Our method produces more temporally consistent and visually plausible outpainted regions.

Dataset

We use the YouTube-VOS dataset for training and evaluation. Please follow the data preparation steps in the Fine-tune E2FGVI to Outpainting section.

Training

To reproduce our results:

# Fine-tune E2FGVI for outpainting
python train.py --config configs/final.json

# Monitor training with wandb (optional)
# Set your wandb project name in the config

Evaluation

Evaluate on standard datasets:

# Evaluate on YouTube-VOS
python evaluate.py --dataset youtube-vos --data_root $DATA_ROOT$ --model e2fgvi_hq --ckpt $CKPT$ --result_path results_youtube --save_results

# Convert results to videos
python utils/pngs_to_video.py --input_dir results_youtube --output_dir videos_output

Acknowledgments

  • This code is based on E2FGVI. We thank the authors of E2FGVI for their excellent work and open-source implementation.
  • This work was supported by SKT AI Fellowship.

License

Licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0); non-commercial use only. Any commercial use requires formal permission first.

Contact

For questions and issues, please open an issue on this repository.
