[AAAI 2024] Video Frame Prediction from a Single Image and Events

Juanjuan Zhu*, Zhexiong Wan*, Yuchao Dai†

Northwestern Polytechnical University, Xi’an, China

Shaanxi Key Laboratory of Information Acquisition and Processing

Abstract

Recently, the task of Video Frame Prediction (VFP), which predicts future video frames from previous ones through extrapolation, has made remarkable progress. However, the performance of existing VFP methods is still far from satisfactory because they rely on fixed-framerate video: 1) they have difficulty handling complex dynamic scenes; 2) they cannot predict future frames at flexible prediction time intervals. Event cameras record intensity changes asynchronously with very high temporal resolution, which provides rich dynamic information about the observed scene. In this paper, we propose to predict video frames from a single image and the events that follow it, which makes it possible both to handle complex dynamic scenes and to predict future frames at flexible prediction time intervals. First, we introduce a symmetrical cross-modal attention augmentation module to enhance the complementary information between images and events. Second, we propose to jointly achieve optical flow estimation and frame generation by combining the motion information of events with the semantic information of the image, and then to inpaint the holes produced by forward warping to obtain the final predicted frame. Based on these, we propose a lightweight pyramidal coarse-to-fine model that can predict a 720P frame within 25 ms. Extensive experiments show that our proposed model significantly outperforms the state-of-the-art frame-based and event-based VFP methods and has the fastest runtime.
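To illustrate why forward warping leaves holes that need inpainting, here is a minimal pure-Python sketch. It is not the model's implementation: the function name `forward_warp`, the integer per-pixel flow, and the nested-list image layout are all assumptions made for illustration, and the actual model presumably works on tensors with sub-pixel flow.

```python
from typing import List, Optional, Tuple

Image = List[List[Optional[float]]]
Flow = List[List[Tuple[int, int]]]  # hypothetical format: (dx, dy) per source pixel


def forward_warp(image: Image, flow: Flow) -> Tuple[Image, List[List[bool]]]:
    """Push every source pixel to its target location (nearest-pixel splatting).

    Returns the warped image and a mask that is True wherever no source
    pixel landed, i.e. the holes that a later inpainting step must fill.
    """
    height, width = len(image), len(image[0])
    warped: Image = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            tx, ty = x + dx, y + dy
            # Pixels that move outside the frame are dropped.
            if 0 <= tx < width and 0 <= ty < height:
                warped[ty][tx] = image[y][x]  # on collisions the last write wins
    holes = [[value is None for value in row] for row in warped]
    return warped, holes


# A 1x3 image whose pixels all move one step to the right.
warped, holes = forward_warp([[1.0, 2.0, 3.0]], [[(1, 0), (1, 0), (1, 0)]])
print(warped)  # [[None, 1.0, 2.0]]
print(holes)   # [[True, False, False]]
```

The leftmost pixel receives no value because nothing moved into it, which is the kind of disoccluded region the abstract refers to.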

Requirements

Ubuntu
PyTorch 1.13.1
CUDA 11.7
Python 3.8

Build the environment with:

conda create -y -n VFPSIE python=3.8
conda activate VFPSIE
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
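After installation, a short script along the following lines can confirm that the environment matches the versions listed above. This is only a sketch and is not part of the repository: the helper names `version_matches`, `collect_versions`, and `report` are hypothetical, and the check relies solely on `sys.version_info`, `torch.__version__`, and `torch.version.cuda`.

```python
import sys

# Versions taken from the Requirements list.
EXPECTED = {"python": "3.8", "torch": "1.13.1", "cuda": "11.7"}


def version_matches(installed: str, expected: str) -> bool:
    """Compare dotted version prefixes, ignoring local suffixes such as +cu117."""
    core = installed.split("+")[0].split(".")
    wanted = expected.split(".")
    return core[: len(wanted)] == wanted


def collect_versions() -> dict:
    """Gather the installed versions; torch entries are omitted if torch is missing."""
    versions = {"python": "{}.{}".format(*sys.version_info[:2])}
    try:
        import torch
    except ImportError:
        return versions
    versions["torch"] = torch.__version__
    versions["cuda"] = torch.version.cuda or "none"  # None on CPU-only builds
    return versions


def report(versions: dict) -> list:
    """Return one human-readable status line per expected component."""
    lines = []
    for name, expected in EXPECTED.items():
        found = versions.get(name, "not installed")
        ok = found != "not installed" and version_matches(found, expected)
        status = "ok" if ok else "MISMATCH"
        lines.append(f"{name}: found {found}, expected {expected} [{status}]")
    return lines


if __name__ == "__main__":
    print("\n".join(report(collect_versions())))
```

A `MISMATCH` line does not necessarily mean the code will fail; it only flags a difference from the configuration listed in this README.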

Quick Usage

Generate a target frame using our model:

python run_sample.py --sample_folder_path ./sample_data --ckpt_path pretrained_model/VFPSIE.pth --save_output_dir ./output
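To run the command above over several sample folders, a small wrapper such as the following may help. It is a sketch under assumptions: the helpers `build_command` and `run_all` and the root directory `./my_samples` are hypothetical, and each subfolder is assumed to follow the same layout as `./sample_data`. Only the three flags shown in the command above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_command(sample_folder, ckpt_path="pretrained_model/VFPSIE.pth",
                  output_dir="./output"):
    """Assemble the run_sample.py invocation for one sample folder."""
    return [
        sys.executable, "run_sample.py",
        "--sample_folder_path", str(sample_folder),
        "--ckpt_path", str(ckpt_path),
        "--save_output_dir", str(output_dir),
    ]


def run_all(root, output_root="./output"):
    """Run the model once per subfolder of `root`, one output folder each."""
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        command = build_command(folder, output_dir=Path(output_root) / folder.name)
        subprocess.run(command, check=True)  # raises if run_sample.py fails


if __name__ == "__main__":
    # Hypothetical directory containing several sample folders.
    run_all("./my_samples")
```

Run it from the repository root so that the relative paths to `run_sample.py` and the checkpoint resolve.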

Citation

@inproceedings{Zhu_VFPSIE_AAAI_2024,
  title={Video Frame Prediction from a Single Image and Events},
  author={Zhu, Juanjuan and Wan, Zhexiong and Dai, Yuchao},
  booktitle={AAAI Conference on Artificial Intelligence (AAAI)},
  year={2024},
}

Acknowledgement

Thanks for the inspiration from the following work:

