# MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
Chenguo Lin<sup>1</sup>\*, Yuchen Lin<sup>1,3</sup>\*, Panwang Pan<sup>2</sup>†,
Yifan Yu<sup>2</sup>, Tao Hu<sup>2</sup>, Honglei Yan<sup>2</sup>, Katerina Fragkiadaki<sup>3</sup>, Yadong Mu<sup>1</sup>
<sup>1</sup>Peking University, <sup>2</sup>ByteDance, <sup>3</sup>Carnegie Mellon University
This repository contains the official implementation of the paper: MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second. MoVieS is a feed-forward framework that jointly reconstructs appearance, geometry and motion for 4D scene perception from monocular videos in one second.
Feel free to contact me (chenguolin@stu.pku.edu.cn) or open an issue if you have any questions or suggestions.
You may also be interested in our other works:
- [CVPR 2026] Diff4Splat: a generative model for 4D dynamic scenes from a single-view image.
## News
- 2026-02-26: The source code for inference and training and the pretrained checkpoint are released.
- 2026-02-21: The paper is accepted to CVPR 2026.
- 2025-07-15: This repo is initialized and the MoVieS technical report is released on arXiv.
## TODO
- Provide source code for inference and training.
- Provide the pretrained checkpoint.
- Provide detailed instructions for inference and training.
- Make the codebase cleaner.
## Installation
You may need to modify the torch version in `settings/setup.sh` to match your CUDA version. There is no hard restriction on the torch version, so feel free to use your preferred one.
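As a concrete aid for the note above: PyTorch publishes per-CUDA wheel indexes at `download.pytorch.org`, where e.g. CUDA 12.1 maps to the `cu121` index. The helper below is purely illustrative (not part of this repo) and only builds the `--index-url` value for `pip install torch`:

```python
def torch_index_url(cuda_version: str) -> str:
    """Map a CUDA version string to the matching PyTorch wheel index URL.

    e.g. "12.1" -> "https://download.pytorch.org/whl/cu121"
    Note: which CUDA tags are actually published depends on the torch release.
    """
    tag = "cu" + cuda_version.replace(".", "")
    return f"https://download.pytorch.org/whl/{tag}"

# Usage: pip install torch --index-url <printed URL>
print(torch_index_url("12.1"))  # https://download.pytorch.org/whl/cu121
```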
```bash
conda create -n movies python=3.10 -y
conda activate movies
git clone https://github.com/chenguolin/MoVieS.git
cd MoVieS
bash settings/setup.sh
```

## Inference
- Download the pretrained checkpoint and the preprocessed videos with camera poses from 🤗HuggingFace and put them in `resources/`. The in-the-wild videos are from the DAVIS dataset, and their camera poses are estimated by MegaSAM.
```bash
mkdir -p resources && cd resources
hf download chenguolin/MoVieS movies_ckpt.safetensors DAVIS/ --local-dir resources
```

- Run the inference script for novel view synthesis. The script first renders the dynamic scene with a fixed camera, then freezes the timestamp and renders the scene with a moving camera to synthesize novel views. You can also apply other combinations of camera poses and timestamps for novel view rendering.
```bash
# python src/infer_davis_nvs.py --name <DAVIS_SAMPLE_NAME>
# For example:
python src/infer_davis_nvs.py --name motocross-bumps
```

Inference results will be saved in `out/<DAVIS_SAMPLE_NAME>`. You will get:
| Input Video (`input_video.mp4`) | Predicted Motion (`output_motion_camera0.mp4`) | Novel View Synthesis (`output_render.mp4`) |
|---|---|---|
| ![]() | ![]() | ![]() |
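The two-phase rendering schedule described above (a fixed camera with a sweeping timestamp, then a frozen timestamp with moving cameras) can be sketched as a list of (camera, timestamp) queries. The function below is a hypothetical illustration of that idea, not the repo's actual API, which consumes real camera poses and frame indices:

```python
def make_render_schedule(num_frames: int, num_novel_cams: int, freeze_frame: int = -1):
    """Build a list of (camera_id, timestamp) render queries.

    Phase 1: keep camera 0 fixed and sweep the normalized timestamp
    (dynamic playback). Phase 2: freeze the timestamp at `freeze_frame`
    and sweep over novel cameras (bullet-time novel view synthesis).
    Camera ids and normalized timestamps here are illustrative stand-ins.
    """
    if freeze_frame < 0:
        freeze_frame = num_frames - 1
    # Phase 1: fixed camera 0, timestamp sweeps from 0.0 to 1.0.
    schedule = [(0, t / (num_frames - 1)) for t in range(num_frames)]
    # Phase 2: frozen timestamp, novel cameras 1..num_novel_cams.
    frozen_t = freeze_frame / (num_frames - 1)
    schedule += [(cam, frozen_t) for cam in range(1, num_novel_cams + 1)]
    return schedule

schedule = make_render_schedule(num_frames=4, num_novel_cams=2)
# Other combinations (e.g. moving camera AND advancing time) are just
# different (camera, timestamp) pairings.
```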
- TODO
- TODO
- TODO
## Training
- We use three static scene datasets (RealEstate10K, TartanAir, and MatrixCity) and five dynamic scene datasets (PointOdyssey, DynamicReplica, Spring, VKITTI2, and Stereo4D) to train MoVieS.
- Support combining multiple datasets for training via `src/data/easy_dataset.py`.
- Support dynamic numbers of input frames and aspect ratios via `src/data/dynamic_dataloader.py`.
- Set your dataset directory in `src/options.py` before training.
- TODO
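Mixing several datasets with per-dataset sampling weights, in the spirit of what `src/data/easy_dataset.py` enables, can be sketched as below. The class name, API, and weighting scheme are illustrative assumptions for this sketch, not the repo's implementation:

```python
import random

class MixedDataset:
    """Toy sketch of combining multiple datasets for training.

    Draws samples from several datasets with probabilities proportional
    to the given weights, so smaller datasets can be over-sampled
    relative to their size. (Hypothetical stand-in, not the real class.)
    """
    def __init__(self, datasets, weights=None):
        self.datasets = datasets
        # Default: weight each dataset by its size (uniform over all samples).
        self.weights = weights or [len(d) for d in datasets]

    def sample(self, rng: random.Random):
        # First pick a dataset, then pick a sample uniformly within it.
        (dataset,) = rng.choices(self.datasets, weights=self.weights, k=1)
        return dataset[rng.randrange(len(dataset))]

# Toy scene lists stand in for e.g. a static (RealEstate10K-like) and a
# dynamic (PointOdyssey-like) dataset.
static_scenes = ["re10k_000", "re10k_001"]
dynamic_scenes = ["podyssey_000", "podyssey_001", "podyssey_002"]
mixed = MixedDataset([static_scenes, dynamic_scenes], weights=[1, 2])
rng = random.Random(0)
print(mixed.sample(rng))
```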
## Acknowledgement
We would like to thank the authors of DiffSplat, VGGT, NoPoSplat, and CUT3R for their great work and for generously providing source code, which inspired our work and helped us a lot in the implementation.
## Citation
If you find our work helpful, please consider citing:
```bibtex
@inproceedings{lin2026movies,
  title={MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second},
  author={Lin, Chenguo and Lin, Yuchen and Pan, Panwang and Yu, Yifan and Hu, Tao and Yan, Honglei and Fragkiadaki, Katerina and Mu, Yadong},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
  year={2026}
}
```



