Yanran Zhang*,1, Ziyi Wang*,1, Wenzhao Zheng†,1, Zheng Zhu2, Jie Zhou1, Jiwen Lu1
1Department of Automation, Tsinghua University, China 2GigaAI
*Equal Contribution †Project Leader
MoRe4D generates interactive, dynamic 4D scenes from a single static image. Unlike previous paradigms that decouple generation and reconstruction (leading to geometric inconsistencies), we tightly couple geometric modeling and motion generation, achieving consistent 4D motion and geometry.
Generating interactive, dynamic 4D scenes from a single static image remains a core challenge. Most existing methods decouple geometry from motion (either generate-then-reconstruct or reconstruct-then-generate), causing spatiotemporal inconsistencies and poor generalization.
To overcome these limitations, we extend the reconstruct-then-generate framework to jointly couple Motion generation with geometric Reconstruction for 4D Synthesis (MoRe4D). We introduce:
- 🗄️ TrajScene-60K: A large-scale dataset of 60,000 video samples with dense point trajectories
- 🎯 4D Scene Trajectory Generator (4D-STraG): A diffusion-based model that jointly generates geometrically consistent and motion-plausible 4D point trajectories
- 🎬 4D View Synthesis Module (4D-ViSM): Renders videos with arbitrary camera trajectories from 4D point track representations
- 2025-12-05: We have submitted our paper to arXiv.
- 2025-12-06: Code release
- Clone the repository:

```bash
git clone https://github.com/Zhangyr2022/MoRe4D.git
cd MoRe4D
```

- Create a conda environment with Python 3.10:

```bash
conda create -n more4d python=3.10
conda activate more4d
```

- Install the required dependencies:

```bash
# CUDA 12.4 is recommended
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
pip install -r requirements.txt
```

- Install additional dependencies. Clone the following repositories and install them:
  - UnidepthV2 following https://github.com/lpiccinelli-eth/UniDepth
  - Gaussian Splatting following https://github.com/slothfulxtx/diff-gaussian-rasterization
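To confirm the environment is set up correctly, a quick sanity check (run inside the more4d environment):

```python
# Verify that PyTorch was installed with CUDA support.
import torch
print(torch.__version__, torch.cuda.is_available())
```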
To address the data scarcity for 4D generation, we present TrajScene-60K, a large-scale dataset containing:
- 📹 60,000 High-Quality Samples: Curated from WebVid-10M using VLM-based filtering (CogVLM2 & DeepSeek-V3)
- 🎯 Dense Annotations: Includes dense 4D point trajectories, per-frame depth maps, and occlusion masks
- 📝 Rich Semantics: Paired with high-quality captions describing both scene content and dynamic behavior
Dataset will be released soon!
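The release format has not been published yet; purely as an illustration of what one sample contains, the sketch below assumes a single .npz archive per clip with hypothetical key names:

```python
import numpy as np

def load_trajscene_sample(path: str):
    """Load one TrajScene-60K sample (hypothetical layout; the real release may differ)."""
    data = np.load(path)
    trajs = data["trajectories"]    # (T, N, 3) dense 4D point trajectories
    depth = data["depth"]           # (T, H, W) per-frame depth maps
    occ = data["occlusion"]         # (T, N) per-point occlusion masks
    caption = str(data["caption"])  # scene content + dynamic behavior description
    return trajs, depth, occ, caption
```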
Prepare your input data in the required format.
Please download the Wan2.1-Fun-V1.1-14B-Control checkpoint first and place it in the ./models folder.
```bash
bash scripts/4D_STraG_training/train_vae.sh
```

4D-STraG is a joint diffusion model that simultaneously reconstructs geometry and generates spatiotemporal point trajectories. Key innovations:
- Depth-Guided Motion Normalization: Ensures scale invariance across scenes with different depth ranges (see the sketch after this list)
- Motion Perception Module (MPM): Injects rich motion priors from the input image
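The exact normalization is defined in the paper; as a rough illustration of the idea, the minimal sketch below divides per-point displacements by each point's initial depth so that motion magnitude becomes independent of scene scale (the function name and the specific scaling rule are assumptions, not the repository's code):

```python
import torch

def depth_guided_normalize(trajs: torch.Tensor, depth0: torch.Tensor, eps: float = 1e-6):
    """Scale-invariant motion: divide displacements by initial per-point depth.

    trajs:  (T, N, 3) point trajectories in camera space (hypothetical layout).
    depth0: (N,) depth of each point in the first frame.
    """
    origin = trajs[:1]                            # (1, N, 3) reference positions
    disp = trajs - origin                         # per-frame displacement
    disp = disp / (depth0.view(1, -1, 1) + eps)   # remove dependence on scene scale
    return origin, disp
```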
In addition to downloading the Wan2.1-Fun-V1.1-14B-Control checkpoint, you also need to download the OmniMAE and UniDepth checkpoints, then place them in the ./models folder.
```bash
bash scripts/4D_STraG_training/train_wan.sh
```

4D-ViSM leverages the dense 4D point-cloud representation to synthesize high-fidelity novel-view videos, coherently filling in dis-occluded regions using generative priors.
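For intuition, novel-view synthesis from a 4D point track representation starts by projecting each frame's point cloud into the target camera; the sketch below shows that projection step only (a generic pinhole formulation, not the repository's renderer, which builds on the diff-gaussian-rasterization backend):

```python
import torch

def project_points(points: torch.Tensor, K: torch.Tensor, w2c: torch.Tensor):
    """Pinhole projection of world-space points into a target camera.

    points: (N, 3) world coordinates for one frame.
    K:      (3, 3) camera intrinsics.
    w2c:    (4, 4) world-to-camera extrinsics.
    Returns (N, 2) pixel coordinates and (N,) camera-space depths.
    """
    homo = torch.cat([points, torch.ones_like(points[:, :1])], dim=-1)  # (N, 4)
    cam = (w2c @ homo.T).T[:, :3]        # transform into camera space
    z = cam[:, 2:3].clamp(min=1e-6)      # keep depths positive for the divide
    uv = (K @ (cam / z).T).T[:, :2]      # perspective divide + intrinsics
    return uv, z.squeeze(-1)
```

Pixels that no projected point covers are the dis-occluded regions that the generative prior fills in.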
Please download the Wan2.1-Fun-V1.1-14B-InP checkpoint first and place it in the ./models folder.
```bash
bash scripts/4D_ViSM_training/train.sh
```

If you encounter OOM (out-of-memory) issues, enable DeepSpeed by modifying the accelerate launch line in the scripts:
DeepSpeed ZeRO-2:

```bash
accelerate launch --use_deepspeed \
  --deepspeed_config_file config/zero_stage2_config.json \
  --deepspeed_multinode_launcher standard /path/to/script
```

DeepSpeed ZeRO-3 (maximum memory savings):

```bash
accelerate launch --use_deepspeed --zero_stage 3 \
  --zero3_save_16bit_model true --zero3_init_flag true \
  --deepspeed_config_file config/zero_stage3_config.json \
  --deepspeed_multinode_launcher standard /path/to/script
```
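The referenced config files are expected to ship with the repository; if you need to author one yourself, a minimal ZeRO-2 file accepted by DeepSpeed looks roughly like this (a generic example, not the project's shipped config):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": true }
}
```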
After training, you can generate 4D scenes using the inference script:
```bash
# Inference for the whole pipeline
bash infer.sh

# Inference for the VAE only
bash infer_vae.sh
```

Figure: Overview of the MoRe4D framework for unified 4D synthesis.
Our framework consists of two core components designed to ensure both geometric stability and dynamic realism.
We extend our sincere gratitude to the open-source projects our work builds on, including Wan2.1, UniDepth, OmniMAE, and diff-gaussian-rasterization, for their valuable resources and foundational support.
We are also thankful to the broader open-source community for their continuous contributions and support.
If you find our work useful for your research, please consider citing us:
```bibtex
@article{zhang2025more4d,
  title={Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image},
  author={Zhang, Yanran and Wang, Ziyi and Zheng, Wenzhao and Zhu, Zheng and Zhou, Jie and Lu, Jiwen},
  journal={arXiv preprint arXiv:2512.05044},
  year={2025},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.05044},
}
```

For questions and discussions, please open an issue or contact the authors.




