Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

Yanran Zhang¹*, Ziyi Wang¹*, Wenzhao Zheng¹†, Zheng Zhu², Jie Zhou¹, Jiwen Lu¹

¹Department of Automation, Tsinghua University, China ²GigaAI

*Equal Contribution †Project Leader

📝 Abstract

MoRe4D generates interactive, dynamic 4D scenes from a single static image. Unlike previous paradigms that decouple generation and reconstruction (leading to geometric inconsistencies), we tightly couple geometric modeling and motion generation, achieving consistent 4D motion and geometry.

Generating interactive, dynamic 4D scenes from a single static image remains a core challenge. Most existing methods decouple geometry from motion (either generate-then-reconstruct or reconstruct-then-generate), causing spatiotemporal inconsistencies and poor generalization.

To overcome these limitations, we extend the reconstruct-then-generate framework to couple Motion generation with geometric Reconstruction for 4D Synthesis (MoRe4D). We introduce:

🗄️ TrajScene-60K: A large-scale dataset of 60,000 video samples with dense point trajectories
🎯 4D Scene Trajectory Generator (4D-STraG): A diffusion-based model that jointly generates geometrically consistent and motion-plausible 4D point trajectories
🎬 4D View Synthesis Module (4D-ViSM): Renders videos with arbitrary camera trajectories from 4D point track representations

🔥 News

2025-12-05: Paper submitted to arXiv.
2025-12-06: Code released.

🔧 Getting Started

Installation

Clone the repository:

git clone https://github.com/Zhangyr2022/MoRe4D.git
cd MoRe4D

Create a conda environment with Python 3.10:

conda create -n more4d python=3.10
conda activate more4d

Install the required dependencies:

# CUDA 12.4 is recommended
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
pip install -r requirements.txt

Install the additional dependencies by cloning the following repositories and following their installation instructions:

UniDepthV2: https://github.com/lpiccinelli-eth/UniDepth
Gaussian Splatting (diff-gaussian-rasterization): https://github.com/slothfulxtx/diff-gaussian-rasterization

📊 TrajScene-60K Dataset

To address the data scarcity for 4D generation, we present TrajScene-60K, a large-scale dataset containing:

📹 60,000 High-Quality Samples: Curated from WebVid-10M using VLM-based filtering (CogVLM2 & DeepSeek-V3)
🎯 Dense Annotations: Includes dense 4D point trajectories, per-frame depth maps, and occlusion masks
📝 Rich Semantics: Paired with high-quality captions describing both scene content and dynamic behavior
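
Since the dataset format has not been published yet, the following is only a sketch of how one sample carrying these annotations could be held in memory. The class name `TrajSample`, its field names, the array shapes, and the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) are all assumptions made for illustration, not the official schema. The helper shows one common way to lift 2D tracks and depth maps into 3D point trajectories.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TrajSample:
    """Hypothetical container for one sample (not the official schema)."""
    tracks_uv: np.ndarray  # (T, N, 2) pixel coordinates of N points over T frames
    depth: np.ndarray      # (T, H, W) per-frame depth maps
    visible: np.ndarray    # (T, N) bool, False where a point is occluded
    caption: str


def lift_to_3d(sample: TrajSample, fx: float, fy: float,
               cx: float, cy: float) -> np.ndarray:
    """Unproject 2D tracks to camera-space 3D points with a pinhole model.

    Returns an array of shape (T, N, 3); occluded points are set to NaN.
    """
    T = sample.tracks_uv.shape[0]
    H, W = sample.depth.shape[1:]
    u = sample.tracks_uv[..., 0]
    v = sample.tracks_uv[..., 1]
    # Nearest-pixel depth lookup, clamped to the image bounds.
    ui = np.clip(np.rint(u).astype(int), 0, W - 1)
    vi = np.clip(np.rint(v).astype(int), 0, H - 1)
    t_idx = np.arange(T)[:, None]
    z = sample.depth[t_idx, vi, ui].astype(float)  # (T, N)
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z
    xyz = np.stack([x, y, z], axis=-1)
    xyz[~sample.visible] = np.nan
    return xyz
```

Keeping occlusion as a separate boolean mask, rather than dropping points, keeps every trajectory the same length, which is convenient for batching.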

The dataset will be released soon!

Usage

Prepare your input data in the required format.

Training Motion-Sensitive VAE

First download the Wan2.1-Fun-V1.1-14B-Control checkpoint and place it in the ./models folder.

bash scripts/4D_STraG_training/train_vae.sh

Training 4D-STraG (Scene Trajectory Generator)

4D-STraG is a joint diffusion model that simultaneously reconstructs and generates spatiotemporal point trajectories. Key innovations:

Depth-Guided Motion Normalization: Ensures scale invariance
Motion Perception Module (MPM): Injects rich motion priors from the input image
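
To make the scale-invariance idea concrete, here is a minimal sketch of one way depth can be used to normalize motion. It is not taken from the MoRe4D code: the function name `normalize_motion` and the choice of the first frame's median depth as the scale are assumptions for illustration, and the actual normalization in the paper may differ.

```python
import numpy as np


def normalize_motion(traj: np.ndarray, depth0: np.ndarray,
                     eps: float = 1e-8) -> np.ndarray:
    """Express motion as displacement from frame 0, divided by a depth scale.

    traj:   (T, N, 3) 3D point trajectories
    depth0: (H, W) depth map of the first frame
    The median-depth scale is an assumption made for this sketch.
    """
    scale = np.median(depth0) + eps
    displacement = traj - traj[:1]  # (T, N, 3), zero at t = 0
    return displacement / scale


rng = np.random.default_rng(0)
traj = rng.normal(size=(4, 5, 3))
depth0 = rng.uniform(1.0, 3.0, size=(8, 8))

a = normalize_motion(traj, depth0)
# Same scene measured in units that are 10x larger.
b = normalize_motion(10.0 * traj, 10.0 * depth0)
print(np.allclose(a, b))
```

Because both the displacements and the depth scale grow by the same factor, the normalized motion of the rescaled scene should match the original up to floating-point tolerance.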

In addition to downloading the Wan2.1-Fun-V1.1-14B-Control checkpoint, you also need to download the OmniMAE and UniDepth checkpoints, then place them in the ./models folder.

bash scripts/4D_STraG_training/train_wan.sh

Training 4D-ViSM (View Synthesis Module)

4D-ViSM leverages the dense 4D point cloud representation to synthesize high-fidelity novel-view videos, coherently filling in dis-occluded regions using generative priors.
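
The sketch below illustrates why dis-occluded regions arise when a point cloud is viewed from a new camera. It is a plain NumPy point splatter with a z-buffer, not the Gaussian Splatting rasterizer or the 4D-ViSM model used in this repository. The function name `render_points` and its arguments are hypothetical. The returned hole mask marks the pixels a generative model would need to fill.

```python
import numpy as np


def render_points(xyz, rgb, K, R, t, height, width):
    """Splat colored 3D points into a target camera, keeping the nearest point.

    xyz: (N, 3) world-space points, rgb: (N, 3) colors
    K: (3, 3) intrinsics; R (3, 3) and t (3,) map world to camera coordinates
    Returns the image (H, W, 3) and a bool hole mask (H, W), True where empty.
    """
    cam = xyz @ R.T + t
    in_front = cam[:, 2] > 1e-6  # drop points behind the camera
    cam, rgb = cam[in_front], rgb[in_front]
    proj = cam @ K.T
    u = np.rint(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.rint(proj[:, 1] / proj[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, rgb = u[inside], v[inside], cam[inside, 2], rgb[inside]

    image = np.zeros((height, width, 3), dtype=rgb.dtype)
    hole = np.ones((height, width), dtype=bool)

    # Sort by pixel index, then by depth, and keep the first (nearest) per pixel.
    pix = v * width + u
    order = np.lexsort((z, pix))
    pix_sorted = pix[order]
    first = np.ones(len(pix_sorted), dtype=bool)
    first[1:] = pix_sorted[1:] != pix_sorted[:-1]
    nearest = order[first]

    image.reshape(-1, 3)[pix[nearest]] = rgb[nearest]
    hole.reshape(-1)[pix[nearest]] = False
    return image, hole
```

Pixels where `hole` is True received no point from the source geometry. In a pipeline like the one described above, those are the regions left for the generative prior to complete.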

First download the Wan2.1-Fun-V1.1-14B-InP checkpoint and place it in the ./models folder.

bash scripts/4D_ViSM_training/train.sh

Memory Optimization

If you encounter OOM (Out of Memory) issues, enable DeepSpeed by modifying the accelerate launch line in the scripts:

DeepSpeed ZeRO-2:

accelerate launch --use_deepspeed --deepspeed_config_file config/zero_stage2_config.json --deepspeed_multinode_launcher standard /path/to/script

DeepSpeed ZeRO-3 (maximum savings):

accelerate launch --zero_stage 3 --zero3_save_16bit_model true --zero3_init_flag true --use_deepspeed --deepspeed_config_file config/zero_stage3_config.json --deepspeed_multinode_launcher standard /path/to/script
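
For a rough sense of what each ZeRO stage buys, the sketch below applies the model-state accounting from the ZeRO paper for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states. It assumes all 14B parameters are trained on 8 GPUs, which may not match the actual training scripts, and it ignores activations and temporary buffers. The function name `zero_memory_gb` is made up for this example.

```python
def zero_memory_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Estimate per-GPU memory (GB) for model states under a given ZeRO stage."""
    weights, grads, optim = 2.0, 2.0, 12.0  # bytes per parameter
    if stage == 0:    # plain data parallelism, nothing partitioned
        per_param = weights + grads + optim
    elif stage == 1:  # optimizer states partitioned
        per_param = weights + grads + optim / num_gpus
    elif stage == 2:  # optimizer states and gradients partitioned
        per_param = weights + (grads + optim) / num_gpus
    elif stage == 3:  # weights partitioned as well
        per_param = (weights + grads + optim) / num_gpus
    else:
        raise ValueError("stage must be 0, 1, 2, or 3")
    return num_params * per_param / 1e9


for stage in (0, 2, 3):
    print(stage, round(zero_memory_gb(14e9, num_gpus=8, stage=stage), 1))
# 0 224.0
# 2 52.5
# 3 28.0
```

Under these assumptions, ZeRO-2 cuts model-state memory to about a quarter of the unpartitioned figure, and ZeRO-3 halves it again at the cost of extra communication.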

Inference

After training, you can generate 4D scenes using the inference scripts:

# Inference for whole pipeline
bash infer.sh
# Inference for VAE
bash infer_vae.sh

🎨 Results Showcase

Generated Samples

Input	4D Point Tracking (4D-STraG)	Multi-View Videos (4D-ViSM)
A brown bear walks across rocky terrain.	bear_render.mp4	1.mp4
A camel walks along a path in a sunny zoo enclosure.	camel_render.mp4	1.mp4

💡 Methodology

Figure: Overview of the MoRe4D framework for unified 4D synthesis.

Our framework consists of two core components, 4D-STraG for trajectory generation and 4D-ViSM for view synthesis, designed to ensure both geometric stability and dynamic realism.

🙏 Acknowledgments

We extend our sincere gratitude to the following open-source projects for their valuable resources and foundational support:

We are also thankful to the broader open-source community for their continuous contributions and support.

📖 Citation

If you find our work useful for your research, please consider citing us:

@article{zhang2025more4d,
  title={Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image},
  author={Zhang, Yanran and Wang, Ziyi and Zheng, Wenzhao and Zhu, Zheng and Zhou, Jie and Lu, Jiwen},
  year={2025},
  eprint={2512.05044},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.05044},
}

📧 Contact

For questions and discussions, please open an issue or contact:

Yanran Zhang: GitHub
Ziyi Wang: Homepage
