Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image


Yanran Zhang*,1, Ziyi Wang*,1, Wenzhao Zheng†,1, Zheng Zhu2, Jie Zhou1, Jiwen Lu1

1Department of Automation, Tsinghua University, China     2GigaAI

*Equal Contribution    †Project Leader

Code arXiv Website Dataset

MoRe4D Teaser

📝 Abstract

MoRe4D generates interactive, dynamic 4D scenes from a single static image. Unlike previous paradigms that decouple generation and reconstruction (leading to geometric inconsistencies), we tightly couple geometric modeling and motion generation, achieving consistent 4D motion and geometry.

Generating interactive, dynamic 4D scenes from a single static image remains a core challenge. Most existing methods decouple geometry from motion (either generate-then-reconstruct or reconstruct-then-generate), causing spatiotemporal inconsistencies and poor generalization.

To overcome these limitations, we extend the reconstruct-then-generate framework to jointly couple Motion generation with geometric Reconstruction for 4D Synthesis (MoRe4D). We introduce:

  • 🗄️ TrajScene-60K: A large-scale dataset of 60,000 video samples with dense point trajectories
  • 🎯 4D Scene Trajectory Generator (4D-STraG): A diffusion-based model that jointly generates geometrically consistent and motion-plausible 4D point trajectories
  • 🎬 4D View Synthesis Module (4D-ViSM): Renders videos with arbitrary camera trajectories from 4D point track representations

🔥 News

  • 2025-12-05: We have submitted our paper to arXiv.
  • 2025-12-06: Code release

🔧 Getting Started

Installation

  1. Clone the repository:

    git clone https://github.com/Zhangyr2022/MoRe4D.git
    cd MoRe4D
  2. Create a conda environment with Python 3.10:

    conda create -n more4d python=3.10
    conda activate more4d
  3. Install the required dependencies:

    # CUDA 12.4 is recommended
    conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
    pip install -r requirements.txt
  4. Install additional dependencies. Clone the following repositories and install them:

📊 TrajScene-60K Dataset


To address the data scarcity for 4D generation, we present TrajScene-60K, a large-scale dataset containing:

  • 📹 60,000 High-Quality Samples: Curated from WebVid-10M using VLM-based filtering (CogVLM2 & DeepSeek-V3)
  • 🎯 Dense Annotations: Includes dense 4D point trajectories, per-frame depth maps, and occlusion masks
  • 📝 Rich Semantics: Paired with high-quality captions describing both scene content and dynamic behavior

The dataset will be released soon!

Usage

Prepare your input data in the required format.

Training Motion-Sensitive VAE

Download the Wan2.1-Fun-V1.1-14B-Control checkpoint first and place it in the ./models folder.

bash scripts/4D_STraG_training/train_vae.sh

Training 4D-STraG (Scene Trajectory Generator)

4D-STraG is a joint diffusion model that simultaneously reconstructs and generates spatiotemporal point trajectories. Key innovations:

  • Depth-Guided Motion Normalization: Ensures scale invariance
  • Motion Perception Module (MPM): Injects rich motion priors from the input image
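The README names Depth-Guided Motion Normalization but does not state its formula. As a minimal sketch of why dividing motion by depth yields scale invariance, the Python snippet below normalizes each point's displacement from the first frame by a reference depth. The choice of the first-frame median depth as the reference, and the helper names `normalize_motion` and `scale_scene`, are assumptions made for illustration; this is not the 4D-STraG implementation.

```python
from statistics import median


def normalize_motion(tracks):
    """Return per-frame displacements divided by a reference depth.

    tracks[t][n] is the (x, y, z) camera-space position of point n at frame t.
    ASSUMPTION: the reference depth is the median z of the first frame; the
    README does not specify the normalizer used by 4D-STraG.
    """
    first_frame = tracks[0]
    ref_depth = median(z for _, _, z in first_frame)
    if ref_depth <= 0:
        raise ValueError("reference depth must be positive")
    return [
        [
            tuple((c - c0) / ref_depth for c, c0 in zip(point, start))
            for point, start in zip(frame, first_frame)
        ]
        for frame in tracks
    ]


def scale_scene(tracks, factor):
    """Uniformly scale every coordinate, as if the scene were `factor` times larger."""
    return [[tuple(factor * c for c in point) for point in frame] for frame in tracks]


# Two points tracked over two frames (toy values, not dataset samples).
tracks = [
    [(0.0, 0.0, 2.0), (1.0, 0.0, 4.0)],
    [(0.5, 0.0, 2.0), (1.0, 0.2, 4.4)],
]
small = normalize_motion(tracks)
large = normalize_motion(scale_scene(tracks, 10.0))
```

Scaling the whole scene by 10 multiplies both the displacements and the reference depth by 10, so `small` and `large` agree up to floating-point error.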

In addition to downloading the Wan2.1-Fun-V1.1-14B-Control checkpoint, you also need to download the OmniMAE and UniDepth checkpoints, then place them in the ./models folder.

bash scripts/4D_STraG_training/train_wan.sh

Training 4D-ViSM (View Synthesis Module)

4D-ViSM leverages the dense 4D point cloud representation to synthesize high-fidelity novel-view videos, filling in dis-occluded regions coherently using generative priors.
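To illustrate the geometric side of this step only, the sketch below projects a point cloud into a translated pinhole camera with a z-buffer. It shows how moving the camera reveals points that were hidden in the input view and leaves pixels with no source point, which are the dis-occluded regions a generative model would then have to fill. The function names, the pinhole model with a single focal length, and the toy values are assumptions; 4D-ViSM's actual renderer is not described in this README.

```python
def project_points(points, focal, width, height, cam_offset=(0.0, 0.0, 0.0)):
    """Project camera-space 3D points into a depth image for a translated camera.

    Returns a height x width grid holding the nearest depth per pixel,
    or None where no point lands (a hole).
    """
    depth = [[None] * width for _ in range(height)]
    for x, y, z in points:
        # Express the point relative to the new camera position.
        x, y, z = x - cam_offset[0], y - cam_offset[1], z - cam_offset[2]
        if z <= 0:
            continue  # behind the camera
        u = int(round(focal * x / z + width / 2))
        v = int(round(focal * y / z + height / 2))
        if 0 <= u < width and 0 <= v < height:
            # Z-buffer: keep only the closest point per pixel.
            if depth[v][u] is None or z < depth[v][u]:
                depth[v][u] = z
    return depth


def hole_mask(depth):
    """True where a pixel received no point and would need to be filled in."""
    return [[d is None for d in row] for row in depth]


# A near point and a far point on the same viewing ray (toy values).
points = [(0.0, 0.0, 2.0), (0.0, 0.0, 5.0)]
input_view = project_points(points, focal=2.0, width=4, height=4)
novel_view = project_points(points, focal=2.0, width=4, height=4,
                            cam_offset=(1.0, 0.0, 0.0))
```

In `input_view` the far point is hidden behind the near one; in `novel_view` the two land on different pixels, while most other pixels remain holes.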

Download the Wan2.1-Fun-V1.1-14B-InP checkpoint first and place it in the ./models folder.

bash scripts/4D_ViSM_training/train.sh
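Since each of the three training stages above expects specific checkpoints in ./models, a short pre-flight check can catch a missing download before a long job starts. The following is a sketch: the checkpoint names come from this README, but the stage keys, the function name `missing_checkpoints`, and the rule that a checkpoint counts as present when some entry in the folder contains its name are assumptions about a layout the README does not specify.

```python
from pathlib import Path

# Checkpoint names are taken from the README. How they are laid out inside
# ./models is an ASSUMPTION: any file or folder whose name contains the
# checkpoint name counts as present.
REQUIRED_CHECKPOINTS = {
    "vae": ["Wan2.1-Fun-V1.1-14B-Control"],
    "4d_strag": ["Wan2.1-Fun-V1.1-14B-Control", "OmniMAE", "UniDepth"],
    "4d_vism": ["Wan2.1-Fun-V1.1-14B-InP"],
}


def missing_checkpoints(stage, models_dir="./models"):
    """Return the checkpoint names required by `stage` that are not found."""
    root = Path(models_dir)
    entries = [p.name.lower() for p in root.iterdir()] if root.is_dir() else []
    return [
        name
        for name in REQUIRED_CHECKPOINTS[stage]
        if not any(name.lower() in entry for entry in entries)
    ]


if __name__ == "__main__":
    for stage in REQUIRED_CHECKPOINTS:
        missing = missing_checkpoints(stage)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{stage}: {status}")
```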

Memory Optimization

If you encounter out-of-memory (OOM) errors, enable DeepSpeed by changing the accelerate launch line in the scripts to one of the following forms:

DeepSpeed ZeRO-2:

    accelerate launch --use_deepspeed --deepspeed_config_file config/zero_stage2_config.json --deepspeed_multinode_launcher standard /path/to/script

DeepSpeed ZeRO-3 (maximum memory savings):

    accelerate launch --zero_stage 3 --zero3_save_16bit_model true --zero3_init_flag true --use_deepspeed --deepspeed_config_file config/zero_stage3_config.json --deepspeed_multinode_launcher standard /path/to/script
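If the launch line is assembled programmatically rather than edited by hand, the two variants can be kept in one place. The sketch below only rearranges the flags and config paths quoted above; the helper name `launch_command` and the `ZERO_FLAGS` table are hypothetical and not part of the repository.

```python
import shlex

# Flags copied from the two launch lines in this README.
ZERO_FLAGS = {
    2: [
        "--use_deepspeed",
        "--deepspeed_config_file", "config/zero_stage2_config.json",
        "--deepspeed_multinode_launcher", "standard",
    ],
    3: [
        "--zero_stage", "3",
        "--zero3_save_16bit_model", "true",
        "--zero3_init_flag", "true",
        "--use_deepspeed",
        "--deepspeed_config_file", "config/zero_stage3_config.json",
        "--deepspeed_multinode_launcher", "standard",
    ],
}


def launch_command(zero_stage, script):
    """Build the argument list for an accelerate launch with DeepSpeed enabled."""
    if zero_stage not in ZERO_FLAGS:
        raise ValueError("zero_stage must be 2 or 3")
    return ["accelerate", "launch", *ZERO_FLAGS[zero_stage], script]


# Print the ZeRO-2 line; "/path/to/script" is the README's own placeholder.
print(shlex.join(launch_command(2, "/path/to/script")))
```

Returning a list rather than a string lets the result be passed directly to `subprocess.run` without shell quoting concerns.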

Inference

After training, you can generate 4D scenes using the inference scripts:

# Inference for whole pipeline
bash infer.sh
# Inference for VAE
bash infer_vae.sh

🎨 Results Showcase

Generated Samples

| Input | 4D Point Tracking (4D-STraG) | Multi-View Videos (4D-ViSM) |
| --- | --- | --- |
| A brown bear walks across rocky terrain. | bear_render.mp4 | 1.mp4 |
| A camel walks along a path in a sunny zoo enclosure. | camel_render.mp4 | 1.mp4 |

💡 Methodology

MoRe4D Pipeline

Figure: Overview of the MoRe4D framework for unified 4D synthesis.

Our framework consists of two core components, 4D-STraG and 4D-ViSM, designed to ensure both geometric stability and dynamic realism.

🙏 Acknowledgments

We extend our sincere gratitude to the following open-source projects for their valuable resources and foundational support:

We are also thankful to the broader open-source community for their continuous contributions and support.

📖 Citation

If you find our work useful for your research, please consider citing us:

@article{zhang2025more4d,
  title={Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image},
  author={Zhang, Yanran and Wang, Ziyi and Zheng, Wenzhao and Zhu, Zheng and Zhou, Jie and Lu, Jiwen},
  journal={arXiv preprint arXiv:2512.05044},
  year={2025},
  eprint={2512.05044},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.05044}
}

📧 Contact

For questions and discussions, please open an issue or contact:
