This project integrates CameraCtrl as a guidance signal into DreamGaussian to improve 3D reconstruction quality. By feeding pose-aware frontal-view inputs into the existing pipeline end-to-end, we obtain more refined frontal 3D reconstructions.
Implementation of the `CameraCtrlGuidance` class, based on CameraCtrl's inference and pipeline:
- Pipeline configuration
- Image embedding retrieval via the `get_img_embeds` method
- Plucker embedding generation from the camera [R|t] via the `process_camera_params` method
- SDS loss calculation in the `train_step` method
- Integration of CameraCtrl guidance into the original DreamGaussian `main.py`
- Detailed training configuration specified in `configs/image_camctrl.yaml`
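The Plucker embedding step above can be sketched as follows. This is a minimal, illustrative version of what `process_camera_params` computes per pixel; the function name `plucker_embedding` and argument shapes are hypothetical, not the project's actual API.

```python
def plucker_embedding(R, t, K_inv, u, v):
    """Return 6-D Plucker coordinates (moment, direction) of the ray
    through pixel (u, v) for a world-to-camera pose [R|t].
    R: 3x3 rotation (nested lists), t: length-3 translation,
    K_inv: 3x3 inverse intrinsics."""
    # Camera origin in world coordinates: o = -R^T t
    o = [-sum(R[i][j] * t[i] for i in range(3)) for j in range(3)]
    # Ray direction in world coordinates: d = R^T K^{-1} [u, v, 1]
    pc = [K_inv[i][0] * u + K_inv[i][1] * v + K_inv[i][2] for i in range(3)]
    d = [sum(R[i][j] * pc[i] for i in range(3)) for j in range(3)]
    # Normalize the direction
    n = sum(x * x for x in d) ** 0.5
    d = [x / n for x in d]
    # Moment vector m = o x d (always orthogonal to d)
    m = [o[1] * d[2] - o[2] * d[1],
         o[2] * d[0] - o[0] * d[2],
         o[0] * d[1] - o[1] * d[0]]
    return m + d  # 6 channels per pixel
```

Stacking these 6 values over every pixel of every frame yields the pose conditioning map that CameraCtrl consumes.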
Training steps:
- Gaussian initialization based on the first condition image
- Iterations 0–50:
  - Gaussian optimization using only the 14 frames generated by CameraCtrl
- Iterations 50–200:
  - Unseen-viewpoint optimization using a mix of Zero123 and CameraCtrl guidance
- Iterations 200+:
  - Camera pose rotation every 20 iterations, followed by optimization from the rotated CameraCtrl pose
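The staged schedule above can be summarized in a small selector. This is a hypothetical sketch of the control flow; the function name `pick_guidance` and the returned labels are illustrative, and the thresholds mirror the README.

```python
def pick_guidance(step, rotate_every=20):
    """Select which guidance sources drive the SDS loss at a given step."""
    if step < 50:
        return ["cameractrl"]             # fit only the 14 CameraCtrl frames
    if step < 200:
        return ["zero123", "cameractrl"]  # mix in Zero123 for unseen views
    # 200+: rotate the camera pose every `rotate_every` iterations,
    # then optimize from the rotated CameraCtrl pose
    return ["cameractrl_rotated" if step % rotate_every == 0 else "cameractrl"]
```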
Environment:
- DreamGaussian Docker environment + DreamGaussian conda environment
- Additional dependencies for CameraCtrl guidance:
  # for stable-diffusion
  huggingface_hub==0.19.4
  diffusers==0.24.0
  accelerate==0.24.1
  transformers==4.36.2
Checkpoint downloads:
- Inside `src/checkpoints/`, the following pre-trained models are required:
  - stable-video-diffusion-img2vid-xt
  - CameraCtrl_svdxt
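A quick sanity check before training can catch a missing download early. This helper is a minimal sketch assuming the checkpoint layout listed above; `missing_checkpoints` is not part of the project.

```python
from pathlib import Path

# Checkpoint directory names listed in the README
REQUIRED = ["stable-video-diffusion-img2vid-xt", "CameraCtrl_svdxt"]

def missing_checkpoints(root="src/checkpoints"):
    """Return the names of required checkpoints not found under root."""
    base = Path(root)
    return [name for name in REQUIRED if not (base / name).exists()]
```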
camctrl-dreamgaussian/
├── configs/ # Configuration files
│ ├── image_camctrl.yaml # Dreamgaussian config
│ └── svdxt_320_576_cameractrl.yaml # CameraCtrl config
├── utils/ # utility scripts
│ ├── process.py # Processing utilities
│ └── process_camctrl.py # CameraCtrl processing
├── src/ # Main source code directory
│ ├── checkpoints/ # Model checkpoints
│ ├── guidance/ # Guidance utilities
│ │ └── cameractrl_utils2.py
│ ├── cameractrl/ # CameraCtrl related code
│ ├── diff-gaussian-rasterization/ # Gaussian splatting implementation
│ ├── simple-knn/ # KNN implementation
│ ├── scripts/ # Utility scripts
│ ├── main.py # Main training script
│ └── main2.py # mesh & 2D albedo map training script
├── datasets/ # Dataset directory
│ ├── cameractrl_assets/
│ │ ├── pose_files # generated pose files
│ │ └── svd_prompts.json
│ └── condition_image.png # Input condition image
└── results/ # Output results directory
Usage:
- Prepare one condition image for CameraCtrl and one pose file (.txt) generated by generate_pose_file_from_COLMAP.py. Place the condition image in datasets/ and pose_file.txt in datasets/cameractrl_assets/pose_files/
- Run process.py (the original DreamGaussian preprocessing for Zero123):
python utils/process.py datasets/condition_image.png
- Run process_camctrl.py (modified to keep the original image aspect ratio instead of square resizing):
python utils/process_camctrl.py datasets/condition_image.png
- If the image ratio is not 720*1280 (a typical phone camera), update cameractrl_image_width, cameractrl_image_height, cameractrl_original_pose_width, and cameractrl_original_pose_height in src/configs/image_camctrl.yaml
- Verify that condition_image_rgba.png was created in datasets/
- Run main.py:
python main.py --config configs/image_camctrl.yaml input=datasets/condition_image_rgba.png
- Run main2.py for mesh and 2D albedo map refinement:
python main2.py --config configs/image_camctrl.yaml input=datasets/condition_image_rgba.png
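The aspect-ratio note above can be made mechanical. This is an illustrative helper, not project code: it returns the YAML keys (copied from the README) that need updating when the input image does not match the default 720x1280 ratio.

```python
def camctrl_config_overrides(width, height, default=(720, 1280)):
    """Return the image_camctrl.yaml keys to update when the input
    image's aspect ratio differs from the default 720x1280."""
    # Cross-multiply to compare ratios without floating-point division
    if width * default[1] == height * default[0]:
        return {}  # same aspect ratio; the defaults are fine
    return {
        "cameractrl_image_width": width,
        "cameractrl_image_height": height,
        "cameractrl_original_pose_width": width,
        "cameractrl_original_pose_height": height,
    }
```

For example, a 1080x1920 phone photo has the same ratio as 720x1280 and needs no overrides, while a square 1000x1000 image requires all four keys to be set.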

