This job bundle runs a 3D Gaussian Splatting pipeline. In a few hours, you can train your own Gaussian Splatting point cloud from a video by following the instructions in the prerequisites and this README. The job bundle takes a video file as input, and produces a Gaussian Splatting .ply file as output.
The pipeline consists of a single task that runs scripts to:
- Extract video frames with FFmpeg.
- Solve Structure-from-Motion with COLMAP and GLOMAP, saving the pinhole model and undistorted images.
- Train Gaussian Splatting with NeRF Studio splatfacto, Splatfacto in the Wild, or the simple_trainer.py gsplat library example. Output is saved to the specified .ply file.
After downloading the output, you can view it in any Gaussian Splatting viewer such as SuperSplat.
In the last section, you'll find some ideas for next steps after trying out this sample. You can remix the sample to customize your own 3D reconstruction pipeline, or follow the same patterns to run different CUDA workloads on your Deadline Cloud CUDA farm.
- Create an AWS account if you do not already have one.
- Follow the CUDA farm sample CloudFormation template README instructions to create a Deadline Cloud farm that has a CUDA GPU fleet and can build conda packages. You will run the Gaussian Splatting job on this farm.
- Follow the NeRF Studio sample conda package recipe README instructions to build a NeRF Studio conda package into the S3 channel of your CUDA farm.
Now you just need a video that captures many viewpoints of a subject to reconstruct in 3D, and then you can run the Gaussian Splatting pipeline on your farm.
You can use a video-capable camera like your smartphone to capture a video of a subject for your Gaussian Splatting. Here are some tips to consider:
- Use a wide field of view, for example zoom level 0.5 in your camera app. With a wider field of view, more objects will be common between image pairs for Structure-from-Motion to use.
- Turn off video stabilization. This keeps the lens optics identical across all frames, and can increase the quality of the solve.
- Plan your camera motion depending on the subject.
- To capture an object, like a bench or a bicycle, orbit around the subject several times. Keep the camera at a different height on each orbit and, if possible, at different distances from the subject.
- To capture a room interior, move around the perimeter of the room with the camera facing inwards. Repeat at different camera heights, and if there are particular objects that you want in higher detail, treat them like an object capture.
- To capture less structured spaces such as outside terrain, think about how to include everything you want in your Gaussian Splatting, and how to capture all of it from multiple different angles.
- Capture the video with slow and steady motion.
- Keep the camera moving; avoid stopping and panning from a single location.
Copy the video you captured from your camera to your computer for submitting to the farm.
If you don't have a local copy of the deadline-cloud-samples GitHub repository, you can clone it with git or download it as a ZIP.
From the `job_bundles` directory of `deadline-cloud-samples`, run the following command:
$ deadline bundle gui-submit gsplat_pipeline
Switch to the "Job-specific settings" tab and select paths for both the "Input Video File" and the "Output Ply File".
You can proceed to select the "Submit" button, or customize the settings first. If the input video has higher complexity, you may need to increase the "Approximate Image Count" value.
You can also select which Gaussian Splatting trainer to use and customize its CLI options. If you want to use the Markov chain Monte Carlo (MCMC) trainer with the bilateral grid option to produce up to 2 million splats, make the following changes (an equivalent CLI submission is sketched after the list):
- Switch the Gaussian Splatting Trainer from NERFSTUDIO to GSPLAT_SIMPLE_TRAINER.
- In the GSplat Simple Trainer Options text, change `--strategy.cap-max 1000000` to `--strategy.cap-max 2000000` to get 2 million splats, and `--no-use-bilateral-grid` to `--use-bilateral-grid` to enable the bilateral grid option.
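If you prefer the CLI to the GUI submitter, the same changes can be passed as job parameters. This is a sketch with placeholder paths; the parameter names match the ones used by the openjd commands later in this README:

$ deadline bundle submit gsplat_pipeline \
    -p InputVideoFile=~/videos/My3DCaptureVideo.mp4 \
    -p OutputPlyFile=~/output/my_capture.ply \
    -p GaussianSplattingTrainer=GSPLAT_SIMPLE_TRAINER \
    -p GSplatSimpleTrainerOptions="mcmc --strategy.cap-max 2000000 --use-bilateral-grid"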
NOTE: If you select an input video or options that require higher memory usage than the CUDA fleet provides, you will need to update the fleet's minimum settings by re-deploying your CloudFormation template with updated parameter values or editing the settings from the AWS Deadline Cloud management console.
Once the job is submitted, you can monitor its status from the Deadline Cloud monitor job table. Depending on the input video and the settings you selected, the job could be done in 10 minutes or it could take hours.
When the job is running, you can right click on the task and select "View logs" to open the log view:
As the pipeline goes through its steps, it will update the status message visible in the task run details:
If you encounter errors, the log output in this view will help you track down the cause. You can identify which part of the pipeline failed, as it could be an issue with FFmpeg processing the input video, GLOMAP solving Structure-from-Motion, or NeRF Studio training the Gaussian splats. Then you can home in on the specific errors, such as running out of memory or being unable to solve for the camera poses. The answer may be to adjust the fleet infrastructure if you want to keep the settings you selected, or to change parameters such as the image resolution or the image count.
When the job completes successfully, you can download its output from Deadline Cloud monitor. Right click on the completed task in the Tasks table and select "Download output". Depending on your settings, it may show the download progress in your browser, or present a CLI command you can use to download.
The .ply file will be saved to the location you selected when submitting the job. To view the result in your browser, open the SuperSplat Editor website and drag/drop the file from your operating system's file browser onto the SuperSplat page. The initial view will look something like this:
After toggling the Show/Hide Splats option on the right, rotating the scene by about ninety degrees, and navigating around the scene, here's the subject of the capture:
With SuperSplat, you can edit your Gaussian Splatting. Here's the result of selecting a sphere, inverting the selection, and deleting those splats:
There are many tutorials and documentation pages to learn how to use SuperSplat, or import your .ply file into your tool of choice.
Deadline Cloud monitor includes a usage explorer feature that can estimate the cost of the jobs you run and help you understand how much each training run costs. The sample illustrated here was trained at higher quality, with more images than the default, and the Structure-from-Motion and training took about an hour. The example costs shown here are for an on-demand CUDA fleet. If you're comfortable running your tests during off-peak hours, when CUDA-capable instances are available with low enough interruption rates, you could use a spot CUDA fleet at reduced cost. What is shown in usage explorer does not include costs outside of your job run time, such as idle worker instance time or storage in your S3 bucket.
You can get good mileage out of the gsplat_pipeline job as it is, but we created it to be pulled apart, edited, and remixed.
Following the spirit of the Open Job Description Design Tenets, the job template in this bundle is portable. You can run it locally using the `openjd` CLI tool, installable into your Python environment via `pip install openjd-cli`. See the Introduction to Creating a Job documentation for some ideas on local development setup.
One way to run it is on EC2, using the AWS Deep Learning AMI with Conda. Create a GPU instance with the AMI, and log in to get terminal access. Set the conda channel configuration to exclude `defaults` and instead use `s3://<my-conda-channel-bucket>/Conda/Default` and `conda-forge`. The S3 conda channel should contain a `nerfstudio` package like the one from the NeRF Studio sample conda package recipe README.
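As a sketch, that channel configuration could be a `~/.condarc` like the following, substituting your farm's conda channel bucket name:

channels:
  - s3://<my-conda-channel-bucket>/Conda/Default
  - conda-forge

Then run `pip install openjd-cli` before the command: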
$ openjd run gsplat_pipeline/template.yaml \
--environment ../queue_environments/conda_queue_env_improved_caching.yaml \
-p InputVideoFile=~/videos/My3DCaptureVideo.mp4 \
-p OutputPlyFile=vw.ply \
-p WorkspaceDir=my_workspace
0:00:00.000032 Open Job Description CLI: Session start 2025-03-19T01:46:35.324305+00:00
0:00:00.000084 Open Job Description CLI: Running job 'Gaussian Splatting pipeline (FFmpeg -> COLMAP/GLOMAP -> NeRF Studio)'
0:00:00.000151
0:00:00.000197 ==============================================
0:00:00.000241 --------- Entering Environment: Conda
0:00:00.000274 ==============================================
...
Instead of relying on the conda queue environment via the `--environment` option, you could install all the necessary software with a different method. In that case, you can run the following:
$ openjd run gsplat_pipeline/template.yaml \
-p InputVideoFile=~/videos/My3DCaptureVideo.mp4 \
-p OutputPlyFile=vw.ply \
-p WorkspaceDir=my_workspace
The `conda_queue_env_improved_caching.yaml` queue environment has some features to help you manage the software environment in a relatively hands-free way. It was also added to your Deadline Cloud queue when you deployed the CUDA farm CloudFormation template, and these features apply both there and here. By default, it takes the hash of the values of its CondaPackages and CondaChannels parameters, and generates an automatic environment name like `hashname_04bcf28cb135f7f82cfc27a3`, as visible in this part of the output:
...
0:00:00.004261 Using an automatic name for the Conda environment, based on the hash of these values:
0:00:00.004324 CondaChannels: deadline-cloud
0:00:00.004378 CondaPackages: ffmpeg colmap=*=gpu* glomap nerfstudio
0:00:00.006468 Automatic name is hashname_04bcf28cb135f7f82cfc27a3
0:00:00.745875 Named Conda environment hashname_04bcf28cb135f7f82cfc27a3 not found, creating it.
0:00:02.154582 Channels:
0:00:02.154705 - deadline-cloud
0:00:02.154765 - s3://<my-conda-channel-bucket>/Conda/Default
0:00:02.154850 - conda-forge
0:00:02.154903 Platform: linux-64
0:00:10.723394 Collecting package metadata (repodata.json): ...working... done
0:00:16.072905 Solving environment: ...working... done
0:00:16.173576
0:00:16.173668 ## Package Plan ##
0:00:16.173723
0:00:16.173781 environment location: /home/ssm-user/.conda/envs/hashname_04bcf28cb135f7f82cfc27a3
0:00:16.173827
0:00:16.173868 added / updated specs:
0:00:16.173910 - colmap[build=gpu*]
0:00:16.173984 - ffmpeg
0:00:16.174032 - glomap
0:00:16.174072 - nerfstudio
...
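Conceptually, the automatic name is a deterministic digest of the channel and package values, so a later run with identical values maps to the same cached environment. Here is a rough Python sketch of the idea; the queue environment's actual hashing scheme may differ:

import hashlib

def automatic_env_name(conda_channels: str, conda_packages: str) -> str:
    # Identical channel/package values always produce the same name,
    # which is what lets a later run reuse the cached environment.
    digest = hashlib.sha256(
        f"{conda_channels}\n{conda_packages}".encode()
    ).hexdigest()
    return f"hashname_{digest[:24]}"

# Example: automatic_env_name("deadline-cloud", "ffmpeg colmap=*=gpu* glomap nerfstudio")
# returns a stable 24-hex-character name for that channel/package combination.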
After solving, downloading, and installing packages in the environment, the task will run:
...
0:02:39.971583 ==============================================
0:02:39.971612 --------- Running Task
0:02:39.971639 ==============================================
0:02:39.971911 ----------------------------------------------
0:02:39.971982 Phase: Setup
0:02:39.972014 ----------------------------------------------
0:02:39.972047 Writing embedded files for Task to disk.
0:02:39.972101 Mapping: Task.File.Run -> /tmp/OpenJD/CLI-sessiona31itehh/embedded_filesnrdfvegi/gaussian_splatting_pipeline.sh
0:02:39.972219 Wrote: Run -> /tmp/OpenJD/CLI-sessiona31itehh/embedded_filesnrdfvegi/gaussian_splatting_pipeline.sh
0:02:39.972491 ----------------------------------------------
0:02:39.972541 Phase: Running action
0:02:39.972576 ----------------------------------------------
0:02:39.972718 Running command /tmp/OpenJD/CLI-sessiona31itehh/tmpxvvkdhmo.sh
0:02:39.973248 Command started as pid: 4796
0:02:39.973299 Output:
0:02:39.975015 + cd /home/ssm-user/deadline-cloud-samples/job_bundles/my_workspace
0:02:39.975091 + extract_video_frames.sh /home/ssm-user/videos/My3DCaptureVideo.mp4 100 1.0 ./source_images
...
After the task is finished running, the queue environment's exit action will run to inspect all the automatically-named conda virtual environments, and clean up any that weren't updated within the last 96 hours:
...
0:23:48.570419 Cleaning up any automatically-named conda environments that weren't updated within 96 hours.
0:23:49.239172 Checking environment hashname_04bcf28cb135f7f82cfc27a3
0:23:50.411900 Created hashname_04bcf28cb135f7f82cfc27a3 env at 2025-03-20T04:49+00:00
0:23:50.412049 CondaChannels: deadline-cloud
0:23:50.412108 CondaPackages: ffmpeg colmap=*=gpu* glomap nerfstudio
0:23:50.423921 Environment was last updated 0 hours ago.
...
Now, if you run the job again, maybe with different training parameters:
$ openjd run gsplat_pipeline/template.yaml \
--environment ../queue_environments/conda_queue_env_improved_caching.yaml \
-p InputVideoFile=~/videos/My3DCaptureVideo.mp4 \
-p OutputPlyFile=vw_mcmc.ply \
-p WorkspaceDir=my_workspace_mcmc \
-p GaussianSplattingTrainer=GSPLAT_SIMPLE_TRAINER \
-p GSplatSimpleTrainerOptions="mcmc --use-bilateral-grid"
it will use the conda virtual environment created before:
...
0:00:00.004263 Using an automatic name for the Conda environment, based on the hash of these values:
0:00:00.004331 CondaChannels: deadline-cloud
0:00:00.004383 CondaPackages: ffmpeg colmap=*=gpu* glomap nerfstudio
0:00:00.006411 Automatic name is hashname_04bcf28cb135f7f82cfc27a3
0:00:00.714230 Reusing the existing named Conda environment hashname_04bcf28cb135f7f82cfc27a3.
...
and start running the task much sooner:
0:00:08.836003 + cd /home/ssm-user/deadline-cloud-samples/job_bundles/my_workspace_mcmc
0:00:08.836097 + extract_video_frames.sh /home/ssm-user/videos/My3DCaptureVideo.mp4 100 1.0 ./source_images
In the job template, the whole pipeline runs as a single script for one Open Job Description step. This style of pipeline either runs to the end, or not at all, and it always runs serially on one worker host. By splitting it up into individual Open Job Description steps, you can run part of the job on a CPU-only host, and the rest of the job on a GPU host. It also gives you the opportunity to introduce cluster parallelism by adding a parameter space to a step, so that it can process the data as independent tasks on many different machines.
The sample Job Development Progression illustrates the pattern to follow, using two steps: one to initialize a workspace, and a second to perform data processing. To do the same for the Gaussian Splatting pipeline, start by changing the WorkspaceDir parameter from optional to required, and giving it a default value. For Deadline Cloud job attachments to copy the workspace data between the worker hosts running the steps of the job, the workspace must be in a directory with a specified data flow instead of in the session directory. The parameter should look something like this after your edit:
# Workspace
- name: WorkspaceDir
  description: This path is used for the pipeline's workspace.
  userInterface:
    control: CHOOSE_DIRECTORY
    label: Workspace Directory
    groupLabel: Workspace
  type: PATH
  objectType: DIRECTORY
  dataFlow: OUT
  default: workspace
  minLength: 1
To understand how the different steps will interact with each other, it's useful to first inspect the workspace contents of a successful job run. You will find that the different model trainers use slightly different directory structures. The job bundle includes a Python script that summarizes the files, collapsing sequences of numbered frames; a minimal sketch of the idea follows, and below it is the log output from a simple_trainer mcmc run after each of the three scripts.
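This sketch illustrates one way such a summarizer could work; the bundled script's name, options, and output format may differ:

import os
import re
import sys
from collections import defaultdict

def summarize(workspace="."):
    # Group files whose paths differ only by a trailing run of digits.
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(workspace):
        rel = os.path.relpath(root, workspace)
        for name in files:
            path = name if rel == "." else os.path.join(rel, name)
            # Collapse the last run of digits before the extension into "#".
            match = re.search(r"(\d+)(\.\w+)?$", path)
            if match:
                pattern = path[:match.start(1)] + "#" + (match.group(2) or "")
                groups[pattern].append((int(match.group(1)), path))
            else:
                groups[path].append((None, path))
    print(f"Summary of workspace directory {workspace}")
    for pattern in sorted(groups):
        entries = groups[pattern]
        if len(entries) > 1:
            # A numbered sequence: print the pattern and its index range.
            indexes = sorted(index for index, _ in entries)
            print(pattern)
            print(f"  With {len(indexes)} indexes: {indexes[0]}-{indexes[-1]}")
        else:
            # A lone file keeps its real name instead of a "#" pattern.
            print(entries[0][1])

if __name__ == "__main__":
    summarize(sys.argv[1] if len(sys.argv) > 1 else ".")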
For the script `1. Extract video frames`:
extract_video_frames.sh \
'{{Param.InputVideoFile}}' \
{{Param.ApproxImageCount}} \
{{Param.ImageDownscaleFactor}} \
./source_images
2025/03/18 17:50:53-07:00 Summary of workspace directory .
2025/03/18 17:50:53-07:00 source_images/My3DCaptureVideo_#.jpg
2025/03/18 17:50:53-07:00 With 99 indexes: 1-99
For the script `2. Solve Structure-from-Motion, saving the pinhole model and undistorted images`:
solve_glomap_sfm.sh \
./source_images \
./sfm_workspace \
./sparse \
./images
2025/03/18 17:51:32-07:00 Summary of workspace directory .
2025/03/18 17:51:32-07:00 images/My3DCaptureVideo_#.jpg
2025/03/18 17:51:32-07:00 With 99 indexes: 1-99
2025/03/18 17:51:32-07:00 sfm_workspace/database.db
2025/03/18 17:51:32-07:00 sfm_workspace/sparse/0/cameras.bin
2025/03/18 17:51:32-07:00 sfm_workspace/sparse/0/images.bin
2025/03/18 17:51:32-07:00 sfm_workspace/sparse/0/points3D.bin
2025/03/18 17:51:32-07:00 source_images/My3DCaptureVideo_#.jpg
2025/03/18 17:51:32-07:00 With 99 indexes: 1-99
2025/03/18 17:51:32-07:00 sparse/cameras.bin
2025/03/18 17:51:32-07:00 sparse/images.bin
2025/03/18 17:51:32-07:00 sparse/points3D.bin
For the script `3. Train Gaussian Splatting, saving the .ply output`:
train_gsplat_simple_trainer.sh \
. \
'{{Param.MaxNumIterations}}' \
'{{Param.OutputPlyFile}}' \
{{Param.GSplatSimpleTrainerOptions}}
2025/03/18 17:55:01-07:00 Summary of workspace directory .
2025/03/18 17:55:01-07:00 gsplat_workspace/cfg.yml
2025/03/18 17:55:01-07:00 gsplat_workspace/ckpts/ckpt_6999_rank0.pt
2025/03/18 17:55:01-07:00 gsplat_workspace/ckpts/ckpt_9999_rank0.pt
2025/03/18 17:55:01-07:00 gsplat_workspace/ply/point_cloud_9999.ply
2025/03/18 17:55:01-07:00 gsplat_workspace/renders/val_step6999_#.png
2025/03/18 17:55:01-07:00 With 13 indexes: 0-12
2025/03/18 17:55:01-07:00 gsplat_workspace/stats/train_step6999_rank0.json
2025/03/18 17:55:01-07:00 gsplat_workspace/stats/train_step9999_rank0.json
2025/03/18 17:55:01-07:00 gsplat_workspace/stats/val_step6999.json
2025/03/18 17:55:01-07:00 gsplat_workspace/tb/events.out.tfevents...us-west-2.compute.internal
2025/03/18 17:55:01-07:00 gsplat_workspace/videos/traj_6999.mp4
2025/03/18 17:55:01-07:00 images/My3DCaptureVideo_#.jpg
2025/03/18 17:55:01-07:00 With 99 indexes: 1-99
2025/03/18 17:55:01-07:00 images/frame_#.jpg
2025/03/18 17:55:01-07:00 With 99 indexes: 1-99
2025/03/18 17:55:01-07:00 images_2/frame_#.jpg
2025/03/18 17:55:01-07:00 With 99 indexes: 1-99
2025/03/18 17:55:01-07:00 images_4/frame_#.jpg
2025/03/18 17:55:01-07:00 With 99 indexes: 1-99
2025/03/18 17:55:01-07:00 images_4_png/My3DCaptureVideo_#.png
2025/03/18 17:55:01-07:00 With 99 indexes: 1-99
2025/03/18 17:55:01-07:00 images_4_png/frame_#.png
2025/03/18 17:55:01-07:00 With 99 indexes: 1-99
2025/03/18 17:55:01-07:00 images_8/frame_#.jpg
2025/03/18 17:55:01-07:00 With 99 indexes: 1-99
2025/03/18 17:55:01-07:00 sfm_workspace/database.db
2025/03/18 17:55:01-07:00 sfm_workspace/sparse/0/cameras.bin
2025/03/18 17:55:01-07:00 sfm_workspace/sparse/0/images.bin
2025/03/18 17:55:01-07:00 sfm_workspace/sparse/0/points3D.bin
2025/03/18 17:55:01-07:00 source_images/My3DCaptureVideo_#.jpg
2025/03/18 17:55:01-07:00 With 99 indexes: 1-99
2025/03/18 17:55:01-07:00 sparse/cameras.bin
2025/03/18 17:55:01-07:00 sparse/images.bin
2025/03/18 17:55:01-07:00 sparse/points3D.bin
2025/03/18 17:55:01-07:00 sparse_pc.ply
2025/03/18 17:55:01-07:00 transforms.json
Observe how the workspace contains sub-workspaces for the different processes it runs, and data in directories like `source_images`, `sparse`, and `images` that were shared between them. If you want to optimize the amount of data transfer, you can put these sub-workspaces in the session directory, and only put the data to share in the workspace that gets copied. To keep things simple, we'll stick to the existing structure.
Edit the template from having a single step in the `steps` list:
steps:
- name: GaussianSplattingPipeline
  script:
    ...
to have three sequential steps connected by dependencies, named for the scripts they run:
steps:
- name: ExtractVideoFrames
  script:
    ...
- name: SolveStructureFromMotion
  dependencies:
  - dependsOn: ExtractVideoFrames
  script:
    ...
- name: TrainGaussianSplatting
  dependencies:
  - dependsOn: SolveStructureFromMotion
  script:
    ...
For each step, copy the whole structure but keep a subset of the embedded script. Here's the SfM step; the rest we'll leave as an exercise for the reader.
- name: SolveStructureFromMotion
  dependencies:
  - dependsOn: ExtractVideoFrames
  script:
    actions:
      onRun:
        command: bash
        args: ['{{Task.File.Run}}']
    embeddedFiles:
    - name: Run
      filename: solve_sfm.sh
      type: TEXT
      data: |
        #!/bin/env bash
        set -xeuo pipefail
        cd "$WORKSPACE_DIR"
        # 2. Solve Structure-from-Motion, saving the pinhole model and undistorted images
        solve_glomap_sfm.sh \
          ./source_images \
          ./sfm_workspace \
          ./sparse \
          ./images
        echo "openjd_status: Finished solving SfM"
  hostRequirements:
    attributes:
    - name: attr.worker.os.family
      anyOf:
      - linux
    amounts:
    - name: amount.worker.gpu
      min: 1
When you run this job, it will run slower due to scheduling and data transfer overhead between the steps. On the other hand, it's organized so that you can schedule it on different fleets by editing the `hostRequirements` of each step, or add task parallelism to a step that can be split up into many identical tasks, as in the sketch below.
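To illustrate the task parallelism direction, a step can declare a parameter space that the scheduler fans out into independent tasks, one per parameter value. This is a hypothetical sketch only, with a made-up step name and chunking scheme; the pipeline's scripts would need to accept a chunk argument for this to be useful:

- name: ProcessImageChunks
  parameterSpace:
    taskParameterDefinitions:
    - name: ChunkIndex
      type: INT
      range: "0-9"
  script:
    actions:
      onRun:
        command: bash
        args: ['{{Task.File.Run}}']
    embeddedFiles:
    - name: Run
      type: TEXT
      data: |
        #!/bin/env bash
        set -xeuo pipefail
        # Each task receives one ChunkIndex value, so ten tasks can run
        # in parallel across the fleet, each handling its own slice.
        echo "Processing image chunk {{Task.Param.ChunkIndex}} of 10"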
The right job structure will depend on how you use the job. Here are some more directions you might take it:
- One job to extract frames and solve Structure-from-Motion, and a second job that trains the Gaussian Splatting. With this structure, you can preprocess into camera poses once, and then iteratively try different variations of the Gaussian Splatting parameters.
- A job to train Gaussian Splatting that optionally starts from an existing workspace and continues training it. This way, you can run one job to train initial lower-quality versions, and then select the best candidates for continued training after visually inspecting them.
- Add steps between frame extraction and Structure-from-Motion that remove blurry images from the workspace, calculate masks to remove the background or people, and so on. A sketch of the blur filter idea follows this list.
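For the blur-filtering step, a common heuristic is the variance of the Laplacian: sharp images contain more high-frequency edge detail, so a low variance suggests blur. Here is a minimal Python sketch, assuming OpenCV (opencv-python) is available in the environment; the threshold is a hypothetical starting point you would tune for your captures:

import glob
import os
import sys

import cv2

def remove_blurry_images(image_dir, threshold=100.0):
    # Delete frames whose Laplacian variance falls below the threshold.
    for path in sorted(glob.glob(os.path.join(image_dir, "*.jpg"))):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if gray is None:
            continue  # skip files OpenCV can't read
        # Variance of the Laplacian: low values mean little edge detail.
        sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
        if sharpness < threshold:
            print(f"Removing blurry frame {path} (sharpness {sharpness:.1f})")
            os.remove(path)

if __name__ == "__main__":
    remove_blurry_images(sys.argv[1])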