An unofficial playground for Meta's SAM3D Body (DINOv3) with promptable SAM3 masks and live Rerun visualization. Uses Rerun for 3D inspection, Gradio for the UI, and Pixi for one-command setup.
Make sure you have the Pixi package manager installed. TL;DR:

```sh
curl -fsSL https://pixi.sh/install.sh | sh
```

Restart your shell so the new `pixi` binary is on `PATH`.
This demo is Linux-only and requires an NVIDIA GPU.
The SAM3 and SAM3D Body checkpoints are gated on Hugging Face: request access for both `facebook/sam-3d-body-dinov3` and `facebook/sam3`, then authenticate either by setting `HF_TOKEN=<your token>` or by running `huggingface-cli login` before the first download (see Meta's install notes).
First run will download HF checkpoints for SAM3, SAM3D Body, and the relative-depth model.
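If you want to confirm gated access before kicking off the downloads, a small `huggingface_hub` sketch like the following works (optional and illustrative; the app performs its own downloads):

```python
# Optional: verify gated access before the first run (illustrative sketch).
import os

from huggingface_hub import login, model_info

login(token=os.environ.get("HF_TOKEN"))  # or run `huggingface-cli login` once instead
for repo in ("facebook/sam-3d-body-dinov3", "facebook/sam3"):
    model_info(repo)  # raises an access error if the gate has not been granted
    print(f"{repo}: access OK")
```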
```sh
git clone https://github.com/rerun-io/sam3d-body-rerun.git
cd sam3d-body-rerun
pixi run app
```

All commands can be listed with `pixi task list`.
```sh
pixi run app
```

Opens the Gradio UI with an embedded streaming Rerun viewer. Try the bundled samples in `data/example-data` or upload your own RGB image; toggle “Log relative depth” to stream predicted depth.
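For reference, logging an RGB image and its predicted depth to Rerun looks roughly like this (a minimal sketch using the `rerun-sdk` Python API, not the app's exact code; the arrays are stand-ins):

```python
# Minimal sketch of the kind of Rerun logging the app performs (entity paths are illustrative).
import numpy as np
import rerun as rr

rr.init("sam3d-body-demo", spawn=True)  # spawn a local Rerun viewer

rgb = np.zeros((480, 640, 3), dtype=np.uint8)    # stand-in for the input image
depth = np.ones((480, 640), dtype=np.float32)    # stand-in for predicted relative depth

rr.log("camera/image", rr.Image(rgb))
rr.log("camera/depth", rr.DepthImage(depth, meter=1.0))
```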
From a dev shell (for tyro + dev deps):

```sh
pixi run cli
# or
pixi shell -e dev
python tool/demo.py --help
```

Run on a folder of images and configure Rerun output/recordings via the CLI flags.
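`tool/demo.py` builds its CLI with tyro; the general pattern looks like the sketch below (the field names here are hypothetical, so run `--help` for the real flags):

```python
# Hypothetical tyro CLI in the style of tool/demo.py (actual flags may differ).
from dataclasses import dataclass
from pathlib import Path

import tyro


@dataclass
class Args:
    image_dir: Path                # folder of RGB images to process
    save_rrd: Path | None = None   # optional path to write a Rerun recording


def main(args: Args) -> None:
    print(f"Processing images in {args.image_dir}, recording to {args.save_rrd}")


if __name__ == "__main__":
    main(tyro.cli(Args))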
If you just want SAM3 masks without 3D reconstruction:

```sh
pixi run -e dev python tool/gradio_sam3.py
```

Process individual videos with SAM3 text-prompted segmentation. Three modes are available:
Batch Mode (small videos <4GB, best quality):

```sh
pixi run video-demo --video-path path/to/video.mp4 --prompt "person"
```

Chunk Mode (large videos, memory-efficient with overlapping chunks):

```sh
pixi run video-chunk-demo --video-path path/to/video.mp4 --prompt "person"
```

Streaming Mode (constant memory, frame-by-frame):

```sh
pixi run video-stream-demo --video-path path/to/video.mp4 --prompt "person"
```

Use `--help` with any command to see all options.
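The overlap trick behind chunk mode can be illustrated with a short sketch (hypothetical parameters and logic, not the repo's actual chunking code): consecutive chunks share a few frames so masks can be stitched consistently across chunk boundaries.

```python
# Hypothetical illustration of overlapping-chunk scheduling (not the repo's actual code).
def chunk_ranges(num_frames: int, chunk_size: int = 300, overlap: int = 30):
    """Yield (start, end) frame ranges; consecutive chunks share `overlap` frames."""
    start = 0
    while start < num_frames:
        end = min(start + chunk_size, num_frames)
        yield start, end
        if end == num_frames:
            break
        start = end - overlap  # step back so the next chunk overlaps this one


print(list(chunk_ranges(1000, chunk_size=400, overlap=50)))
# [(0, 400), (350, 750), (700, 1000)]
```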
Process multiview HoCap video sequences with SAM3 segmentation and TSDF mesh fusion:

```sh
pixi run mv-video-demo
```

Downloads sample data (~1.7GB) on first run and processes 100 frames across 8 cameras, visualizing segmentation overlays and a fused 3D mesh in Rerun. Requires ~3GB VRAM.
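For context, TSDF fusion of posed RGB-D views follows the pattern below. This is a generic Open3D sketch with placeholder camera parameters, not the repo's actual fusion code:

```python
# Generic TSDF fusion sketch with Open3D (illustrative; camera values are placeholders).
import numpy as np
import open3d as o3d

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.01,  # 1 cm voxels
    sdf_trunc=0.04,     # truncation distance in meters
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8,
)
intrinsic = o3d.camera.PinholeCameraIntrinsic(640, 480, 600.0, 600.0, 320.0, 240.0)


def integrate_view(color: np.ndarray, depth: np.ndarray, extrinsic: np.ndarray) -> None:
    """Fuse one posed RGB-D view; `extrinsic` is a 4x4 world-to-camera matrix."""
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        o3d.geometry.Image(color),
        o3d.geometry.Image(depth),
        depth_scale=1.0,
        depth_trunc=3.0,
        convert_rgb_to_intensity=False,
    )
    volume.integrate(rgbd, intrinsic, extrinsic)


# After integrating all views/frames:
# mesh = volume.extract_triangle_mesh()
```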
Fuse per-view body predictions into a single globally consistent 3D mesh using differentiable MHR forward kinematics:

```sh
# HoCap dataset
pixi run -e dev python tool/demo_mv_body.py hocap --root-directory data/sample

# RRD file (from the ExoEgo pipeline)
pixi run -e dev python tool/demo_mv_body.py rrd --rrd-path path/to/episode.rrd
```

What it does:
- Runs SAM3 + SAM3DBody on each camera view independently
- Triangulates 2D keypoints across views for 3D supervision
- Optimizes MHR pose parameters via an L1 multiview reprojection loss (see the sketch after this list)
- Validates alignment: MHR mesh vs triangulated keypoints
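The reprojection-loss step above, in a minimal PyTorch sketch (shapes and names are illustrative; the actual implementation lives in `tool/demo_mv_body.py`):

```python
# Illustrative L1 multiview reprojection loss (hypothetical shapes, not the repo's code).
import torch


def reprojection_loss(
    joints_world: torch.Tensor,  # (J, 3) 3D joints from differentiable MHR forward kinematics
    projections: torch.Tensor,   # (C, 3, 4) per-camera projection matrices
    keypoints_2d: torch.Tensor,  # (C, J, 2) detected 2D keypoints per view
    confidence: torch.Tensor,    # (C, J) per-keypoint confidences
) -> torch.Tensor:
    ones = torch.ones_like(joints_world[:, :1])
    joints_h = torch.cat([joints_world, ones], dim=-1)         # (J, 4) homogeneous
    proj = torch.einsum("cij,nj->cni", projections, joints_h)  # (C, J, 3) per view
    uv = proj[..., :2] / proj[..., 2:3].clamp(min=1e-6)        # perspective divide
    # Confidence-weighted L1 over all views and joints
    return (confidence[..., None] * (uv - keypoints_2d).abs()).mean()
```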
Performance (4 cameras, 50 frames):
| Metric | Value |
|---|---|
| MHR World Error | 5.0 px (matches triangulated keypoints) |
| 3D Alignment | 0.01m (~1cm) |
| Timing | 81% inference, 19% optimization |
| Throughput | 0.4 FPS end-to-end |
Thanks to the original projects that make this demo possible:
- facebook/sam-3d-body-dinov3 — SAM3D Body checkpoints and assets.
- facebook/sam3 — promptable concept segmentation.
- Relative depth/FOV from `MogeV1Predictor` in monopriors.
- Built with Rerun, Gradio, and Pixi.
The code in this repository is dual-licensed under Apache 2.0 and MIT (see LICENSE-APACHE and LICENSE-MIT); upstream models and assets retain their original licenses.

