| 1 | +# Tools Usage Guide |
| 2 | + |
| 3 | +This document describes utility tools provided with InstantSfM for preprocessing and auxiliary tasks. |
| 4 | + |
| 5 | +## Video Depth Anything |
| 6 | + |
| 7 | +The `video_depth_anything.py` script generates metric depth maps from image sequences using the [Video Depth Anything](https://github.com/DepthAnything/Video-Depth-Anything) model. InstantSfM currently supports only metric depth, so use the metric depth models. Video Depth Anything also requires the input to be a continuous image sequence (e.g., frames extracted from a video), not an unordered photo collection. |
| 8 | + |
| 9 | +### Setup |
| 10 | + |
| 11 | +You can follow the [official instructions](https://github.com/DepthAnything/Video-Depth-Anything) to set up Video Depth Anything, or follow the steps below: |
| 12 | + |
| 13 | +**1. Clone Video Depth Anything** |
| 14 | + |
| 15 | +Clone the Video Depth Anything repository into the `external/` directory: |
| 16 | +```bash |
| 17 | +cd external |
| 18 | +git clone https://github.com/DepthAnything/Video-Depth-Anything.git |
| 19 | +cd Video-Depth-Anything |
| 20 | +``` |
| 21 | + |
| 22 | +**2. Install Dependencies** |
| 23 | + |
| 24 | +Install the required Python packages: |
| 25 | +```bash |
| 26 | +conda create -n vda python=3.10 |
| 27 | +conda activate vda |
| 28 | +pip install -r requirements.txt |
| 29 | +``` |
| 30 | +Then install the PyTorch and xformers builds that match your CUDA version. For example, for CUDA 12.1: |
| 31 | +```bash |
| 32 | +pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu121 |
| 33 | +``` |
| 34 | + |
| 35 | + |
| 36 | +**3. Download Model Checkpoints** |
| 37 | + |
| 38 | +Create a `checkpoints/` directory and download the pretrained weights: |
| 39 | +```bash |
| 40 | +mkdir -p checkpoints |
| 41 | +cd checkpoints |
| 42 | +``` |
| 43 | + |
| 44 | +For **metric depth** (the only mode InstantSfM currently supports): |
| 45 | +```bash |
| 46 | +# Large model (best quality) |
| 47 | +wget https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Large/resolve/main/metric_video_depth_anything_vitl.pth |
| 48 | + |
| 49 | +# Base model (balanced) |
| 50 | +wget https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Base/resolve/main/metric_video_depth_anything_vitb.pth |
| 51 | + |
| 52 | +# Small model (fastest) |
| 53 | +wget https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Small/resolve/main/metric_video_depth_anything_vits.pth |
| 54 | +``` |
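Before running the tool, it can help to confirm that the weights downloaded completely and carry the expected names. The sketch below only encodes the filename pattern visible in the download URLs above (`metric_video_depth_anything_<encoder>.pth`). The helper names and the default `external/Video-Depth-Anything/checkpoints` location are assumptions for illustration; where `tools/video_depth_anything.py` actually looks for weights is not stated in this guide.

```python
from pathlib import Path

# Assumed location, following the setup steps in this guide; the script itself
# may resolve checkpoints differently.
CHECKPOINT_DIR = Path("external/Video-Depth-Anything/checkpoints")

# Encoder names taken from the download URLs (small, base, large).
ENCODERS = ("vits", "vitb", "vitl")


def metric_checkpoint_path(encoder, checkpoint_dir=CHECKPOINT_DIR):
    """Return the expected path of the metric checkpoint for an encoder.

    Hypothetical helper, not part of InstantSfM or Video Depth Anything.
    """
    if encoder not in ENCODERS:
        raise ValueError(f"unknown encoder {encoder!r}; expected one of {ENCODERS}")
    return Path(checkpoint_dir) / f"metric_video_depth_anything_{encoder}.pth"


def missing_checkpoints(checkpoint_dir=CHECKPOINT_DIR):
    """List the encoders whose metric checkpoint file is not present."""
    return [
        encoder
        for encoder in ENCODERS
        if not metric_checkpoint_path(encoder, checkpoint_dir).is_file()
    ]


if __name__ == "__main__":
    # Run from the InstantSfM repository root.
    print("missing metric checkpoints:", missing_checkpoints())
```

Only the checkpoint for the encoder you pass via `--encoder` needs to be present, so a non-empty list is not necessarily a problem.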
| 55 | + |
| 56 | +### Usage |
| 57 | + |
| 58 | +**Basic Command** |
| 59 | + |
| 60 | +Process a dataset directory containing images: |
| 61 | +```bash |
| 62 | +python tools/video_depth_anything.py \ |
| 63 | + --data_path /path/to/dataset \ |
| 64 | + --encoder vitl |
| 65 | +``` |
| 66 | + |
| 67 | +The script will: |
| 68 | +1. Search for image folders recursively in `data_path` |
| 69 | +2. Process each folder containing images |
| 70 | +3. Save depth maps to `data_path/depth_vda/` by default |
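The discovery step described above can be sketched as follows. This is an illustration of the described behavior under stated assumptions, not the code in `tools/video_depth_anything.py`: the helper name `find_image_folders`, the accepted extensions, and the choice to skip an existing `depth_vda/` folder are all hypothetical.

```python
from pathlib import Path

# Assumed set of image extensions; the real script may accept a different set.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

# Default output folder named in this guide.
OUTPUT_DIRNAME = "depth_vda"


def find_image_folders(data_path):
    """Return (folder, frames) pairs for every folder that directly holds images.

    Hypothetical helper for illustration only.
    """
    root = Path(data_path)
    candidates = [root, *(p for p in root.rglob("*") if p.is_dir())]
    folders = []
    for directory in sorted(candidates):
        # Skip previously generated depth output so it is not treated as input.
        if OUTPUT_DIRNAME in directory.relative_to(root).parts:
            continue
        # Sort frames by name so the sequence stays in order, which matters
        # because the model expects a continuous image sequence.
        frames = sorted(
            f
            for f in directory.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTS
        )
        if frames:
            folders.append((directory, frames))
    return folders


if __name__ == "__main__":
    for folder, frames in find_image_folders("/path/to/dataset"):
        print(f"{folder}: {len(frames)} frames")
```

Sorting by filename only preserves temporal order when frame names sort lexicographically, for example zero-padded names such as `0001.jpg`, `0002.jpg`.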
| 71 | + |
| 72 | +**Directory Structure** |
| 73 | + |
| 74 | +The input directory must have the same structure that InstantSfM requires, so pass the same `data_path` that you use for InstantSfM. Depth maps are written to the `depth_vda/` subdirectory of `data_path`. |