Commit a8e0cc0

add
1 parent fe928bb commit a8e0cc0

2 files changed

+498
-0
lines changed

tools/usage.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# Tools Usage Guide

This document describes utility tools provided with InstantSfM for preprocessing and auxiliary tasks.

## Video Depth Anything

The `video_depth_anything.py` script generates metric depth maps from image sequences using the [Video Depth Anything](https://github.com/DepthAnything/Video-Depth-Anything) model. InstantSfM currently supports only metric depth, so make sure to use the metric depth models. Note that Video Depth Anything requires the input images to be a continuous image sequence (e.g., frames extracted from a video).
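
Because the model expects a continuous sequence, it can help to check a folder for dropped frames before running it. The sketch below is only an illustration: it assumes frames are named with a numeric index (for example `frame_0001.jpg`), which is a naming convention chosen here and not something InstantSfM or Video Depth Anything requires, and `find_frame_gaps` is a hypothetical helper. It cannot tell whether consecutively numbered frames are actually adjacent in time.

```python
import re
from pathlib import Path


def find_frame_gaps(filenames):
    """Return the frame indices missing from a numbered sequence of file names."""
    indices = set()
    for name in filenames:
        # Assumption: the last run of digits in the file stem is the frame index.
        digit_runs = re.findall(r"\d+", Path(name).stem)
        if digit_runs:
            indices.add(int(digit_runs[-1]))
    if not indices:
        return []
    return [i for i in range(min(indices), max(indices) + 1) if i not in indices]


frames = ["frame_0001.jpg", "frame_0002.jpg", "frame_0004.jpg"]
print(find_frame_gaps(frames))  # prints [3]
```

An empty list means the numbering has no holes; any reported index is a frame to re-extract before generating depth.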

### Setup

You can follow the [official instructions](https://github.com/DepthAnything/Video-Depth-Anything) to set up Video Depth Anything, or follow the steps below:

**1. Clone Video Depth Anything**

Clone the Video Depth Anything repository into the `external/` directory:
```bash
cd external
git clone https://github.com/DepthAnything/Video-Depth-Anything.git
cd Video-Depth-Anything
```

**2. Install Dependencies**

Create a conda environment and install the required Python packages:
```bash
conda create -n vda python=3.10
conda activate vda
pip install -r requirements.txt
```
Then install the PyTorch and xformers builds that match your CUDA version. For example, for CUDA 12.1:
```bash
pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu121
```
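
The `cu121` suffix in the index URL encodes the CUDA version. As a rough sketch of that convention, the hypothetical helper below builds the URL from a version string; it does not check that wheels are actually published for that version, so confirm against the PyTorch installation page before using the result.

```python
def pytorch_index_url(cuda_version: str) -> str:
    """Build the PyTorch wheel index URL for a CUDA version such as '12.1'."""
    major, minor = cuda_version.split(".")[:2]
    # The index path is "cu" followed by the major and minor digits, e.g. cu121.
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


print(pytorch_index_url("12.1"))  # prints https://download.pytorch.org/whl/cu121
```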

**3. Download Model Checkpoints**

Create a `checkpoints/` directory and download the pretrained weights:
```bash
mkdir -p checkpoints
cd checkpoints
```

For **metric depth** (the only depth type InstantSfM currently supports):
```bash
# Large model (best quality)
wget https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Large/resolve/main/metric_video_depth_anything_vitl.pth

# Base model (balanced)
wget https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Base/resolve/main/metric_video_depth_anything_vitb.pth

# Small model (fastest)
wget https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Small/resolve/main/metric_video_depth_anything_vits.pth
```
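
Before running the tool, it may be worth confirming that the weights for the chosen model size are in place. The sketch below maps an encoder name to the checkpoint file names used in the download commands above. The helper names are hypothetical, the `vitb` and `vits` encoder names are inferred from the file names, and the default directory simply follows the setup steps; whether the script looks for weights in that location is an assumption.

```python
from pathlib import Path

# File names taken from the download commands above.
CHECKPOINT_NAMES = {
    "vitl": "metric_video_depth_anything_vitl.pth",
    "vitb": "metric_video_depth_anything_vitb.pth",
    "vits": "metric_video_depth_anything_vits.pth",
}

# Assumption: checkpoints live where the setup steps above put them.
DEFAULT_CHECKPOINT_DIR = "external/Video-Depth-Anything/checkpoints"


def checkpoint_path(encoder, checkpoint_dir=DEFAULT_CHECKPOINT_DIR):
    """Return the expected path of the metric checkpoint for an encoder name."""
    if encoder not in CHECKPOINT_NAMES:
        raise ValueError(f"unknown encoder: {encoder!r}")
    return Path(checkpoint_dir) / CHECKPOINT_NAMES[encoder]


def missing_checkpoints(encoders, checkpoint_dir=DEFAULT_CHECKPOINT_DIR):
    """Return the encoders whose checkpoint file is not present on disk."""
    return [e for e in encoders if not checkpoint_path(e, checkpoint_dir).is_file()]


print(checkpoint_path("vitl"))
```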

### Usage

**Basic Command**

Process a dataset directory containing images (the `tools/` path is relative to the InstantSfM repository root):
```bash
python tools/video_depth_anything.py \
--data_path /path/to/dataset \
--encoder vitl
```
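
To process several datasets in a row, the same command can be assembled from Python. This is a minimal sketch that uses only the two flags shown above (`--data_path` and `--encoder`); `build_depth_command` is a hypothetical helper, and the actual launch is left commented out so the snippet has no side effects.

```python
import shlex
import sys


def build_depth_command(data_path, encoder="vitl",
                        script="tools/video_depth_anything.py"):
    """Assemble the argument list for one run of the depth tool."""
    return [sys.executable, script,
            "--data_path", str(data_path),
            "--encoder", encoder]


cmd = build_depth_command("/path/to/dataset")
print(shlex.join(cmd))

# To actually launch it from the InstantSfM repository root:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a dataset path contains spaces.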

The script will:
1. Search for image folders recursively in `data_path`
2. Process each folder containing images
3. Save depth maps to `data_path/depth_vda/` by default
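
The folder discovery in the steps above can be pictured roughly as follows. This is a sketch, not the script's actual code: the extension list, the function names, and the choice to skip anything already under `depth_vda/` are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: which suffixes count as images; the real script may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_image_folders(data_path, output_name="depth_vda"):
    """Return every folder under data_path that directly contains images."""
    root = Path(data_path)
    folders = set()
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        # Skip previously written depth maps so they are not treated as input.
        if output_name in path.relative_to(root).parts:
            continue
        folders.add(path.parent)
    return sorted(folders)


def default_output_dir(data_path):
    """Return the default output location, data_path/depth_vda."""
    return Path(data_path) / "depth_vda"
```

Calling `find_image_folders("/path/to/dataset")` before a run is a quick way to preview which folders would be treated as input under these assumptions.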

**Directory Structure**

The input directory should have exactly the same structure as required by InstantSfM. Use the same `data_path` as for InstantSfM's processing. Output depth maps will be saved in a subdirectory named `depth_vda/`.
