# Data Preparation Guide for ScanNet and ScanNet++
This document provides instructions for preparing ScanNet and ScanNet++ datasets for training and evaluation.
## Prerequisites

- Python 3.7+
- Sufficient storage space (>1TB recommended)
- Required Python packages:

  ```bash
  pip install -r data_process/requirements.txt
  ```
## ScanNet

- Visit the official ScanNet repository
- Fill out the Terms of Use agreement
- Send the signed agreement to [email protected]
- You will receive download instructions and access credentials via email
Example download command (after receiving credentials):

```bash
# Download ScanNet v2 data
python download-scannet.py -o ./data/scannet --type .sens
```
We provide a parallel processing script to efficiently extract data from .sens files:

```bash
# Run the parallel export script
python data_process/scannet/export_data/export.py \
    --input_dir ./data/scannet \
    --output_dir ./data/scannet_extracted \
    --num_workers 8
```
The export script provides the following features:
- Parallel processing using multiple CPU cores
- Automatic skipping of already processed scenes
- Progress tracking with tqdm
Arguments:

- `--input_dir`: Directory containing the ScanNet dataset
- `--output_dir`: Directory to save extracted data
- `--num_workers`: Number of parallel workers (default: half of CPU cores)
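For orientation, here is a minimal sketch of the pattern the export script follows (worker pool, skip-if-already-processed check, tqdm progress). The function body and directory layout are illustrative assumptions, not the script's actual API:

```python
import os
from concurrent.futures import ProcessPoolExecutor
from functools import partial

from tqdm import tqdm

def export_scene(scene_name: str, input_dir: str, output_dir: str) -> str:
    """Extract one scene's .sens data (placeholder body)."""
    out_path = os.path.join(output_dir, scene_name)
    if os.path.isdir(out_path):
        return scene_name  # already processed -- skip
    # ... invoke the actual .sens reader here to write color/, depth/, pose/, intrinsic/ ...
    return scene_name

if __name__ == "__main__":
    input_dir, output_dir, num_workers = "./data/scannet", "./data/scannet_extracted", 8
    scenes = sorted(os.listdir(input_dir))  # one entry per scene directory
    worker = partial(export_scene, input_dir=input_dir, output_dir=output_dir)
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        # tqdm tracks overall progress across the worker pool
        for _ in tqdm(pool.map(worker, scenes), total=len(scenes)):
            pass
```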
Output structure for each scene:
```
data/scannet_extracted/
├── scene0000_00/
│   ├── color/                      # RGB images in jpg format
│   │   ├── 0.jpg
│   │   ├── 1.jpg
│   │   └── ...
│   ├── depth/                      # Depth images in png format (16-bit, depth shift 1000)
│   │   ├── 0.png
│   │   ├── 1.png
│   │   └── ...
│   ├── pose/                       # Camera poses (4x4 matrices)
│   │   ├── 0.txt
│   │   ├── 1.txt
│   │   └── ...
│   └── intrinsic/                  # Camera parameters
│       ├── intrinsic_color.txt    # Color camera intrinsics
│       ├── intrinsic_depth.txt    # Depth camera intrinsics
│       ├── extrinsic_color.txt    # Color camera extrinsics
│       └── extrinsic_depth.txt    # Depth camera extrinsics
├── scene0000_01/
│   ├── color/
│   ├── depth/
│   ├── pose/
│   └── intrinsic/
└── ...
```
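As a quick sanity check on the extracted data, the pose and intrinsic files are plain-text matrices and the depth maps are 16-bit PNGs, so a frame can be inspected like this (assuming the standard whitespace-separated 4x4 text layout of the official exporter):

```python
import numpy as np
import cv2  # opencv-python

scene = "data/scannet_extracted/scene0000_00"

pose = np.loadtxt(f"{scene}/pose/0.txt")                  # 4x4 camera-to-world pose
K = np.loadtxt(f"{scene}/intrinsic/intrinsic_depth.txt")  # depth camera intrinsics

# 16-bit depth PNG; divide by the depth shift (1000) to get meters
depth_raw = cv2.imread(f"{scene}/depth/0.png", cv2.IMREAD_UNCHANGED)
depth_m = depth_raw.astype(np.float32) / 1000.0

print(pose.shape, K.shape, depth_m.max())
```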
Next, process the extracted data into a standardized format:

```bash
# Process raw data
python -m data_process.scannet.scannet_processor \
    --root_dir data/scannet_extracted \
    --save_dir data/scannet_processed \
    --device cuda \
    --num_workers 8
```
Arguments:

- `--root_dir`: Path to the extracted ScanNet data directory (default: "data/scannet_extracted")
  - Should contain the data extracted from the .sens files, organized by scene
  - Each scene should have color/, depth/, pose/, and intrinsic/ subdirectories
- `--save_dir`: Path where processed data will be saved (default: "data/scannet_processed")
  - Created automatically if the directory doesn't exist
  - Processed data is organized by scene in a standardized format
- `--device`: Computing device to use (default: "cuda")
  - "cuda": use GPU acceleration (recommended)
  - "cpu": use CPU only (slower)
- `--num_workers`: Number of parallel processing workers (default: 8)
  - Higher values may speed up processing but use more memory
  - Recommended: set to the number of CPU cores or fewer

Note: Ensure sufficient disk space in save_dir (>500GB recommended for the full dataset).
After processing, the ScanNet data will be organized in the following structure:
```
data/scannet_processed/
├── scene0000_00/
│   ├── color/            # RGB images
│   │   ├── 000000.png
│   │   └── ...
│   ├── depth/            # Depth maps
│   │   ├── 000000.png
│   │   └── ...
│   └── pose/             # Camera poses
│       ├── 000000.npz
│       └── ...
├── scene0000_01/
└── ...                   # Additional scenes
```
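To verify a processed scene, the `.npz` pose files can be opened with numpy; the key names below are assumptions, so list `npz.files` on your own data first:

```python
import numpy as np

npz = np.load("data/scannet_processed/scene0000_00/pose/000000.npz")
print(npz.files)  # inspect the keys the processor actually stores

# Hypothetical key names -- replace with whatever the line above reports
pose = npz["pose"] if "pose" in npz.files else None
intrinsic = npz["intrinsic"] if "intrinsic" in npz.files else None
```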
## ScanNet++

- Visit the official ScanNet++ repository
- Fill out the Terms of Use agreement
- You will receive a download script and token (valid for 14 days)
To download the dataset:

- Navigate to the download script directory
- Edit the `download_scannetpp.yml` configuration file:
  - Set your token
  - Configure the download directory
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the download script:

  ```bash
  python download_scannetpp.py download_scannetpp.yml
  ```
For DSLR format data processing, follow these steps:

- Create and set up the rendering environment:

  ```bash
  # Create new conda environment
  conda create -n renderpy python=3.9
  conda activate renderpy

  # Install Python dependencies
  pip install imageio numpy tqdm opencv-python pyyaml munch

  # Clone and build renderpy
  cd data_process/scannetpp
  git clone --recursive https://github.com/liu115/renderpy
  cd renderpy

  # Install system dependencies
  sudo apt-get install build-essential cmake git
  sudo apt-get install libgl1-mesa-dev libglu1-mesa-dev libxrandr-dev libxext-dev
  sudo apt-get install libopencv-dev
  sudo apt-get install libgflags-dev libboost-all-dev

  # Install renderpy
  pip install . && cd ..
  ```
- Configure rendering:
  - Edit `data_process/scannetpp/common/configs/render.yml`
  - Update the `data_root` and `output_dir` paths
- Run the rendering process:

  ```bash
  python -m common.render common/configs/render.yml
  ```
The processed ScanNet++ data will be organized as follows:
```
data/scannetpp_render/
└── {scene_id}/                        # e.g., fb564c935d
    └── {device}/                      # device can be 'dslr' or 'iphone'
        ├── camera/                    # Camera parameter files
        │   ├── {frame_id}.npz         # Contains intrinsic and extrinsic matrices
        │   └── ...
        ├── render_depth/              # Rendered depth maps
        │   ├── {frame_id}.png         # 16-bit depth maps (depth * 1000)
        │   └── ...
        ├── rgb_resized_undistorted/   # Processed RGB images
        │   ├── {frame_id}.JPG         # Undistorted and resized color images
        │   └── ...
        └── mask_resized_undistorted/  # Processed mask images
            ├── {frame_id}.png         # Binary masks (0 or 255)
            └── ...
```
Each directory contains:

- `camera/`: Camera parameter files in .npz format, containing:
  - `intrinsic`: 3x3 camera intrinsic matrix
  - `extrinsic`: 4x4 camera-to-world transformation matrix
- `render_depth/`: Rendered depth maps stored as 16-bit PNG files (depth values * 1000)
- `rgb_resized_undistorted/`: Undistorted and resized RGB images
- `mask_resized_undistorted/`: Undistorted and resized binary mask images (255 for valid pixels, 0 for invalid)
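For example, a single rendered frame can be loaded as follows, using the `intrinsic`/`extrinsic` keys documented above (the scene and frame IDs are placeholders):

```python
import numpy as np
import cv2

scene_dir = "data/scannetpp_render/fb564c935d/dslr/"
frame_id = "DSC00001"  # placeholder frame id

cam = np.load(f"{scene_dir}camera/{frame_id}.npz")
K = cam["intrinsic"]    # 3x3 camera intrinsic matrix
c2w = cam["extrinsic"]  # 4x4 camera-to-world matrix

# Depth is stored as depth * 1000 in 16-bit PNGs
depth = cv2.imread(f"{scene_dir}render_depth/{frame_id}.png", cv2.IMREAD_UNCHANGED)
depth_m = depth.astype(np.float32) / 1000.0

# Masks are binary: 255 marks valid pixels
mask = cv2.imread(f"{scene_dir}mask_resized_undistorted/{frame_id}.png", cv2.IMREAD_UNCHANGED)
valid = mask == 255
```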
## Test Dataset

```bash
# Download and extract the test dataset
wget https://huggingface.co/datasets/Journey9ni/LSM/resolve/main/scannet_test.tar
tar -xf scannet_test.tar -C ./data/  # Extract to the data directory
```
The test dataset is expected to have the following structure:
```
data/scannet_test/
└── {scene_id}/
    ├── depth/                      # Depth maps
    ├── images/                     # RGB images
    ├── labels/                     # Semantic labels
    ├── selected_seqs_test.json     # Test sequence parameters
    └── selected_seqs_train.json    # Train sequence parameters
```
The test set was curated using the following process:
- Initial Selection: The last 50 scenes from the alphabetically sorted list of original ScanNet scans were initially selected.
- Frame Sampling: 30 frames were sampled at regular intervals from each selected scene.
- Pose Validation: Each frame's pose data was checked for NaN values (due to errors in the original ScanNet dataset). Scenes containing frames with invalid poses were excluded (7 scenes removed).
- Compatibility Check: Scenes that caused errors during testing with NeRF-DFF and Feature-3DGS were further filtered out.
- Final Set: This process resulted in a final test set of 40 scenes.
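The sampling and pose-validation steps can be expressed in a few lines; this is an illustrative sketch, not the exact curation script:

```python
import numpy as np

def sample_frames(num_frames_in_scene: int, num_samples: int = 30):
    """Pick frame indices at regular intervals across a scene."""
    return np.linspace(0, num_frames_in_scene - 1, num_samples).astype(int)

def pose_is_valid(pose_path: str) -> bool:
    """Reject frames whose 4x4 pose contains NaN (or inf) values."""
    pose = np.loadtxt(pose_path)
    return bool(np.isfinite(pose).all())
```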
We use a predefined set of common indoor categories: ['wall', 'floor', 'ceiling', 'chair', 'table', 'sofa', 'bed', 'other'], instead of the 20 categories used by ScanNetV2.
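A name-level mapping in the spirit of the dataset's `map_func` might look like the sketch below; the exact id-to-category table used by the code may differ, and any ScanNetV2 class without a direct counterpart collapses to 'other':

```python
# The 8 evaluation categories, in index order
CATEGORIES = ['wall', 'floor', 'ceiling', 'chair', 'table', 'sofa', 'bed', 'other']

# Name-level mapping from ScanNetV2's 20 classes; unlisted classes fall back to 'other'
# (e.g., cabinet, door, window, bookshelf, picture, counter, curtain, toilet, sink).
SCANNETV2_TO_REDUCED = {
    'wall': 'wall', 'floor': 'floor', 'chair': 'chair', 'sofa': 'sofa',
    'table': 'table', 'desk': 'table', 'bed': 'bed',
}

def map_label(name: str) -> int:
    """Return the reduced-category index for a ScanNetV2 class name."""
    return CATEGORIES.index(SCANNETV2_TO_REDUCED.get(name, 'other'))
```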
The testing process relies on the `TestDataset` class in `large_spatial_model/datasets/testdata.py`, initialized with `split='test'` and `is_training=False`.

- View Selection: The dataset selects test views based on the `llff_hold` and `test_ids` parameters for each scene. Typically, frames whose index modulo `llff_hold` falls within `test_ids` are chosen as the core test frames (`target_view`).
- View Grouping: For each selected `target_view`, the dataset groups it with its immediate predecessor (`source_view1`) and successor (`source_view2`), forming a tuple of view indices: `(source_view2, target_view, source_view1)`. The test set comprises a series of these `(Scene ID, View Indices Tuple)` pairs.
- Data Loading: When iterating through the dataset during testing:
  - The script loads RGB images (`.jpg`), depth maps (`.png`), semantic label maps (`.png`), and camera parameters (intrinsics and extrinsics from `.npz`) for each view index in the tuple.
  - Preprocessing steps include validity checks (e.g., for NaN values in camera poses) and image cropping/resizing.
  - The `map_func` maps original ScanNet semantic labels to the simplified category set defined above.
  - This yields a dictionary for each view containing the image, depth, pose, intrinsics, processed label map, etc.
- Model Inference and Evaluation:
  - The model takes the `source_view1` and `source_view2` data as input to infer scene parameters (e.g., Gaussian parameters for 3D Gaussian Splatting).
  - Using these inferred parameters and the `target_view`'s camera pose and intrinsics, the model renders a semantic label map for the `target_view`.
  - The rendered semantic map is then compared against the ground-truth semantic label map for the `target_view` from the original ScanNet dataset to evaluate the model's performance.
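To make the last step concrete, a rendered label map can be scored against the ground truth with per-class IoU roughly as follows; the actual evaluation code may differ in details such as ignore-label handling:

```python
import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int = 8) -> np.ndarray:
    """IoU per class between two HxW integer label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union > 0 else np.nan)  # NaN if class absent
    return np.array(ious)

def miou(pred: np.ndarray, gt: np.ndarray, num_classes: int = 8) -> float:
    """Mean IoU over classes that appear in prediction or ground truth."""
    return float(np.nanmean(per_class_iou(pred, gt, num_classes)))
```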