Bug: mask_paths array misaligned with image_paths when using train/val split
Summary
When using image masks with the COLMAP dataset loader, self.mask_paths[idx] retrieves the wrong mask because mask_paths is built from all images while image_paths is filtered by train/val split.
Original Runtime Error
When mask dimensions don't match image dimensions (multi-camera setup):
RuntimeError: shape '[1, 1246, 2171, 1]' is invalid for input of size 2631636
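The error is an element-count mismatch: the reshape target holds 1 × 1246 × 2171 × 1 = 2,705,066 elements, while the mask that was loaded has 2,631,636. A minimal sketch of the same failure, using NumPy and small made-up shapes (NumPy raises ValueError where PyTorch raises RuntimeError):

```python
import numpy as np

# Element count requested by the reshape in the error message
# versus the number of elements the loaded mask actually has.
target_elems = 1 * 1246 * 2171 * 1
mask_elems = 2631636
print(target_elems, mask_elems, target_elems == mask_elems)

# Same failure with small, made-up shapes: a mask belonging to a 4x6 image
# is reshaped to the 5x6 resolution of a different camera's image.
wrong_mask = np.ones((4, 6), dtype=np.uint8)
actual_h, actual_w = 5, 6
try:
    wrong_mask.reshape(1, actual_h, actual_w, 1)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```

When all cameras share one resolution the reshape succeeds, so the wrong mask is applied silently instead of raising.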
Minimal Reproduction
(This shows the path mismatch with images of the same size, so no runtime error is raised.)
Run from 3dgrut repo root:
import os, sys, shutil, struct
import numpy as np
from PIL import Image

TEST_DIR = "/tmp/3dgrut_mask_bug"
shutil.rmtree(TEST_DIR, ignore_errors=True)
os.makedirs(f"{TEST_DIR}/images")
os.makedirs(f"{TEST_DIR}/sparse/0")

# 10 images + masks
for i in range(10):
    Image.fromarray(np.random.randint(0,255,(100,100,3),dtype=np.uint8)).save(f"{TEST_DIR}/images/img_{i:02d}.png")
    Image.fromarray(np.ones((100,100),dtype=np.uint8)*255).save(f"{TEST_DIR}/images/img_{i:02d}_mask.png")

# cameras.bin - 1 PINHOLE camera
with open(f"{TEST_DIR}/sparse/0/cameras.bin","wb") as f:
    f.write(struct.pack('<Q',1))
    f.write(struct.pack('<Ii',1,1))
    f.write(struct.pack('<QQ',100,100))
    f.write(struct.pack('<4d',50,50,50,50))

# images.bin - 10 images
with open(f"{TEST_DIR}/sparse/0/images.bin","wb") as f:
    f.write(struct.pack('<Q',10))
    for i in range(10):
        f.write(struct.pack('<I',i+1))
        f.write(struct.pack('<4d',1,0,0,0))
        f.write(struct.pack('<3d',0,0,i))
        f.write(struct.pack('<I',1))
        f.write(f"img_{i:02d}.png\0".encode())
        f.write(struct.pack('<Q',0))

# points3D.bin - empty
with open(f"{TEST_DIR}/sparse/0/points3D.bin","wb") as f:
    f.write(struct.pack('<Q',0))

# Demonstrate bug
sys.path.insert(0,'.')
from threedgrut.datasets.dataset_colmap import ColmapDataset

ds = ColmapDataset(path=TEST_DIR, split="train")
print(f"image_paths: {len(ds.image_paths)}, mask_paths: {len(ds.mask_paths)}")
for i in range(len(ds.image_paths)):
    img = os.path.basename(ds.image_paths[i])
    msk = os.path.basename(ds.mask_paths[i])
    expected = img.replace('.png','_mask.png')
    if msk != expected:
        print(f"BUG idx={i}: image={img}, mask_paths[i]={msk}, expected={expected}")
Output
image_paths: 8, mask_paths: 10
BUG idx=0: image=img_01.png, mask_paths[i]=img_00_mask.png, expected=img_01_mask.png
BUG idx=1: image=img_02.png, mask_paths[i]=img_01_mask.png, expected=img_02_mask.png
BUG idx=2: image=img_03.png, mask_paths[i]=img_02_mask.png, expected=img_03_mask.png
...
BUG CONFIRMED: 8 mismatches
Expected Root Cause
In threedgrut/datasets/dataset_colmap.py:
- mask_paths array is built from ALL images (line ~260)
- image_paths array is later filtered by train/val split
- __getitem__ uses self.mask_paths[idx], which indexes into the unfiltered array
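The misalignment can be reproduced without the repository using plain lists. This is only a sketch: the variable names mirror the issue, and the split rule (hold out every 8th image) is an assumption chosen because it matches the reported output, not something confirmed from the loader's code.

```python
# Hypothetical stand-in for the loader's bookkeeping.
all_images = [f"img_{i:02d}.png" for i in range(10)]

# mask_paths is built from ALL images and never filtered.
mask_paths = [name.replace(".png", "_mask.png") for name in all_images]

# Assumed split rule: every 8th image is held out for validation.
train_indices = [i for i in range(len(all_images)) if i % 8 != 0]

# image_paths is filtered by the split, so its indices shift.
image_paths = [all_images[i] for i in train_indices]

# Indexing both lists with the same idx now pairs images with wrong masks.
mismatches = [
    (idx, img, mask_paths[idx])
    for idx, img in enumerate(image_paths)
    if mask_paths[idx] != img.replace(".png", "_mask.png")
]
print(len(image_paths), len(mask_paths), len(mismatches))
print(mismatches[0])
```

Under this assumed split, every training index is shifted by at least one position, so all 8 training images receive the wrong mask.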
Suggested Fix 1: Derive mask path in __getitem__
In __getitem__ (~line 362), derive the mask path from the image path instead of indexing into mask_paths:
# Current (broken):
if os.path.exists(mask_path := self.mask_paths[idx]):
    mask = torch.from_numpy(np.array(Image.open(mask_path).convert("L"))).reshape(1, actual_h, actual_w, 1)

# Fixed:
mask_path = os.path.splitext(self.image_paths[idx])[0] + "_mask.png"
if os.path.exists(mask_path):
    mask = torch.from_numpy(np.array(Image.open(mask_path).convert("L"))).reshape(1, actual_h, actual_w, 1)
Suggested Fix 2: Filter mask_paths alongside image_paths
Wherever image_paths is filtered by train/val split, apply the same filtering to mask_paths:
# When filtering image_paths by split indices:
self.image_paths = self.image_paths[split_indices]
self.mask_paths = self.mask_paths[split_indices]  # Add this line
Environment
- 3DGRUT commit: 16b7b19 (main branch)
- Python: 3.11.14
- PyTorch: 2.1.2
- CUDA: 11.8
- GPU: H100 80GB
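For reference, the two suggested fixes can be checked against each other in a standalone sketch. The helper derive_mask_path and the split rule are hypothetical, and the paths are plain Python lists here; whether the loader stores them as lists or NumPy arrays is not stated in this report.

```python
import os


def derive_mask_path(image_path: str) -> str:
    """Fix 1 (hypothetical helper): compute the mask path from the image path."""
    return os.path.splitext(image_path)[0] + "_mask.png"


all_images = [f"images/img_{i:02d}.png" for i in range(10)]
all_masks = [derive_mask_path(p) for p in all_images]

# Assumed train split: every 8th image is held out.
split_indices = [i for i in range(len(all_images)) if i % 8 != 0]
image_paths = [all_images[i] for i in split_indices]

# Fix 1: ignore the stored mask list and derive each path on demand.
fix1_masks = [derive_mask_path(p) for p in image_paths]

# Fix 2: filter the stored mask list with the same indices as the images.
fix2_masks = [all_masks[i] for i in split_indices]

# Both fixes should pair every training image with its own mask.
fixes_agree = fix1_masks == fix2_masks
aligned = all(
    os.path.basename(m) == os.path.basename(p).replace(".png", "_mask.png")
    for p, m in zip(image_paths, fix2_masks)
)
print(fixes_agree, aligned)
```

Fix 1 ties the loader to the _mask.png naming convention, while Fix 2 keeps whatever convention built mask_paths but must be repeated at every place image_paths is filtered. If the paths are Python lists, indexing with a list of indices as in self.mask_paths[split_indices] raises a TypeError, so a comprehension like the one above would be needed.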