Dear Sirs,
I have a question about the expected behavior when geometric parameters are passed to the model as input.
In my use case, I have the exact extrinsic (Rt) and intrinsic (K) parameters of each camera in my setup.
I expected the model to treat these inputs as fixed ground-truth references when predicting the other outputs (such as depth maps) during inference.
However, they appear to be treated as an optimization initialization instead: the intrinsics and camera poses returned in the predictions differ from the values I supplied.
Am I missing something obvious here?
Thank you in advance for your kind help and amazing work!
Here is the relevant snippet from my code:
from mapanything.utils.image import load_images
images = [path1, path2, path3]
views = load_images(images)
views[0].update({"camera_poses":extrinsics_0_torch})
views[1].update({"camera_poses":extrinsics_1_torch})
views[2].update({"camera_poses":extrinsics_2_torch})
views[0].update({"intrinsics":intrinsics_0_torch})
views[1].update({"intrinsics":intrinsics_1_torch})
views[2].update({"intrinsics":intrinsics_2_torch})
predictions = model.infer(
    views,  # Input views
    memory_efficient_inference=True,  # Trades off speed for more views (up to 2000 views on 140 GB). Trade off is negligible - see profiling section
    minibatch_size=None,  # Minibatch size for memory-efficient inference (use 1 for smallest GPU memory consumption). Default is dynamic computation based on available GPU memory.
    use_amp=True,  # Use mixed precision inference (recommended)
    amp_dtype="bf16",  # bf16 inference (recommended; falls back to fp16 if bf16 not supported)
    apply_mask=True,  # Apply masking to dense geometry outputs
    mask_edges=True,  # Remove edge artifacts by using normals and depth
    apply_confidence_mask=True,  # Filter low-confidence regions
    confidence_percentile=10,  # Remove bottom 10 percentile confidence pixels
    use_multiview_confidence=False,  # Enable multi-view depth consistency based confidence in place of learning-based one
)
all_predicted_camera_poses = []
all_predicted_intrinsics = []
for pred in predictions:
    intrinsics = pred["intrinsics"]  # Recovered pinhole camera intrinsics (B, 3, 3)
    camera_poses = pred["camera_poses"]  # OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world poses in world frame (B, 4, 4)
    all_predicted_camera_poses.append(camera_poses.detach().cpu().numpy())
    all_predicted_intrinsics.append(intrinsics.detach().cpu().numpy())
import pdb; pdb.set_trace()
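To put a number on how far the returned poses are from the supplied ones, a comparison along the following lines could be used. This is only a sketch: the helper names (`relative_to_first`, `rotation_error_deg`, `translation_error`, `compare_pose_sets`) are hypothetical and not part of mapanything, and it assumes both pose sets are 4x4 cam2world matrices. Both sets are first re-expressed relative to their first camera, on the unverified assumption that the model may report poses in a world frame different from the one used for the inputs; a pure change of world frame would otherwise look like a modification.

```python
import numpy as np


def relative_to_first(poses):
    """Re-express cam2world poses (N, 4, 4) in the frame of the first camera."""
    poses = np.asarray(poses, dtype=np.float64)
    return np.linalg.inv(poses[0])[None] @ poses


def rotation_error_deg(pose_a, pose_b):
    """Geodesic angle in degrees between the rotation parts of two 4x4 poses."""
    r_rel = pose_a[:3, :3].T @ pose_b[:3, :3]
    cos_angle = np.clip((np.trace(r_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))


def translation_error(pose_a, pose_b):
    """Euclidean distance between the translation parts of two 4x4 poses."""
    return float(np.linalg.norm(pose_a[:3, 3] - pose_b[:3, 3]))


def compare_pose_sets(input_poses, predicted_poses):
    """Per-view (rotation error in degrees, translation error) after aligning
    both pose sets to their respective first camera."""
    aligned_in = relative_to_first(input_poses)
    aligned_pred = relative_to_first(predicted_poses)
    return [
        (rotation_error_deg(a, b), translation_error(a, b))
        for a, b in zip(aligned_in, aligned_pred)
    ]
```

With the snippet above, the predicted set could be built as `np.stack([p[0] for p in all_predicted_camera_poses])`, assuming a batch size of 1 per view, and the input set from the three extrinsics tensors converted to NumPy. Errors close to zero would suggest that only the world frame changed; larger errors would indicate that the inputs were actually altered.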