
A question about input geometric parameters #150

@FrancescoFornasa

Dear Sirs,
I have a question about the expected outcome when passing geometric parameters as input at inference time.
In my use case, I have the exact extrinsics (Rt) and intrinsics (K) parameters of each camera in my setup.
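
One thing worth ruling out first is the pose convention. The predicted `camera_poses` are OpenCV cam2world matrices, as the comment in the snippet below notes. Assuming the input `camera_poses` follow the same convention, an extrinsic [R | t] in the usual form x_cam = R x_world + t is world-to-camera and has to be inverted first. A minimal NumPy sketch, assuming the Rt above is world-to-camera (`rt_to_cam2world` is an illustrative helper, not part of MapAnything):

```python
import numpy as np


def rt_to_cam2world(R, t):
    """Convert a world-to-camera extrinsic [R | t] into a 4x4 cam2world pose.

    Assumption: the extrinsic follows x_cam = R @ x_world + t (OpenCV style).
    The inverse of [[R, t], [0, 1]] is [[R.T, -R.T @ t], [0, 1]].
    """
    R = np.asarray(R, dtype=np.float64).reshape(3, 3)
    t = np.asarray(t, dtype=np.float64).reshape(3)
    pose = np.eye(4)
    pose[:3, :3] = R.T       # camera-to-world rotation
    pose[:3, 3] = -R.T @ t   # camera center expressed in world coordinates
    return pose
```

The resulting array can then be converted with `torch.from_numpy(pose).float()` before being put into a view dictionary.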

I would have expected the model to use these inputs as "fixed ground truth references" when predicting the other outputs (such as depth maps) during inference.

However, it appears to me that these input parameters are treated as an "optimization initialization", since they are modified during inference.
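
Part of the change in K could also come from preprocessing. The predicted intrinsics describe the images the network actually processes, so if those are resized or cropped relative to the originals (whether `load_images` does this is not verified here), the original K is no longer the right reference for comparison. A sketch of the standard pinhole update, where `adjust_intrinsics` is a hypothetical helper:

```python
import numpy as np


def adjust_intrinsics(K, scale_x, scale_y, crop_x0=0.0, crop_y0=0.0):
    """Return K for an image resized by (scale_x, scale_y) and then cropped so that
    pixel (crop_x0, crop_y0) of the resized image becomes the new origin.

    Assumption: continuous image coordinates with (0, 0) at the top-left corner of
    the top-left pixel, so a resize is an exact scaling of u and v.
    """
    K = np.asarray(K, dtype=np.float64).copy()
    K[0, 0] *= scale_x                       # fx
    K[0, 1] *= scale_x                       # skew (usually 0)
    K[0, 2] = K[0, 2] * scale_x - crop_x0    # cx
    K[1, 1] *= scale_y                       # fy
    K[1, 2] = K[1, 2] * scale_y - crop_y0    # cy
    return K
```

For example, halving a 1920x1080 image maps fx = 1000, cx = 960, cy = 540 to fx = 500, cx = 480, cy = 270.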

Am I missing something obvious here?
Thank you in advance for your kind help and amazing work!

Here is a relevant snippet of my code:

```python
from mapanything.utils.image import load_images

# `model` is an already-loaded MapAnything model; extrinsics_*_torch (4x4 poses) and
# intrinsics_*_torch (3x3 K matrices) are the known per-camera tensors, defined elsewhere.
images = [path1, path2, path3]
views = load_images(images)

views[0].update({"camera_poses": extrinsics_0_torch})
views[1].update({"camera_poses": extrinsics_1_torch})
views[2].update({"camera_poses": extrinsics_2_torch})

views[0].update({"intrinsics": intrinsics_0_torch})
views[1].update({"intrinsics": intrinsics_1_torch})
views[2].update({"intrinsics": intrinsics_2_torch})

predictions = model.infer(
    views,                            # Input views
    memory_efficient_inference=True,  # Trades off speed for more views (up to 2000 views on 140 GB). Trade off is negligible - see profiling section
    minibatch_size=None,              # Minibatch size for memory-efficient inference (use 1 for smallest GPU memory consumption). Default is dynamic computation based on available GPU memory.
    use_amp=True,                     # Use mixed precision inference (recommended)
    amp_dtype="bf16",                 # bf16 inference (recommended; falls back to fp16 if bf16 not supported)
    apply_mask=True,                  # Apply masking to dense geometry outputs
    mask_edges=True,                  # Remove edge artifacts by using normals and depth
    apply_confidence_mask=True,       # Filter low-confidence regions
    confidence_percentile=10,         # Remove bottom 10 percentile confidence pixels
    use_multiview_confidence=False,   # Enable multi-view depth consistency based confidence in place of learning-based one
)

all_predicted_camera_poses = []
all_predicted_intrinsics = []

for pred in predictions:
    intrinsics = pred["intrinsics"]           # Recovered pinhole camera intrinsics (B, 3, 3)
    camera_poses = pred["camera_poses"]       # OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world poses in world frame (B, 4, 4)
    # .float() guards against bf16 tensors, which NumPy cannot represent
    all_predicted_camera_poses.append(camera_poses.detach().float().cpu().numpy())
    all_predicted_intrinsics.append(intrinsics.detach().float().cpu().numpy())

import pdb; pdb.set_trace()
```
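
To quantify how far the predicted poses move away from the inputs, the sketch below (hypothetical helpers, plain NumPy) reports the rotation angle and translation distance between two cam2world poses. It also re-expresses a list of poses relative to the first camera, which removes any difference in the choice of world frame (but not a difference in scale) before comparing.

```python
import numpy as np


def pose_difference(pose_a, pose_b):
    """Rotation angle (degrees) and translation distance between two 4x4 poses.

    Accepts (4, 4) or (1, 4, 4) arrays, e.g. one entry of all_predicted_camera_poses.
    """
    pose_a = np.asarray(pose_a, dtype=np.float64).reshape(4, 4)
    pose_b = np.asarray(pose_b, dtype=np.float64).reshape(4, 4)
    R_rel = pose_a[:3, :3].T @ pose_b[:3, :3]
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    angle_deg = float(np.degrees(np.arccos(cos_angle)))
    trans_dist = float(np.linalg.norm(pose_a[:3, 3] - pose_b[:3, 3]))
    return angle_deg, trans_dist


def relative_to_first(poses):
    """Express every 4x4 pose in the frame of the first camera: inv(P0) @ Pi."""
    poses = [np.asarray(p, dtype=np.float64).reshape(4, 4) for p in poses]
    first_inv = np.linalg.inv(poses[0])
    return [first_inv @ p for p in poses]
```

For instance, `pose_difference(extrinsics_0_torch.cpu().numpy(), all_predicted_camera_poses[0])` compares the first input pose with its prediction.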
