ITEP-84336: Fix incorrect camera pose for models generated with VGGT (#1139)
daddo-intel wants to merge 45 commits into main
Conversation
Pull request overview
Implements automatic metric scale correction for VGGT reconstructions by propagating known camera poses/locations from the Manager → Mapping API → VGGT model, then scaling predicted camera translations and reconstructed world points to match real-world units.
Changes:
- Extend the mapping request pipeline to include per-image `camera_location` (pose) metadata from Manager to the mapping service.
- Compute a metric scale factor inside `VGGTModel._processOutputs()` and apply it to camera translations and world points (and depth).
- Adjust the example mapping client's timeouts/polling behavior for longer-running reconstructions.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| mapping/tools/client_example.py | Increases request timeout and simplifies polling behavior for async reconstructions. |
| mapping/src/vggt_model.py | Adds metric scaling logic for VGGT outputs and updates preprocessing/intrinsics/camera pose packaging. |
| mapping/src/api_service_base.py | Parses optional camera_locations form data and attaches it to per-image inference payloads. |
| manager/src/django/mesh_generator.py | Extracts camera pose via CamSerializer and uploads it alongside images to the mapping service. |
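As a rough illustration of the data flow the table describes — optional `camera_locations` form data parsed by the service and attached to per-image inference payloads — here is a minimal sketch. All function and field names here are hypothetical, not the service's actual API:

```python
import json


def attach_camera_locations(form_data, image_payloads):
    """Sketch: parse optional per-image camera locations from request form
    data and attach each one to the matching inference payload."""
    raw = form_data.get("camera_locations")
    if raw is None:
        # Field is optional; leave the payloads untouched.
        return image_payloads
    locations = json.loads(raw)  # e.g. [{"x": ..., "y": ..., "z": ...}, ...]
    for payload, location in zip(image_payloads, locations):
        payload["camera_location"] = location
    return image_payloads
```

The `zip` keeps the pairing index-aligned, which matters later when predicted poses are matched against known locations by image order.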
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```python
if baseline_m > 0 and len(camera_to_world_list) >= 2:
    b_units = self._baseline_units(camera_to_world_list[0], camera_to_world_list[1])
    if b_units > 1e-6:
        s = baseline_m / b_units
        log.info(f"Scaling VGGT outputs by s={s:.6f} (baseline {baseline_m:.6f}m / {b_units:.6f} units)")
```
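Once `s` is known, applying it to the predicted outputs could look roughly like the following. This is a sketch only; the array names and shapes are assumptions, not the PR's actual code:

```python
import numpy as np


def apply_metric_scale(scale, extrinsics, world_points, depth_map):
    """Sketch: rescale VGGT outputs from model units to metres.

    extrinsics:   (N, 4, 4) camera-to-world matrices
    world_points: (..., 3) reconstructed points in model units
    depth_map:    per-pixel depth in model units
    """
    extrinsics = extrinsics.copy()
    extrinsics[:, :3, 3] *= scale          # camera translations only;
                                           # rotations are scale-invariant
    return extrinsics, world_points * scale, depth_map * scale
```

Only the translation column of each extrinsic is scaled; rotations are left untouched because a similarity transform changes distances, not orientations.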
The metric scale factor is computed using baseline_m (median over all provided camera_location translations) but b_units is computed only from the first two predicted camera poses. If those two frames are not representative of the median baseline (or if ordering differs), this can produce an incorrect scale factor and reintroduce the same “wrong scale” behavior.
Consider computing the predicted baseline in model units using the same robust statistic as baseline_m (e.g., median of pairwise camera-center distances across all valid pose pairs, aligned by index), then set s = median_baseline_m / median_baseline_units (or compute a median of per-pair ratios).
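The suggestion above — a median of per-pair ratios across all valid, index-aligned pose pairs — could be sketched like this. The function name and calling convention are hypothetical; only the robust-statistic idea comes from the review comment:

```python
import numpy as np


def metric_scale_factor(gt_locations, camera_to_world_list):
    """Sketch: robust metric scale as the median of per-pair distance ratios.

    gt_locations:         (N, 3) known camera centres in metres, index-aligned
                          with the predictions
    camera_to_world_list: list of N predicted 4x4 camera-to-world matrices
    """
    gt = np.asarray(gt_locations, dtype=np.float64)
    pred = np.array([m[:3, 3] for m in camera_to_world_list], dtype=np.float64)
    ratios = []
    for i in range(len(pred)):
        for j in range(i + 1, len(pred)):
            d_units = np.linalg.norm(pred[i] - pred[j])
            if d_units > 1e-6:  # skip degenerate / duplicate pose pairs
                d_metres = np.linalg.norm(gt[i] - gt[j])
                ratios.append(d_metres / d_units)
    return float(np.median(ratios)) if ratios else 1.0
```

Because every pair contributes a ratio, a single outlier pose (or an unlucky first pair) no longer dominates the estimate the way the two-frame baseline does.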
Change the single-letter variable `s` to `scale`.
📝 Description
VGGT outputs camera poses and depth in scale-ambiguous units, which caused reconstructed models to come out at the wrong metric scale.
This PR implements automatic metric scale correction for VGGT using known camera poses provided to `_processOutputs()` and applies the scale factor to:
- Camera translations
- `world_points_from_depth`
- `world_points`

✨ Type of Change
Select the type of change your PR introduces:
🧪 Testing Scenarios
Describe how the changes were tested and how reviewers can test them too:
✅ Checklist
Before submitting the PR, ensure the following: