
ITEP-84336: Fix incorrect camera pose for models generated with VGGT#1139

Open
daddo-intel wants to merge 45 commits into main from
fix/ITEP-84336-incorrect-cam-pose-vggt

Conversation

@daddo-intel
Contributor

@daddo-intel daddo-intel commented Mar 5, 2026

📝 Description

VGGT outputs camera poses and depth in scale-ambiguous units, which caused:

  1. Incorrect mesh dimensions
  2. Incorrect pixels_per_meter in renderTopView
  3. Inconsistent scale compared to MapAnything

This PR implements automatic metric scaling for VGGT using the known camera poses provided to _processOutputs() and applies the scale factor to:

  1. Camera translations
  2. world_points_from_depth
  3. world_points
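
A minimal sketch of what applying one uniform scale factor to these outputs could look like, using plain Python lists. The helper names `scale_camera_translation` and `scale_points` are hypothetical and are not taken from this PR; the actual code in vggt_model.py may operate on arrays or tensors instead.

```python
from typing import List, Sequence

Matrix4 = List[List[float]]


def scale_camera_translation(camera_to_world: Matrix4, scale: float) -> Matrix4:
    """Return a copy of a 4x4 camera-to-world matrix with its translation scaled.

    Hypothetical helper: only the translation column (first three rows of the
    last column) is multiplied; the rotation block is left untouched, because a
    uniform scene scale does not change camera orientation.
    """
    scaled = [list(row) for row in camera_to_world]
    for i in range(3):
        scaled[i][3] *= scale
    return scaled


def scale_points(points: Sequence[Sequence[float]], scale: float) -> List[List[float]]:
    """Return a copy of a list of 3D points with every coordinate scaled."""
    return [[coord * scale for coord in point] for point in points]


# Usage: a camera one model unit along x, scaled to metres with scale = 2.5
pose = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
metric_pose = scale_camera_translation(pose, 2.5)
metric_points = scale_points([[0.0, 2.0, 4.0]], 2.5)
```

The same factor has to be applied to both the camera translations and the point clouds; scaling only one of them would leave the cameras and the geometry inconsistent with each other.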

✨ Type of Change

Select the type of change your PR introduces:

  • 🐞 Bug fix – Non-breaking change which fixes an issue
  • 🚀 New feature – Non-breaking change which adds functionality
  • 🔨 Refactor – Non-breaking change which refactors the code base
  • 💥 Breaking change – Changes that break existing functionality
  • 📚 Documentation update
  • 🔒 Security update
  • 🧪 Tests
  • 🚂 CI

🧪 Testing Scenarios

Describe how the changes were tested and how reviewers can test them too:

  • ✅ Tested manually
  • 🤖 Ran automated end-to-end tests

✅ Checklist

Before submitting the PR, ensure the following:

  • 🔍 PR title is clear and descriptive
  • 📝 For internal contributors: If applicable, include the JIRA ticket number (e.g., ITEP-123456) in the PR title. Do not include full URLs
  • 💬 I have commented my code, especially in hard-to-understand areas
  • 📄 I have made corresponding changes to the documentation
  • ✅ I have added tests that prove my fix is effective or my feature works

@daddo-intel daddo-intel marked this pull request as ready for review March 5, 2026 00:44
@daddo-intel daddo-intel requested a review from saratpoluri March 5, 2026 12:13
Comment thread manager/src/manager/mesh_generator.py
Comment thread mapping/src/api_service_base.py
Comment thread mapping/src/vggt_model.py Outdated
Contributor

Copilot AI left a comment

Pull request overview

Implements automatic metric scale correction for VGGT reconstructions by propagating known camera poses/locations from the Manager → Mapping API → VGGT model, then scaling predicted camera translations and reconstructed world points to match real-world units.

Changes:

  • Extend the mapping request pipeline to include per-image camera_location (pose) metadata from Manager to the mapping service.
  • Compute a metric scale factor inside VGGTModel._processOutputs() and apply it to camera translations and world points (and depth).
  • Adjust the example mapping client’s timeouts/polling behavior for longer-running reconstructions.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| mapping/tools/client_example.py | Increases request timeout and simplifies polling behavior for async reconstructions. |
| mapping/src/vggt_model.py | Adds metric scaling logic for VGGT outputs and updates preprocessing/intrinsics/camera pose packaging. |
| mapping/src/api_service_base.py | Parses optional camera_locations form data and attaches it to per-image inference payloads. |
| manager/src/django/mesh_generator.py | Extracts camera pose via CamSerializer and uploads it alongside images to the mapping service. |

Comment thread mapping/src/vggt_model.py Outdated
Comment thread mapping/src/vggt_model.py
Comment thread mapping/src/vggt_model.py Outdated
Comment thread manager/src/django/mesh_generator.py Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment thread mapping/src/vggt_model.py Outdated
Comment on lines +596 to +600
    if baseline_m > 0 and len(camera_to_world_list) >= 2:
        b_units = self._baseline_units(camera_to_world_list[0], camera_to_world_list[1])
        if b_units > 1e-6:
            s = baseline_m / b_units
            log.info(f"Scaling VGGT outputs by s={s:.6f} (baseline {baseline_m:.6f}m / {b_units:.6f} units)")
Copilot AI Apr 1, 2026

The metric scale factor is computed using baseline_m (median over all provided camera_location translations) but b_units is computed only from the first two predicted camera poses. If those two frames are not representative of the median baseline (or if ordering differs), this can produce an incorrect scale factor and reintroduce the same “wrong scale” behavior.

Consider computing the predicted baseline in model units using the same robust statistic as baseline_m (e.g., median of pairwise camera-center distances across all valid pose pairs, aligned by index), then set s = median_baseline_m / median_baseline_units (or compute a median of per-pair ratios).
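
A minimal sketch of the median-of-per-pair-ratios variant, assuming the known and predicted camera centers are already aligned by index. The function `estimate_metric_scale` and its parameters are hypothetical illustrations, not code from this PR.

```python
import math
from itertools import combinations
from statistics import median
from typing import Optional, Sequence

Point3 = Sequence[float]


def estimate_metric_scale(
    known_centers_m: Sequence[Point3],
    predicted_centers_units: Sequence[Point3],
    min_baseline: float = 1e-6,
) -> Optional[float]:
    """Estimate metres-per-model-unit from index-aligned camera centers.

    For every camera pair (i, j), compare the known distance in metres with
    the predicted distance in model units and take the median of the ratios.
    Returns None when no pair has a usable baseline.
    """
    if len(known_centers_m) != len(predicted_centers_units):
        raise ValueError("known and predicted camera lists must be index-aligned")

    ratios = []
    for i, j in combinations(range(len(known_centers_m)), 2):
        dist_m = math.dist(known_centers_m[i], known_centers_m[j])
        dist_units = math.dist(predicted_centers_units[i], predicted_centers_units[j])
        # Skip degenerate pairs (coincident cameras) to avoid unstable ratios.
        if dist_m > min_baseline and dist_units > min_baseline:
            ratios.append(dist_m / dist_units)

    return median(ratios) if ratios else None


# Usage: predicted centers are the known centers shrunk by a factor of 2
known = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
scale = estimate_metric_scale(known, predicted)  # approximately 2.0
```

Using every valid pair makes the estimate independent of frame ordering, and the median keeps a single badly predicted pose from dominating the result. The pair count grows quadratically with the number of cameras, which should be acceptable for small image sets but may need subsampling for large ones.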

Comment thread manager/src/manager/mesh_generator.py
Comment thread manager/src/django/mesh_generator.py
Comment thread mapping/src/api_service_base.py
Comment thread mapping/src/vggt_model.py Outdated
Comment thread mapping/src/vggt_model.py Outdated
Comment thread mapping/src/vggt_model.py Outdated
Comment thread mapping/src/vggt_model.py Outdated
Comment thread mapping/src/vggt_model.py Outdated
Comment thread mapping/src/vggt_model.py Outdated
Comment thread mapping/src/vggt_model.py Outdated
Comment thread mapping/src/vggt_model.py Outdated
Comment thread mapping/src/vggt_model.py Outdated
Comment thread mapping/src/vggt_model.py Outdated
Comment thread mapping/src/vggt_model.py Outdated
Change single letter variable 's' to scale
@saratpoluri saratpoluri force-pushed the fix/ITEP-84336-incorrect-cam-pose-vggt branch from f86aa8e to 5e3d4e5 Compare April 21, 2026 06:11

5 participants