| 1 | +<!-- SPDX-FileCopyrightText: (C) 2026 Intel Corporation --> |
| 2 | +<!-- SPDX-License-Identifier: Apache-2.0 --> |
| 3 | + |
| 4 | +# Markerless Camera Calibration Internals |
| 5 | + |
| 6 | +The markerless calibration path uses a Hierarchical Localization (HLoc) workflow with two stages: |
| 7 | + |
| 8 | +1. **Global retrieval** with **NetVLAD** to find candidate database images. |
| 9 | +2. **Local matching** (sparse or dense) followed by geometric pose solving. |
| 10 | + |
| 11 | +## How NetVLAD is used |
| 12 | + |
| 13 | +- During scene registration, the service extracts global descriptors for dataset images and stores them in an HDF5 file (for example, `global-feats-netvlad.h5`). |
| 14 | +- During camera localization, the service extracts a NetVLAD descriptor for the query frame and uses `pairs_from_retrieval` to retrieve top-$K$ candidates (`number_of_localizations`, default `50`) from the registered descriptor database. |
| 15 | +- The retrieved image pairs define the shortlist for local feature matching and pose estimation. |
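The retrieval step above can be sketched as a nearest-neighbour search over global descriptors. This is a minimal illustration, not the `pairs_from_retrieval` implementation: the function name `top_k_candidates` and its arguments are hypothetical, and it assumes descriptors are compared by cosine similarity after L2 normalization.

```python
import numpy as np


def top_k_candidates(query_desc, db_descs, db_names, k=50):
    """Rank database images by similarity to the query descriptor.

    Hypothetical helper for illustration only.
    query_desc: (D,) global descriptor of the query frame.
    db_descs:   (N, D) global descriptors of the registered images.
    db_names:   list of N image names, aligned with db_descs rows.
    k:          shortlist size (mirrors number_of_localizations).
    """
    # L2-normalize so that the dot product equals cosine similarity.
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q
    # Never ask for more candidates than the database holds.
    k = min(k, len(db_names))
    order = np.argsort(-sims)[:k]
    return [(db_names[i], float(sims[i])) for i in order]
```

Each returned `(name, similarity)` pair corresponds to one query/database image pair that would be passed on to local matching.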
| 16 | + |
| 17 | +## How quadtree attention is used |
| 18 | + |
| 19 | +- SceneScape integrates a custom HLoc matcher based on **QTA-LoFTR** (`qta_loftr.py`) that loads the QuadTreeAttention implementation. |
| 20 | +- In this matcher, LoFTR coarse matching is configured with `BLOCK_TYPE = "quadtree"` (with `ATTN_TYPE = "B"` and tuned `TOPKS`) to reduce attention cost while preserving long-range correspondences. |
| 21 | +- Dense matching is selected when a local feature entry is `"-"`; otherwise, the service runs sparse extraction and matching. |
| 22 | + |
| 23 | +## How HLoc ties the pipeline together |
| 24 | + |
| 25 | +- Registration and localization are orchestrated from the markerless calibration module. |
| 26 | +- HLoc modules used include `extract_features`, `pairs_from_retrieval`, `match_features` / `match_dense`, and `localize_scenescape`. |
| 27 | +- `localize_scenescape.pose_from_cluster` back-projects matched keypoints to 3D using scene depth or mesh, then runs PnP (`pycolmap.absolute_pose_estimation`) to estimate camera pose. |
| 28 | +- The service validates results with two quality gates before returning success: |
| 29 | + - `minimum_number_of_matches` (default `20`) |
| 30 | + - `inlier_threshold` (default `0.5`), the minimum acceptable inlier ratio $\frac{n_{inliers}}{n_{matches}}$ |
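The two quality gates can be expressed as a small predicate. This is a sketch under stated assumptions: the function name `passes_quality_gates` is hypothetical, and the inclusive comparisons (`>=`) are an assumption, since the text above does not say whether the limits themselves pass.

```python
def passes_quality_gates(
    n_matches,
    n_inliers,
    minimum_number_of_matches=20,
    inlier_threshold=0.5,
):
    """Return True when a localization result clears both gates.

    Hypothetical helper; defaults mirror the documented defaults.
    """
    # Gate 1: enough matches overall (also guards the division below).
    if n_matches == 0 or n_matches < minimum_number_of_matches:
        return False
    # Gate 2: the share of matches that survived PnP/RANSAC as inliers.
    inlier_ratio = n_inliers / n_matches
    # Assumption: a ratio exactly equal to the threshold is accepted.
    return inlier_ratio >= inlier_threshold
```

For example, 100 matches with 60 inliers would pass with the default settings, while 40 matches with 10 inliers would fail the ratio gate.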
| 31 | + |
| 32 | +## Flow Diagram: Registration and Localization |
| 33 | + |
| 34 | +```mermaid |
| 35 | +flowchart TD |
| 36 | + A[Polycam zip uploaded] --> B[Preprocess dataset and transform to SceneScape layout] |
| 37 | + B --> C[Registration start] |
| 38 | + C --> D[Extract NetVLAD descriptors for DB images] |
| 39 | + D --> E[Save DB global descriptors<br/>global-feats-netvlad.h5] |
| 40 | + |
| 41 | + E --> F[Calibration request with query frame] |
| 42 | + F --> G[Extract query NetVLAD descriptor] |
| 43 | + G --> H[pairs_from_retrieval selects top-K DB images] |
| 44 | + |
| 45 | + H --> I{Local matching mode} |
| 46 | + I -->|Sparse| J[Extract local features<br/>example: SIFT] |
| 47 | + J --> K[match_features<br/>example: NN-ratio] |
| 48 | + I -->|Dense| L[match_dense with QTA-LoFTR<br/>coarse block type: quadtree] |
| 49 | + |
| 50 | + K --> M[localize_scenescape pose_from_cluster] |
| 51 | + L --> M |
| 52 | + M --> N[Back-project DB matches to 3D using depth or mesh] |
| 53 | + N --> O[pycolmap PnP with RANSAC] |
| 54 | + O --> P{Quality gates pass?} |
| 55 | + P -->|No| Q[Return weak or insufficient matches] |
| 56 | + P -->|Yes| R[Return quaternion and translation] |
| 57 | +``` |
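The back-projection step in the diagram can be illustrated for the depth-map case. The sketch below is not the `pose_from_cluster` code: the function name `backproject_keypoints`, the pinhole camera model, nearest-pixel depth lookup, and the 4x4 camera-to-world pose convention are all assumptions made for illustration. The mesh-based path is not covered.

```python
import numpy as np


def backproject_keypoints(keypoints_px, depth_map, K, cam_to_world):
    """Lift 2D keypoints of a database image to 3D world points.

    Hypothetical helper for illustration only.
    keypoints_px: (N, 2) float array of (u, v) pixel coordinates.
    depth_map:    (H, W) array of depth along the camera z-axis.
    K:            (3, 3) pinhole intrinsics matrix.
    cam_to_world: (4, 4) homogeneous camera-to-world transform.
    Returns (points_world, valid): points for keypoints with positive
    depth, and an (N,) boolean mask marking which keypoints were kept.
    """
    u, v = keypoints_px[:, 0], keypoints_px[:, 1]
    # Nearest-pixel depth lookup, clamped to the image bounds.
    cols = np.clip(np.round(u).astype(int), 0, depth_map.shape[1] - 1)
    rows = np.clip(np.round(v).astype(int), 0, depth_map.shape[0] - 1)
    z = depth_map[rows, cols]
    valid = z > 0  # zero depth is treated as "no measurement"
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Invert the pinhole projection: pixel + depth -> camera-frame point.
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    return pts_world[valid], valid
```

The resulting 3D points, paired with the matched 2D keypoints in the query frame, form the 2D-3D correspondences that a PnP solver with RANSAC consumes.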
| 58 | + |
| 59 | +The following scene-level configuration inputs come from the service model: `global_feature`, `local_feature`, `matcher`, `number_of_localizations`, `minimum_number_of_matches`, and `inlier_threshold`. |
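One way to picture these inputs is as a small configuration record. This is a sketch only: the class name `MarkerlessCalibrationConfig` and the `uses_dense_matching` property are hypothetical and do not reflect the actual service model; the numeric defaults are the ones documented above, and the string values in the usage example are illustrative.

```python
from dataclasses import dataclass


@dataclass
class MarkerlessCalibrationConfig:
    """Hypothetical container for the scene-level inputs."""

    global_feature: str   # for example "netvlad"
    local_feature: str    # "-" selects dense matching (see above)
    matcher: str          # sparse or dense matcher name
    number_of_localizations: int = 50
    minimum_number_of_matches: int = 20
    inlier_threshold: float = 0.5

    @property
    def uses_dense_matching(self) -> bool:
        # Assumption: the "-" placeholder means no sparse local
        # features are extracted and the dense matcher runs instead.
        return self.local_feature == "-"


# Illustrative values; the real option names may differ.
dense_cfg = MarkerlessCalibrationConfig("netvlad", "-", "qta_loftr")
sparse_cfg = MarkerlessCalibrationConfig("netvlad", "sift", "NN-ratio")
```

Keeping the thresholds next to the feature and matcher choices makes it clear that they are tuned per scene rather than globally.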