Add autocalibration flow diagram

saratpoluri · saratpoluri · commit c1101c8d7ba4 · 2026-05-01T11:04:05.000-07:00
diff --git a/docs/user-guide/microservices/auto-calibration/auto-calibration.md b/docs/user-guide/microservices/auto-calibration/auto-calibration.md
@@ -29,6 +29,8 @@ The auto calibration services supports two types of camera calibration methods:
 
 - **Markerless Calibration**: This approach leverages raw RGBD data from a [Polycam](https://poly.cam/) scan to estimate the camera's position in the scene, eliminating the need for physical markers. Check out the detailed guide on how to [Autocalibrate Cameras Using Visual Features](../../how-to-guides/calibrate-cameras/autocalibrate-cameras-using-visual-features.md).
 
+For implementation-level details of markerless calibration using NetVLAD, quadtree attention, and HLoc, see [Markerless Camera Calibration Internals](./markerless-camera-calibration.md).
+
 To deploy the auto calibration service, refer to the [Get Started](./get-started.md) guide. The service supports configuration through specific arguments and flags ([listed below](#configurable-arguments-and-flags)), which default to predefined values unless explicitly modified.
 
 ## Configurable Arguments and Flags
@@ -77,13 +79,15 @@ _Figure 2: Auto Calibration Sequence diagram_
 
 - [Get Started Guide](./get-started.md)
 - [API Reference](./api-reference.md)
+- [Markerless Camera Calibration Internals](./markerless-camera-calibration.md)
 
 <!--hide_directive
 :::{toctree}
 :hidden:
 
 get-started
 api-reference
+markerless-camera-calibration
 
 :::
 hide_directive-->
diff --git a/docs/user-guide/microservices/auto-calibration/markerless-camera-calibration.md b/docs/user-guide/microservices/auto-calibration/markerless-camera-calibration.md
@@ -0,0 +1,59 @@
+<!-- SPDX-FileCopyrightText: (C) 2026 Intel Corporation -->
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+
+# Markerless Camera Calibration Internals
+
+The markerless calibration path uses a Hierarchical Localization (HLoc) workflow with two stages:
+
+1. **Global retrieval** with **NetVLAD** to find candidate database images.
+2. **Local matching** (sparse or dense) followed by geometric pose solving.
+
+## How NetVLAD is used
+
+- During scene registration, the service extracts global descriptors for dataset images and stores them in an HDF5 file (for example, `global-feats-netvlad.h5`).
+- During camera localization, the service extracts a NetVLAD descriptor for the query frame and uses `pairs_from_retrieval` to retrieve the top-$K$ candidates from the registered descriptor database, where $K$ is `number_of_localizations` (default `50`).
+- The retrieved image pairs define the shortlist for local feature matching and pose estimation.
+
+## How quadtree attention is used
+
+- SceneScape integrates a custom HLoc matcher based on **QTA-LoFTR** (`qta_loftr.py`) that loads the QuadTreeAttention implementation.
+- In this matcher, LoFTR coarse matching is configured with `BLOCK_TYPE = "quadtree"` (with `ATTN_TYPE = "B"` and tuned `TOPKS`) to reduce attention cost while preserving long-range correspondences.
+- Dense matching is selected when a local feature entry is `"-"`; otherwise, the service runs sparse extraction and matching.
+
+## How HLoc ties the pipeline together
+
+- Registration and localization are orchestrated from the markerless calibration module.
+- HLoc modules used include `extract_features`, `pairs_from_retrieval`, `match_features` / `match_dense`, and `localize_scenescape`.
+- `localize_scenescape.pose_from_cluster` back-projects matched keypoints to 3D using scene depth or mesh, then runs PnP (`pycolmap.absolute_pose_estimation`) to estimate camera pose.
+- The service validates results with two quality gates before returning success:
+  - `minimum_number_of_matches` (default `20`)
+  - `inlier_threshold` (default `0.5`): the minimum inlier ratio, where the ratio is computed as $\frac{n_{inliers}}{n_{matches}}$
+
+## Flow Diagram: Registration and Localization
+
+```mermaid
+flowchart TD
+    A[Polycam zip uploaded] --> B[Preprocess dataset and transform to SceneScape layout]
+    B --> C[Registration start]
+    C --> D[Extract NetVLAD descriptors for DB images]
+    D --> E[Save DB global descriptors<br/>global-feats-netvlad.h5]
+
+    E --> F[Calibration request with query frame]
+    F --> G[Extract query NetVLAD descriptor]
+    G --> H[pairs_from_retrieval selects top-K DB images]
+
+    H --> I{Local matching mode}
+    I -->|Sparse| J[Extract local features<br/>example: SIFT]
+    J --> K[match_features<br/>example: NN-ratio]
+    I -->|Dense| L[match_dense with QTA-LoFTR<br/>coarse block type: quadtree]
+
+    K --> M[localize_scenescape pose_from_cluster]
+    L --> M
+    M --> N[Back-project DB matches to 3D using depth or mesh]
+    N --> O[pycolmap PnP with RANSAC]
+    O --> P{Quality gates pass?}
+    P -->|No| Q[Return weak or insufficient matches]
+    P -->|Yes| R[Return quaternion and translation]
+```
+
+The pipeline is controlled by the following scene-level configuration inputs from the service model: `global_feature`, `local_feature`, `matcher`, `number_of_localizations`, `minimum_number_of_matches`, and `inlier_threshold`.
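
The two quality gates described in `markerless-camera-calibration.md` can be illustrated with a minimal Python sketch. It is not the service's code: `MarkerlessSettings` and `passes_quality_gates` are hypothetical names, and the comparison operators (`<` for the match count, `>=` for the inlier ratio) are assumptions. Only the field names and default values come from the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerlessSettings:
    """Hypothetical container; field names and defaults are taken from the doc."""
    number_of_localizations: int = 50    # top-K images kept by global retrieval
    minimum_number_of_matches: int = 20  # gate 1: minimum match count
    inlier_threshold: float = 0.5        # gate 2: minimum inlier ratio


def passes_quality_gates(n_matches: int, n_inliers: int,
                         settings: MarkerlessSettings = MarkerlessSettings()) -> bool:
    """Return True when a pose estimate clears both gates (assumed semantics)."""
    # Gate 1: reject when there are too few matches (also avoids dividing by zero).
    if n_matches <= 0 or n_matches < settings.minimum_number_of_matches:
        return False
    # Gate 2: the inlier ratio n_inliers / n_matches must reach the threshold.
    inlier_ratio = n_inliers / n_matches
    return inlier_ratio >= settings.inlier_threshold
```

With the default settings, a result with 40 matches and 30 inliers has an inlier ratio of 0.75 and passes, while a result with only 10 matches is rejected by the first gate regardless of its inlier count.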