Hi, thank you for releasing this excellent work! 🙌
I’m trying to fine-tune RoMa on my own dataset, which differs from the MegaDepth setup.
My dataset contains image pairs from multiple modalities (e.g., RGB–thermal, RGB–NIR).
For each pair, I already have ground-truth pixel correspondences (matching points).
However, I do not have depth maps or camera intrinsics/extrinsics like in MegaDepth.
I saw in your documentation and code that the training pipeline in RoMa (and DKM) expects depth supervision and pose information (as used in MegaDepth and ScanNet).
In my case, I only have explicit match coordinates.
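
To make this concrete, here is a minimal sketch of the only supervision I can build from my data. It assumes the warp is expressed in normalized [-1, 1] coordinates with a pixel-centre convention, which may not match what the training code uses. The helper names (`to_normalized`, `build_sparse_targets`, `sparse_endpoint_error`) are hypothetical and are not part of the RoMa or DKM code.

```python
from math import hypot


def to_normalized(x, y, width, height):
    """Map a pixel position to [-1, 1] coordinates.

    Assumption: pixel-centre convention, i.e. pixel 0 covers [0, 1)
    and its centre sits at 0.5 before normalization.
    """
    return 2.0 * (x + 0.5) / width - 1.0, 2.0 * (y + 0.5) / height - 1.0


def build_sparse_targets(matches, size_a, size_b):
    """Turn pixel matches into normalized (point_in_A, point_in_B) pairs.

    matches: list of (xa, ya, xb, yb) in pixels.
    size_a, size_b: (width, height) of image A and image B.
    """
    wa, ha = size_a
    wb, hb = size_b
    return [
        (to_normalized(xa, ya, wa, ha), to_normalized(xb, yb, wb, hb))
        for xa, ya, xb, yb in matches
    ]


def sparse_endpoint_error(predicted_b, targets):
    """Mean Euclidean error in image B, evaluated only at annotated points.

    predicted_b: list of (x, y) predictions in normalized coordinates,
    one per entry of targets, in the same order.
    """
    errors = [
        hypot(px - tx, py - ty)
        for (px, py), (_, (tx, ty)) in zip(predicted_b, targets)
    ]
    return sum(errors) / len(errors)


# Hypothetical example: one RGB-thermal match on two 4x4 images.
matches = [(0, 0, 3, 3)]
targets = build_sparse_targets(matches, (4, 4), (4, 4))
loss = sparse_endpoint_error([(0.75, -0.25)], targets)
```

In other words, I can compute an error only at the annotated points, with no dense warp or certainty target derived from depth and pose.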