SG-Reg: Generalizable and Efficient Scene Graph Registration

Accepted by IEEE T-RO
Chuhao Liu1, Zhijian Qiao1, Jieqi Shi2,*, Ke Wang3, Peize Liu1, and Shaojie Shen1

1HKUST Aerial Robotics Group    2Nanjing University    3Chang'an University
*Corresponding Author

T-RO | arXiv | YouTube | Bilibili | HuggingFace Space

News

  • [21 Apr 2025] Published the initial version of the code.
  • [19 Apr 2025] Our paper is accepted by IEEE T-RO as a regular paper.
  • [8 Oct 2024] Paper submitted to IEEE T-RO.

In this work, we learn to register two semantic scene graphs, an essential capability when an autonomous agent needs to register its map against a remote agent or against a prior map. To achieve generalizable registration in the real world, we design a scene graph network that encodes multiple modalities of semantic nodes: an open-set semantic feature, a local topology with spatial awareness, and a shape feature. SG-Reg represents a dense indoor scene with coarse node features and dense point features. In multi-agent SLAM systems, this representation supports both coarse-to-fine localization and bandwidth-efficient communication. We generate semantic scene graphs using vision foundation models and the semantic mapping module FM-Fusion, which eliminates the need for ground-truth semantic annotations and enables fully self-supervised network training. We evaluate our method on real-world RGB-D sequences: ScanNet, 3RScan, and self-collected data captured with a RealSense D-435i.
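To make the coarse-to-fine idea concrete, here is a minimal sketch of the data flow, not the actual SG-Reg network: node features are matched first, and point correspondences gathered inside matched nodes are then used to estimate the SE(3) transform with a closed-form Procrustes solve. All names and shapes below are illustrative assumptions.

import numpy as np

def mutual_nn_matches(feat_src, feat_ref):
    # Coarse stage: mutually-nearest node matches from feature similarity.
    sim = feat_src @ feat_ref.T                  # (Ns, Nr) similarity scores
    nn_ref = sim.argmax(axis=1)                  # best ref node per src node
    nn_src = sim.argmax(axis=0)                  # best src node per ref node
    return [(i, j) for i, j in enumerate(nn_ref) if nn_src[j] == i]

def estimate_se3(p_src, p_ref):
    # Fine stage: closed-form Kabsch fit so that T @ p_src ~ p_ref,
    # where p_src and p_ref are corresponding (N, 3) point arrays.
    c_src, c_ref = p_src.mean(0), p_ref.mean(0)
    H = (p_src - c_src).T @ (p_ref - c_ref)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                     # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, c_ref - R @ c_src
    return T

In SG-Reg both stages are learned and far more robust than this nearest-neighbor sketch; the snippet only illustrates how coarse node matches narrow the search space for dense point correspondences.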

1. Install

Create and activate a virtual environment,

conda create -n sgreg python=3.9
conda activate sgreg

Install PyTorch 2.1.2 and other dependencies.

conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia 
pip install -r requirements.txt
python setup.py build develop
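Before moving on, it is worth checking that the CUDA build of PyTorch is active. The expected output assumes a CUDA 11.8-compatible driver:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# expected: 2.1.2 True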

2. Download Dataset

Download the 3RScan (RIO) data via this Nutstore (坚果云) link. It contains 50 pairs of scene graphs. In RIO_DATAROOT, the data are organized in the following structure.

|--val
    |--scenexxxx_00a % each individual scene graph
    |-- ....
|--splits
    |-- val.txt
|--gt
    |-- SRCSCENE-REFSCENE.txt % T_ref_src
|--matches
    |-- SRCSCENE-REFSCENE.pth % ground-truth node matches
|--output
    |--CHECKPOINT_NAME % default: sgnet_scannet_0080
        |--SRCSCENE-REFSCENE % results of scene pair

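If you want to inspect a downloaded pair programmatically, a sketch like the one below may help. It assumes the gt/*.txt file stores T_ref_src as a whitespace-separated 4x4 matrix and that matches/*.pth is torch-loadable; both are assumptions about the file layout, and the pair name is hypothetical.

import numpy as np
import torch

pair = "scene0011_00a-scene0011_00b"  # hypothetical pair; see splits/val.txt

# Assumed layout: a whitespace-separated 4x4 homogeneous matrix T_ref_src.
T_ref_src = np.loadtxt(f"RIO_DATAROOT/gt/{pair}.txt").reshape(4, 4)

# Assumed: a torch-serialized object holding the ground-truth node matches.
gt_matches = torch.load(f"RIO_DATAROOT/matches/{pair}.pth")
print(T_ref_src)
print(type(gt_matches))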
We also provide another 50 pairs of ScanNet scenes. Please download the ScanNet data using this Nutstore (坚果云) link. They are organized in the same data structure as the 3RScan data.

*Note: We did not use any ground-truth semantic annotation from 3RScan or ScanNet. The downloaded scene graphs are reconstructed using FM-Fusion. You can also download the original RGB-D sequences and build your own scene graphs with FM-Fusion. If you want to try, the ScanNet sequences are the easier ones to start with.

3. Inference on 3RScan Scenes

Open config/rio.yaml and set its dataroot entry to the RIO_DATAROOT directory on your machine. Then run the inference program,

python sgreg/val.py --cfg_file config/rio.yaml

It runs inference on all of the downloaded 3RScan scene pairs. The registration results, including the matched nodes, point correspondences, and predicted transformation, are saved at RIO_DATAROOT/output/CHECKPOINT_NAME/SRCSCENE-REFSCENE. You can visualize the registration results,

python sgreg/visualize.py --dataroot $RIO_DATAROOT$ --viz_mode 1 --find_gt --viz_translation [3.0,5.0,0.0]

It should visualize the results as below. In the left column of the viewer, you can select the entities you want to visualize.
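Beyond visual inspection, you can compare a predicted transformation against the ground truth in gt/. The sketch below assumes both files store a 4x4 homogeneous matrix as plain text; pred_pose.txt is a placeholder name, since the repository's actual output naming may differ.

import numpy as np

# Placeholder file names; check the actual contents of
# RIO_DATAROOT/output/CHECKPOINT_NAME/SRCSCENE-REFSCENE first.
T_pred = np.loadtxt("pred_pose.txt").reshape(4, 4)
T_gt = np.loadtxt("RIO_DATAROOT/gt/SRCSCENE-REFSCENE.txt").reshape(4, 4)

dT = np.linalg.inv(T_gt) @ T_pred  # residual transform
cos = np.clip((np.trace(dT[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
print("rotation error (deg):", np.degrees(np.arccos(cos)))
print("translation error (m):", np.linalg.norm(dT[:3, 3]))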

If you run the program on a remote server, Rerun supports remote visualization (see rerun connect_tcp). Check the argument descriptions in visualize.py to customize your visualization.
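For reference, the client side of a remote Rerun session generally looks like the sketch below. This is not SG-Reg's own visualizer code; YOUR_LOCAL_IP and the default port 9876 are assumptions about your network setup, and a Rerun Viewer must already be listening there.

import rerun as rr

# Stream log data from the remote server to a viewer on your own machine.
rr.init("sgreg_remote_viz")
rr.connect_tcp("YOUR_LOCAL_IP:9876")
rr.log("points", rr.Points3D([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]))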

[Optional] To evaluate SG-Reg on ScanNet sequences, adjust the commands as below,

python sgreg/val.py --cfg_file config/scannet.yaml 
python sgreg/visualize.py --dataroot $SCANNET_DATAROOT$ --viz_mode 1 --augment_transform --viz_translation [3.0,5.0,0.0]

4. Evaluate on your own data

We believe generalization remains a key challenge in 3D semantic perception. If you are interested in this task, we encourage you to collect your own RGB-D sequence for evaluation. The pipeline requires VINS-Mono to compute camera poses, Grounded-SAM to generate semantic labels, and FM-Fusion to reconstruct a semantic scene graph. We will add detailed instructions later on how to build your own data.

5. Develop Log

  • Release the scene graph network code and verify its inference.
  • Remove unnecessary dependencies.
  • Clean the data structure.
  • Visualize the results.
  • Provide RIO scene graph data for download.
  • Provide network weight for download.
  • Publish checkpoint on Huggingface Hub and reload.
  • Registration back-end with a Python interface. (The version used in the paper is in C++.)
  • Validate the entire system on a new computer.
  • A tutorial for running the validation.

We will continue to maintain this repo. If you encounter any problem using it, feel free to open an issue. We'll try to help.

6. Acknowledgements

We reuse some code from GeoTransformer, SG-PGM, and LightGlue. SkyLand provided a LiDAR-camera suite that allowed us to evaluate SG-Reg in large-scale scenes (as demonstrated at the end of the video).

7. License

The source code is released under the GPLv3 license. For technical issues, please contact Chuhao Liu ([email protected]).
