Chuhao Liu1, Zhijian Qiao1, Jieqi Shi2,*, Ke Wang3, Peize Liu1 and Shaojie Shen1
1HKUST Aerial Robotics Group
2Nanjing University
3Chang'an University
*Corresponding Author
- [21 Apr 2025] Publish the initial version of the code.
- [19 Apr 2025] Our paper is accepted by IEEE T-RO as a regular paper.
- [8 Oct 2024] Paper submitted to IEEE T-RO.
In this work, we learn to register two semantic scene graphs, an essential capability when an autonomous agent needs to register its map against a remote agent, or against a prior map. To achieve generalizable registration in the real world, we design a scene graph network that encodes multiple modalities of semantic nodes: open-set semantic features, local topology with spatial awareness, and shape features. SG-Reg represents a dense indoor scene with coarse node features and dense point features. In multi-agent SLAM systems, this representation supports both coarse-to-fine localization and bandwidth-efficient communication. We generate semantic scene graphs using vision foundation models and the semantic mapping module FM-Fusion, which eliminates the need for ground-truth semantic annotations and enables fully self-supervised network training. We evaluate our method on real-world RGB-D sequences: ScanNet, 3RScan, and data self-collected with a RealSense D435i.
Create a conda virtual environment,
conda create -n sgreg python=3.9
Install PyTorch 2.1.2 and other dependencies.
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
python setup.py build develop
Download the 3RScan (RIO) data via the Nutstore (坚果云) link and extract it into a RIO_DATAROOT directory. The data are organized in the following structure:
|--val
|--scenexxxx_00a % each individual scene graph
|-- ....
|--splits
|-- val.txt
|--gt
|-- SRCSCENE-REFSCENE.txt % T_ref_src
|--matches
|-- SRCSCENE-REFSCENE.pth % ground-truth node matches
|--output
|--CHECKPOINT_NAME % default: sgnet_scannet_0080
|--SRCSCENE-REFSCENE % results of scene pair
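The ground-truth files above are plain text. Assuming each SRCSCENE-REFSCENE.txt stores the 4x4 transform T_ref_src as four whitespace-separated rows (an assumption about the layout, not something the repo documents here), a minimal stdlib sketch to load it could look like:

```python
from pathlib import Path


def load_gt_transform(path):
    """Parse a 4x4 homogeneous transform from a plain-text file.

    Assumes four rows of four whitespace-separated floats (row-major
    T_ref_src); adjust the parsing if the files use another layout.
    """
    rows = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append([float(v) for v in line.split()])
    assert len(rows) == 4 and all(len(r) == 4 for r in rows), "expected a 4x4 matrix"
    return rows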
We also provide another 50 pairs of ScanNet scenes. Please download the ScanNet data using this Nutstore (坚果云) link. They are organized in the same data structure as the 3RScan data.
*Note: We did not use any ground-truth semantic annotation from 3RScan or ScanNet. The downloaded scene graphs are reconstructed using FM-Fusion. You can also download the original RGB-D sequences and build your scene graphs using FM-Fusion. If you want to try, ScanNet sequences should be easier to start with.
Open config/rio.yaml and set its dataroot entry to the RIO_DATAROOT
directory on your machine. Then run the inference program,
python sgreg/val.py --cfg_file config/rio.yaml
It runs inference on all of the downloaded 3RScan scene pairs. The registration results, including matched nodes, point correspondences, and the predicted transformation, are saved at RIO_DATAROOT/output/CHECKPOINT_NAME/SRCSCENE-REFSCENE. You can then visualize the registration results,
python sgreg/visualize.py --dataroot ${RIO_DATAROOT} --viz_mode 1 --find_gt --viz_translation [3.0,5.0,0.0]
It should visualize the results as below,
On the left column, you can select the entities you want to visualize. If you run the program on a remote server, rerun supports remote visualization (see rerun connect_tcp). Check the argument descriptions in visualize.py to customize your visualization.
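Beyond the viewer, the saved results can be inspected programmatically. For instance, a predicted transformation can be applied to the source points to check the alignment yourself; a minimal stdlib sketch, assuming a row-major 4x4 matrix and a list of xyz points (names are illustrative, not the repo's API):

```python
def apply_transform(T, points):
    """Apply a 4x4 homogeneous transform T (row-major, list of lists)
    to an iterable of (x, y, z) points, returning transformed tuples."""
    out = []
    for x, y, z in points:
        out.append(tuple(
            T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3]
            for i in range(3)
        ))
    return out
```

Calling apply_transform(T_ref_src, src_points) maps source-frame points into the reference frame, where they should overlap the reference scene if the registration succeeded.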
[Optional] If you want to evaluate SG-Reg on ScanNet sequences, adjust the running options as below,
python sgreg/val.py --cfg_file config/scannet.yaml
python sgreg/visualize.py --dataroot ${SCANNET_DATAROOT} --viz_mode 1 --augment_transform --viz_translation [3.0,5.0,0.0]
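When the ground-truth T_ref_src is available, registration accuracy is commonly summarized as a rotation error (in degrees) and a translation error (Euclidean distance). A minimal stdlib sketch over row-major 4x4 matrices, offered as an illustration rather than the evaluation code used in the paper:

```python
import math


def registration_error(T_est, T_gt):
    """Return (rotation error in degrees, translation error) between
    two row-major 4x4 homogeneous transforms."""
    # Trace of R_gt^T @ R_est over the top-left 3x3 rotation blocks.
    trace = sum(T_gt[k][i] * T_est[k][i] for i in range(3) for k in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    rot_err_deg = math.degrees(math.acos(cos_angle))
    trans_err = math.sqrt(sum((T_est[i][3] - T_gt[i][3]) ** 2 for i in range(3)))
    return rot_err_deg, trans_err
```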
We believe generalization remains a key challenge in 3D semantic perception. If you are interested in this task, we encourage you to collect your own RGB-D sequences for evaluation. This requires VINS-Mono to compute camera poses, Grounded-SAM to generate semantic labels, and FM-Fusion to reconstruct a semantic scene graph. We will add detailed instructions later on how to build your own data.
- Release the scene graph network code and verify its inference.
- Remove unnecessary dependencies.
- Clean the data structure.
- Visualize the results.
- Provide RIO scene graph data for download.
- Provide network weight for download.
- Publish checkpoint on Huggingface Hub and reload.
- Registration back-end with a Python interface. (The version used in the paper is implemented in C++.)
- Validate the entire system on a new computer.
- A tutorial for running the validation.
We will continue to maintain this repo. If you encounter any problems using it, feel free to open an issue. We'll try to help.
We used some code from GeoTransformer, SG-PGM, and LightGlue. SkyLand provided a LiDAR-camera suite that allowed us to evaluate SG-Reg in large-scale scenes (as demonstrated at the end of the video).
The source code is released under the GPLv3 license. For technical issues, please contact Chuhao Liu ([email protected]).