This package implements an enhanced hierarchical 3D scene graph based on Hydra, integrating open-vocabulary features for rooms and objects, and supporting object-relational reasoning.
We leverage a Vision-Language Model (VLM) to infer semantic relationships. Additionally, we introduce a task reasoning module that combines Large Language Models (LLMs) and a VLM to interpret the scene graph's semantic and relational information, enabling agents to reason about tasks and interact with their environment intelligently.
These instructions assume that ros-noetic-desktop-full is installed on Ubuntu 20.04.
Install general dependencies:
```bash
sudo apt install python3-rosdep python3-catkin-tools python3-vcstool
```

Build the repository in Release mode:

```bash
mkdir -p catkin_ws/src
cd catkin_ws
catkin init
catkin config -DCMAKE_BUILD_TYPE=Release
cd src
git clone git@github.com:ntnu-arl/reasoning_hydra.git
vcs import . < reasoning_hydra/install/packages.repos
rosdep install --from-paths . --ignore-src -r -y
cd ..
catkin build
```

Follow the instructions in semantic_inference_ros to set up the Python environment required to run the semantic and reasoning models.
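Before running any of the launch files below, remember to source the workspace overlay in each new terminal (a standard catkin-tools step, assuming the default devel space created by the commands above):

```bash
# From the catkin_ws directory created above
source devel/setup.bash
```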
The system supports multiple datasets and online deployment on robots with GPU capabilities (e.g., Nvidia Jetson Orin AGX).
Download rosbags from the uHumans2 dataset.
Start the scene graph:
```bash
roslaunch hydra_ros uhumans2.launch
```

In a separate terminal, play the rosbag:

```bash
rosbag play path/to/rosbag
```

Follow the NICE-SLAM instructions to download posed RGB-D data from the Replica scenes.
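As a convenience, the NICE-SLAM repository ships a helper script for this download (script path taken from the NICE-SLAM README at time of writing; verify against the upstream repo, as it may change):

```bash
# Clone NICE-SLAM and fetch the posed Replica RGB-D sequences
git clone https://github.com/cvg/nice-slam.git
cd nice-slam
bash scripts/download_replica.sh
```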
Run the scene graph:
```bash
roslaunch hydra_ros replica.launch
```

Publish the data:

```bash
roslaunch hydra_ros publish_replica.launch dataset_path:=<Path to your replica dataset> scene_name:=<Scene name>
```

Follow the HOV-SG instructions (Step 2 can be skipped) to download posed RGB-D data from several scenes.
Run the scene graph:
```bash
roslaunch hydra_ros hm3dsem.launch
```

Publish the data:

```bash
roslaunch hydra_ros publish_hm3dsem.launch dataset_path:=<Path to hm3d_trajectories> scene_name:=<Scene name>
```

To run the scene graph on your robot:
- The robot must provide posed RGB-D data as `sensor_msgs/Image`
- Pose must be provided via TFs (a quick sanity check is sketched below)
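A minimal way to sanity-check these requirements with standard ROS introspection tools (the topic and frame names below are placeholders; substitute your robot's):

```bash
# Confirm the color and depth images are arriving
rostopic hz /camera/color/image_raw
rostopic hz /camera/aligned_depth_to_color/image_raw
# Confirm the camera pose is resolvable through TF
rosrun tf tf_echo odom camera_color_optical_frame
```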
Update robot.launch with the correct TFs and camera topic names, then run:
```bash
roslaunch hydra_ros robot.launch
```

We provide recorded data from experiments with an ANYmal robot. Download it here.
To use this data:
```bash
roslaunch hydra_ros robot.launch playback_mode:=True
```

Then play one of the downloaded rosbags:
```bash
rosbag play <bag_to_play> --topics /tf /camera/aligned_depth_to_color/image_raw/compressedDepth /camera/color/camera_info /camera/color/image_raw/compressed --clock
```
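Note that the bag stores only compressed image topics. If a consumer in your pipeline expects raw images and the playback mode of robot.launch does not already republish them (an assumption; check the launch file), image_transport can decompress them, e.g.:

```bash
# Republish the compressed color stream as a raw image topic
rosrun image_transport republish compressed \
  in:=/camera/color/image_raw raw out:=/camera/color/image_raw
# Depth was recorded with the compressedDepth transport
rosrun image_transport republish compressedDepth \
  in:=/camera/aligned_depth_to_color/image_raw raw out:=/camera/aligned_depth_to_color/image_raw
```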
The reasoning module (VLM + LLMs) requires an internet connection:

- LLM queries are done via the OpenAI API.
- A large VLM is hosted externally (setup instructions: semantic_inference_ros).
IMPORTANT: When using the reasoning module, set your OpenAI and FastAPI keys (see https://github.com/ntnu-arl/semantic_inference_ros) as environment variables before launching the ROS nodes:
```bash
export OPENAI_API_KEY=<Your OpenAI API Key>
export FASTAPI_API_KEY=<Your server FastAPI Key>
```

Once the scene graph is constructed, either:
- Use the provided rviz GUI to interact with the service and visualize task reasoning results on the scene graph.
- Call the ROS service `/semantic_inference/navigation_prompt_service/navigation_prompt` directly (a command-line sketch follows below).
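A minimal sketch of the second option from the command line. The request layout is not documented here, so inspect the service definition first; the `prompt: '...'` payload below is an assumption:

```bash
# Inspect the request/response fields of the service
rosservice type /semantic_inference/navigation_prompt_service/navigation_prompt | rossrv show
# Example call, assuming the request carries a single string field named `prompt`
rosservice call /semantic_inference/navigation_prompt_service/navigation_prompt "prompt: 'Where can I warm up my lunch?'"
```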
If you use this work in your research, please cite:
```bibtex
@inproceedings{puigjaner2026reasoninggraph,
  title={Relationship-Aware Hierarchical 3D Scene Graph},
  author={Gassol Puigjaner, Albert and Zacharia, Angelos and Alexis, Kostas},
  booktitle={2026 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2026}
}
```

Released under the BSD-3-Clause license.
This open-source release is based on work supported by the European Commission through:
- Project SYNERGISE, under Horizon Europe Grant Agreement No. 101121321
For questions or support, reach out via GitHub Issues or contact the authors directly.