
Semantic Segmentation and VLM Reasoning in ROS


This repository extends semantic_inference to provide closed and open set semantic segmentation methods. Additionally, it provides methods to extract CLIP embeddings of objects and relational embeddings using Visual Language Models (VLMs).




Setup

General Requirements

These instructions assume ros-noetic-desktop-full is installed on Ubuntu 20.04.

Install the general dependencies:

sudo apt install python3-rosdep python3-catkin-tools

Clone the repository and initialize submodules:

git clone git@github.com:ntnu-arl/semantic_inference_ros.git
cd semantic_inference_ros
git submodule update --init --recursive

Virtual Environment

It is highly recommended to set up a Python virtual environment to run ROS Python nodes:

cd /path/to/catkin_ws/src/semantic_inference/semantic_inference_python
python3.8 -m venv --system-site-packages ros_semantics_env
source ros_semantics_env/bin/activate
pip install -U pip
pip install -r requirements.txt

Building

Install ROS dependencies:

cd /path/to/catkin_ws/src
rosdep install --from-paths . --ignore-src -r -y

For closed-set segmentation, follow the setup instructions (skip Python utilities) in semantic_inference closed-set docs.

Build the workspace:

catkin config -DCMAKE_BUILD_TYPE=Release
catkin build

Usage

Open-set Segmentation

Open-set segmentation consumes RGB-D images and camera information to perform semantic segmentation and extract open-vocabulary features for each object.

The supported open-set detectors are YOLOe and YOLOw; both can detect an arbitrary list of object classes without re-training.

roslaunch semantic_inference_ros openset_segmentation.launch
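Conceptually, the open-vocabulary step compares each object's CLIP embedding against text embeddings of the prompted class names and assigns the class with the highest cosine similarity. Below is a minimal sketch of that matching step using NumPy with toy embeddings; the actual node uses real CLIP features and its own configuration, so the dimensions and values here are purely illustrative:

```python
import numpy as np

def best_matching_label(object_embedding, label_embeddings, labels):
    """Return the label whose text embedding is most similar to the object embedding."""
    obj = object_embedding / np.linalg.norm(object_embedding)
    txt = label_embeddings / np.linalg.norm(label_embeddings, axis=1, keepdims=True)
    scores = txt @ obj  # cosine similarity of each label with the object
    return labels[int(np.argmax(scores))]

# Toy 4-dimensional "embeddings" purely for illustration.
labels = ["chair", "door", "fire extinguisher"]
label_embeddings = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
object_embedding = np.array([0.1, 0.05, 0.9, 0.1])
print(best_matching_label(object_embedding, label_embeddings, labels))  # fire extinguisher
```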

VLM for Object Relationship Embeddings

This method takes a segmented image along with its original RGB-D frame and computes visual features for each pair of detected objects. These features can be used to prompt a VLM for reasoning about relationships.
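The pairing logic above amounts to enumerating every unordered pair of detected objects and computing a joint feature per pair. A minimal sketch of that enumeration, with a placeholder feature function (the real node computes visual features from the segmented RGB-D frame, not strings):

```python
from itertools import combinations

def pairwise_features(objects, feature_fn):
    """Compute a feature for every unordered pair of detected objects."""
    return {(a, b): feature_fn(a, b) for a, b in combinations(objects, 2)}

# Placeholder feature: just records the pair; the actual node would crop and
# encode the image region covering both objects.
detections = ["table", "mug", "laptop"]
features = pairwise_features(detections, lambda a, b: f"feat({a},{b})")
print(len(features))  # 3 objects -> 3 pairs
```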

Supported VLMs: InstructBLIP and DeepSeek-VL2.

To use DeepSeek-VL2, first extract the visual encoder as a standalone model. For the large model (used in our experiments), we provide it here.

Alternatively, the models can be extracted with the following command (roughly 100 GB of RAM is required for the large model):

python semantic_inference_python/scripts/extract_deepseek_visual.py --model_name <model to use> --output_path <path to store model>

Then, set the model path in vlm.yaml.

Launch the node:

roslaunch semantic_inference_ros vlm_features.launch

VLM/LLM Reasoning

This section enables reasoning on the relationship-aware hierarchical scene graph.

  • LLMs predict relevant objects and interactions for given tasks
  • VLM responses are parsed by LLMs
  • An OpenAI API key is required; export it before launching:
export OPENAI_API_KEY=<Your OpenAI API Key>
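A node that depends on the key can fail fast with a clear message when it is missing. A small sketch of such a check (a hypothetical helper, not part of the package):

```python
import os

def require_api_key(name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error if unset."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it before launching the reasoning nodes."
        )
    return key
```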

VLM reasoning is performed remotely. Use the DeepSeek-VL2 server code to run a FastAPI server.

Steps to set up the server:

  1. Clone the server repo:
git clone git@github.com:ntnu-arl/DeepSeek-VL2.git -b server
cd DeepSeek-VL2
  2. Set up the Python virtual environment:
bash setup.sh
  3. Configure the server path, port, and API key in run_server.sh.

  4. Run the server (the model download may take some time):

bash run_server.sh

Finally, set the server URL in vlm_for_navigation.yaml and export your FastAPI API key:

export FASTAPI_API_KEY=<Your server FastAPI Key>
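For illustration, a remote VLM server of this kind is typically queried over HTTP with the API key sent in a request header. The sketch below only assembles such a request; the endpoint path, header name, and payload fields are assumptions for illustration, not the actual server interface:

```python
import os

def build_vlm_request(server_url, prompt, image_b64):
    """Assemble URL, headers, and JSON payload for a hypothetical VLM endpoint."""
    return {
        "url": f"{server_url.rstrip('/')}/generate",  # hypothetical endpoint path
        "headers": {"X-API-Key": os.environ.get("FASTAPI_API_KEY", "")},  # assumed header name
        "json": {"prompt": prompt, "image": image_b64},
    }

req = build_vlm_request("http://localhost:8000", "Describe the relationship.", "<base64 image>")
print(req["url"])
```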

Citation

@inproceedings{puigjaner2026reasoninggraph,
    title={Relationship-Aware Hierarchical 3D Scene Graph},
    author={Gassol Puigjaner, Albert and Zacharia, Angelos and Alexis, Kostas},
    booktitle={2026 IEEE International Conference on Robotics and Automation (ICRA)}, 
    year={2026}
}

Zenodo DOI

https://doi.org/10.5281/zenodo.18496220


License

Released under BSD-3-Clause.


Acknowledgements

This open-source release is based on work supported by the European Commission through:

  • Project SYNERGISE, under Horizon Europe Grant Agreement No. 101121321

Contact

For questions or support, open a GitHub issue or contact the authors.

