This repository provides ROS 2 nodes for performing multiple vision tasks—such as object detection, semantic segmentation, and depth estimation—using Meta’s DINOv3 as the backbone. A key advantage of this approach is that the DINOv3 backbone features are computed only once (the most computationally demanding step), and these shared features are then reused by lightweight task-specific heads. This design significantly reduces redundant computation and makes multi-task inference more efficient.
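The shared-backbone idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `SharedBackbonePipeline`, `toy_backbone`, and `toy_heads` are hypothetical stand-ins, not the repository's classes, and the "features" are simple lists rather than DINOv3 tensors.

```python
from typing import Any, Callable, Dict


class SharedBackbonePipeline:
    """Run one backbone pass and feed the result to every enabled head."""

    def __init__(self, backbone: Callable[[Any], Any],
                 heads: Dict[str, Callable[[Any], Any]]):
        self.backbone = backbone
        self.heads = heads

    def __call__(self, image: Any) -> Dict[str, Any]:
        # Expensive step: executed exactly once per input image.
        features = self.backbone(image)
        # Cheap steps: every head reuses the same features.
        return {task: head(features) for task, head in self.heads.items()}


# Hypothetical stand-ins for illustration only.
backbone_calls = 0


def toy_backbone(image):
    global backbone_calls
    backbone_calls += 1
    return [float(p) / 255.0 for p in image]


toy_heads = {
    "detection": lambda feats: max(feats),
    "segmentation": lambda feats: [f > 0.5 for f in feats],
    "depth": lambda feats: sum(feats) / len(feats),
}

pipeline = SharedBackbonePipeline(toy_backbone, toy_heads)
outputs = pipeline([0, 128, 255])
print(sorted(outputs), "backbone calls:", backbone_calls)
```

Adding a further head to `toy_heads` adds one more cheap call per image, while the number of backbone calls stays at one.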
First, install ROS 2 Humble by following the official ROS 2 Humble installation instructions. Earlier ROS 2 distributions are not reliable here because DINOv3 needs a recent version of Python.
git clone --recurse-submodules https://github.com/Raessan/dinov3_ros.git
cd dinov3_ros
pip install -e .
cd ros2_ws
rosdep install --from-paths src --ignore-src -r -y
colcon build
. install/setup.bash
PyTorch is the only package that has to be installed separately, because the correct build depends on your CUDA version. For example:
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu129
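After installing, a quick check like the one below can confirm that the PyTorch build matches the CUDA setup. It is a minimal sketch: the helper name `torch_cuda_report` is hypothetical, and it only uses `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def torch_cuda_report() -> dict:
    """Summarize the installed PyTorch build, or report that it is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch  # imported lazily so the script also runs without PyTorch

    return {
        "installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(torch_cuda_report())
```

If `cuda_available` is false while `built_for_cuda` is set, the wheel was built for CUDA but no usable GPU or driver was found at runtime.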
Finally, we provide weights for the lightweight heads we developed, but the DINOv3 backbone weights must be requested and obtained from the DINOv3 repo. By default, they are expected in dinov3_toolkit/backbone/weights. The provided heads were trained using the vits16plus model from DINOv3 as the backbone.
If running with Docker, two steps are needed:
- First, install the NVIDIA Container Toolkit on the host machine.
- Build the image, setting any build argument such as the torch index URL, and then start the container with the following commands:
docker compose build --build-arg TORCH_INDEX_URL=https://download.pytorch.org/whl/cu129
docker compose up
Open any additional terminal inside the container with docker exec -it dinov3_ros bash.
Launch the bringup file from the ros2_ws folder with the command ros2 launch bringup dinov3_ros dinov3.launch.py arg1:=value arg2:=value. The available launch arguments so far are:
- debug: Whether to publish debug images that help interpret the task results visually, for example overlaid bounding boxes for detection or a colored depth map for depth estimation (default: true).
- perform_{task}: Enables or disables a task, where task can be any of the developed heads (detection, segmentation, depth...) (default: all true).
- topic_image: The name of the topic that carries the input image (default: topic_image).
- image_reliability: The QoS reliability for the ROS 2 subscriber. 0 corresponds to SYSTEM_DEFAULT, 1 corresponds to RELIABLE, and 2 corresponds to BEST_EFFORT (default: 2).
- params_file: The path to the config file with the information required by the models. By default this file is config/params.yaml, and it contains important variables such as img_size (default 640x640, the size used to train the provided models), device (default cuda), and the paths of the backbone and heads, along with variables to create the models or perform inference.
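The integer codes accepted by image_reliability can be mapped to rclpy QoS settings roughly as follows. This is a sketch, not the node's actual code: `reliability_name` and `make_image_qos` are hypothetical helpers, and the queue depth of 1 is an assumption. `make_image_qos` needs a sourced ROS 2 environment, so rclpy is imported lazily.

```python
# Codes as documented for the image_reliability launch argument.
RELIABILITY_NAMES = {0: "SYSTEM_DEFAULT", 1: "RELIABLE", 2: "BEST_EFFORT"}


def reliability_name(code: int) -> str:
    """Translate the launch-argument code into a reliability policy name."""
    try:
        return RELIABILITY_NAMES[code]
    except KeyError:
        raise ValueError(
            f"image_reliability must be 0, 1 or 2, got {code!r}"
        ) from None


def make_image_qos(code: int, depth: int = 1):
    """Build a QoSProfile for the image subscriber (depth is an assumption)."""
    from rclpy.qos import QoSProfile, ReliabilityPolicy  # requires ROS 2

    policy = getattr(ReliabilityPolicy, reliability_name(code))
    return QoSProfile(depth=depth, reliability=policy)


print(reliability_name(2))
```

A BEST_EFFORT subscriber matches both reliable and best-effort publishers, whereas a RELIABLE subscriber does not receive data from a best-effort publisher. That makes the default of 2 the more permissive choice for camera topics.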
Edit params.yaml before launching the bringup file if you need values different from the ones provided.
Meta has only released model heads for the large ViT-7B backbone, so for smaller backbones we trained task-specific heads (each < 5M parameters) in separate repositories to achieve good precision. Our goal was not to beat SOTA models, but to provide a lightweight, plug-and-play toolkit.
Each task has a head_{task} subfolder in dinov3_toolkit containing a model_head.py and utils.py copied from the original repo. The backbone folder contains model_backbone.py, while common.py provides shared utilities. Some tasks also include a class_names.txt file listing the classes used for training.
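To make the folder convention concrete, the sketch below scans a toolkit directory for head_{task} subfolders and lists their files. The helper `list_heads` is hypothetical, and the demo runs on a throwaway directory that only mimics the layout described above. Which tasks ship a class_names.txt is an assumption made for the demo.

```python
import tempfile
from pathlib import Path


def list_heads(toolkit_dir):
    """Map each task name to the files found in its head_{task} folder."""
    heads = {}
    for folder in sorted(Path(toolkit_dir).glob("head_*")):
        if folder.is_dir():
            task = folder.name[len("head_"):]
            heads[task] = sorted(p.name for p in folder.iterdir() if p.is_file())
    return heads


# Demo layout (assumed for illustration, not read from the real repository).
demo_layout = {
    "detection": ["model_head.py", "utils.py", "class_names.txt"],
    "depth": ["model_head.py", "utils.py"],
}

with tempfile.TemporaryDirectory() as tmp:
    for task, files in demo_layout.items():
        folder = Path(tmp) / f"head_{task}"
        folder.mkdir()
        for name in files:
            (folder / name).touch()
    demo = list_heads(tmp)

print(demo)
```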
Check the following repo: object_detection_dinov3
Check the following repo: semantic_segmentation_dinov3
Check the following repo: depth_dinov3
Check the following repo: optical_flow_dinov3
- Code in this repo: Apache-2.0.
- DINOv3 submodule: licensed separately by Meta (see its LICENSE).
- We don't distribute DINO weights. Follow upstream instructions to obtain them.
- González-Santamarta, Miguel Á. (2023). yolo_ros (used as a reference for some parts of the implementation).

