Mobile robots exploring indoor environments increasingly rely on vision-language models to perceive high-level semantic cues in camera images, such as object categories. Such models offer the potential to substantially advance robot behaviour for tasks such as object-goal navigation (ObjectNav), where the robot must locate objects specified in natural language by exploring the environment. Current ObjectNav methods heavily depend on prompt engineering for perception and do not address the semantic uncertainty induced by variations in prompt phrasing. Ignoring semantic uncertainty can lead to suboptimal exploration, which in turn limits performance. Hence, we propose a semantic uncertainty-informed active perception pipeline for ObjectNav in indoor environments. We introduce a novel probabilistic sensor model for quantifying semantic uncertainty in vision-language models and incorporate it into a probabilistic geometric-semantic map to enhance spatial understanding. Based on this map, we develop a frontier exploration planner with an uncertainty-informed multi-armed bandit objective to guide efficient object search. Experimental results demonstrate that our method achieves ObjectNav success rates comparable to those of state-of-the-art approaches, without requiring extensive prompt engineering.
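For intuition, the sketch below illustrates what an uncertainty-informed multi-armed bandit objective for frontier selection can look like. It is purely illustrative and not the code in this repository; the class, the field names, and the UCB-style score are placeholder assumptions.

# Illustrative sketch only: an uncertainty-weighted UCB score over frontiers.
# Names and the exact objective are placeholders, not the paper's implementation.
import math
from dataclasses import dataclass

@dataclass
class Frontier:
    semantic_value: float   # evidence from the semantic map that the goal object is near this frontier
    uncertainty: float      # semantic uncertainty of the VLM observations covering this frontier
    visits: int = 0         # how often this frontier (arm) has been selected so far

def select_frontier(frontiers, total_pulls, c=1.0):
    """Pick the frontier with the highest score: semantic value plus an
    exploration bonus that grows with uncertainty and shrinks with visits."""
    def score(f):
        bonus = c * f.uncertainty * math.sqrt(math.log(total_pulls + 1) / (f.visits + 1))
        return f.semantic_value + bonus
    return max(frontiers, key=score)

# Example with made-up numbers.
frontiers = [Frontier(0.6, 0.1), Frontier(0.4, 0.8), Frontier(0.2, 0.3)]
best = select_frontier(frontiers, total_pulls=10)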
Create the conda environment:
conda_env_name=uiap-ogn
conda create -n $conda_env_name python=3.9 -y
conda activate $conda_env_name
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
pip install git+https://github.com/IDEA-Research/GroundingDINO.git@eeba084341aaa454ce13cb32fa7fd9282fc73a67 salesforce-lavis==1.0.2
pip install -e .[habitat]
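Optionally, you can run a quick sanity check that the key packages import and that PyTorch sees the GPU (this assumes the pip installs above finished without errors):

# Optional sanity check for the freshly created environment.
import torch
import groundingdino   # installed from the GroundingDINO commit above
import lavis           # installed via salesforce-lavis

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())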
Install all the dependencies:

git clone [email protected]:IDEA-Research/GroundingDINO.git
git clone [email protected]:WongKinYiu/yolov7.git  # if using YOLOv7

Follow the original install directions for GroundingDINO, which can be found here: https://github.com/IDEA-Research/GroundingDINO.
Nothing needs to be done for YOLOv7, but it needs to be cloned into the repo.
Only attempt the following if the installation instructions in the GroundingDINO repo do not work.
To install GroundingDINO, you will need CUDA_HOME set as an environment variable. If you would like to install a version of CUDA that is compatible with the one used to compile your version of PyTorch, and you are using conda, you can run the following commands to install CUDA and set CUDA_HOME:
# This example is specifically for CUDA 11.8
mamba install \
cub \
thrust \
cuda-runtime \
cudatoolkit=11.8 \
cuda-nvcc==11.8.89 \
-c "nvidia/label/cuda-11.8.0" \
-c nvidia &&
ln -s ${CONDA_PREFIX}/lib/python3.9/site-packages/nvidia/cuda_runtime/include/* ${CONDA_PREFIX}/include/ &&
ln -s ${CONDA_PREFIX}/lib/python3.9/site-packages/nvidia/cusparse/include/* ${CONDA_PREFIX}/include/ &&
ln -s ${CONDA_PREFIX}/lib/python3.9/site-packages/nvidia/cublas/include/* ${CONDA_PREFIX}/include/ &&
ln -s ${CONDA_PREFIX}/lib/python3.9/site-packages/nvidia/cusolver/include/* ${CONDA_PREFIX}/include/ &&
export CUDA_HOME=${CONDA_PREFIX}
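Optionally, verify that CUDA_HOME is set and that it roughly matches the CUDA version your PyTorch build expects (a small optional check, not part of the original instructions):

# Optional check: CUDA_HOME should be set and its version should match torch.version.cuda.
import os
import torch

print("CUDA_HOME         :", os.environ.get("CUDA_HOME", "<not set>"))
print("torch.version.cuda:", torch.version.cuda)  # e.g., '11.3' for the pinned torch==1.12.1+cu113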
First, set the following variables during installation (don't need to put in .bashrc):

MATTERPORT_TOKEN_ID=<FILL IN FROM YOUR ACCOUNT INFO IN MATTERPORT>
MATTERPORT_TOKEN_SECRET=<FILL IN FROM YOUR ACCOUNT INFO IN MATTERPORT>
DATA_DIR=</path/to/uiap-ogn/data>
# Link to the HM3D ObjectNav episodes dataset, listed here:
# https://github.com/facebookresearch/habitat-lab/blob/main/DATASETS.md#task-datasets
# From the above page, locate the link to the HM3D ObjectNav dataset.
# Verify that it is the same as the URL on the following line.
HM3D_OBJECTNAV=https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/hm3d/v1/objectnav_hm3d_v1.zip
Ensure that the correct conda environment is activated!!

# Download HM3D 3D scans (scenes_dataset)
python -m habitat_sim.utils.datasets_download \
--username $MATTERPORT_TOKEN_ID --password $MATTERPORT_TOKEN_SECRET \
--uids hm3d_train_v0.2 \
--data-path $DATA_DIR &&
python -m habitat_sim.utils.datasets_download \
--username $MATTERPORT_TOKEN_ID --password $MATTERPORT_TOKEN_SECRET \
--uids hm3d_val_v0.2 \
--data-path $DATA_DIR &&
# Download HM3D ObjectNav dataset episodes
wget $HM3D_OBJECTNAV &&
unzip objectnav_hm3d_v1.zip &&
mkdir -p $DATA_DIR/datasets/objectnav/hm3d &&
mv objectnav_hm3d_v1 $DATA_DIR/datasets/objectnav/hm3d/v1 &&
rm objectnav_hm3d_v1.zip
The weights for MobileSAM, GroundingDINO, YOLOv7, and PointNav must be saved to the data/ directory. The weights can be downloaded from the following links:

mobile_sam.pt: https://github.com/ChaoningZhang/MobileSAM
groundingdino_swint_ogc.pth: https://github.com/IDEA-Research/GroundingDINO
yolov7-e6e.pt: https://github.com/WongKinYiu/yolov7
pointnav_weights.pth: included inside the data subdirectory
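Optionally, you can verify that the episodes and weights ended up where the steps above place them. The sketch below assumes the data/ layout used in this README; adjust the paths if your DATA_DIR differs:

# Optional check that the episodes dataset and model weights are in place.
# Paths follow the steps above; adjust if your DATA_DIR is not data/.
from pathlib import Path

expected = [
    "data/datasets/objectnav/hm3d/v1",   # HM3D ObjectNav episodes (moved there above)
    "data/mobile_sam.pt",
    "data/groundingdino_swint_ogc.pth",
    "data/yolov7-e6e.pt",
    "data/pointnav_weights.pth",
]
for p in expected:
    status = "OK" if Path(p).exists() else "MISSING"
    print(f"{status:7s} {p}")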
To run evaluation, various models must be loaded in the background first. This only needs to be done once by running the following command:
./scripts/launch_vlm_servers.sh

(You may need to run chmod +x on this file first.)
This command creates a tmux session that loads the various models used by uiap-ogn and serves them through Flask. When you are done, be sure to kill the tmux session to free up your GPU.
Run the following to evaluate on the HM3D dataset:
python -m vlfm.run

To evaluate on MP3D, run the following:
python -m vlfm.run habitat.dataset.data_path=data/datasets/objectnav/mp3d/val/val.json.gz

If you use uiap-ogn for any academic work, please use the following BibTeX entry.
@inproceedings{bajpai2025ecmr,
  author    = {U. Bajpai and J. R\"uckin and C. Stachniss and M. Popovi\'c},
  title     = {{Uncertainty-Informed Active Perception for Open Vocabulary Object Goal Navigation}},
  booktitle = {European Conference on Mobile Robots (ECMR)},
  year      = {2025},
}
This project builds upon the VLFM codebase by Yokoyama et al. and Boston Dynamics AI Institute. Copyright notices are preserved in any code borrowed directly from the authors. We gratefully acknowledge their contribution. This project is released under the MIT License.
