Mobile robots exploring indoor environments increasingly rely on vision-language models to perceive high-level semantic cues in camera images, such as object categories. Such models offer the potential to substantially advance robot behaviour for tasks such as object-goal navigation (ObjectNav), where the robot must locate objects specified in natural language by exploring the environment. Current ObjectNav methods heavily depend on prompt engineering for perception and do not address the semantic uncertainty induced by variations in prompt phrasing. Ignoring semantic uncertainty can lead to suboptimal exploration, which in turn limits performance. Hence, we propose a semantic uncertainty-informed active perception pipeline for ObjectNav in indoor environments. We introduce a novel probabilistic sensor model for quantifying semantic uncertainty in vision-language models and incorporate it into a probabilistic geometric-semantic map to enhance spatial understanding. Based on this map, we develop a frontier exploration planner with an uncertainty-informed multi-armed bandit objective to guide efficient object search. Experimental results demonstrate that our method achieves ObjectNav success rates comparable to those of state-of-the-art approaches, without requiring extensive prompt engineering.
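For intuition, the sketch below illustrates what an uncertainty-informed multi-armed bandit objective for frontier selection can look like. It is purely illustrative and not the code in this repository; the class, the field names, and the UCB-style score are placeholder assumptions.

# Illustrative sketch only: an uncertainty-weighted UCB score over frontiers.
# Names and the exact objective are placeholders, not the paper's implementation.
import math
from dataclasses import dataclass

@dataclass
class Frontier:
    semantic_value: float   # evidence from the semantic map that the goal object is near this frontier
    uncertainty: float      # semantic uncertainty of the VLM observations covering this frontier
    visits: int = 0         # how often this frontier (arm) has been selected so far

def select_frontier(frontiers, total_pulls, c=1.0):
    """Pick the frontier with the highest score: semantic value plus an
    exploration bonus that grows with uncertainty and shrinks with visits."""
    def score(f):
        bonus = c * f.uncertainty * math.sqrt(math.log(total_pulls + 1) / (f.visits + 1))
        return f.semantic_value + bonus
    return max(frontiers, key=score)

# Example with made-up numbers.
frontiers = [Frontier(0.6, 0.1), Frontier(0.4, 0.8), Frontier(0.2, 0.3)]
best = select_frontier(frontiers, total_pulls=10)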
Create the conda environment:
conda_env_name=uiap-ogn
conda create -n $conda_env_name python=3.9 -y
conda activate $conda_env_name
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
pip install git+https://github.com/IDEA-Research/GroundingDINO.git@eeba084341aaa454ce13cb32fa7fd9282fc73a67 salesforce-lavis==1.0.2
pip install -e .[habitat]
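Optionally, you can run a quick sanity check that the key packages import and that PyTorch sees the GPU (this assumes the pip installs above finished without errors):

# Optional sanity check for the freshly created environment.
import torch
import groundingdino   # installed from the GroundingDINO commit above
import lavis           # installed via salesforce-lavis

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())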
Install all the dependencies:

git clone [email protected]:IDEA-Research/GroundingDINO.git
git clone [email protected]:WongKinYiu/yolov7.git  # if using YOLOv7

Follow the original install directions for GroundingDINO, which can be found here: https://github.com/IDEA-Research/GroundingDINO.
Nothing needs to be done for YOLOv7, but it needs to be cloned into the repo.
Only attempt the following if the installation instructions in the GroundingDINO repo do not work.
To install GroundingDINO, you will need CUDA_HOME set as an environment variable. If you would like to install a version of CUDA that is compatible with the one used to compile your version of PyTorch, and you are using conda, you can run the following commands to install CUDA and set CUDA_HOME:
# This example is specifically for CUDA 11.8
mamba install \
cub \
thrust \
cuda-runtime \
cudatoolkit=11.8 \
cuda-nvcc==11.8.89 \
-c "nvidia/label/cuda-11.8.0" \
-c nvidia &&
ln -s ${CONDA_PREFIX}/lib/python3.9/site-packages/nvidia/cuda_runtime/include/* ${CONDA_PREFIX}/include/ &&
ln -s ${CONDA_PREFIX}/lib/python3.9/site-packages/nvidia/cusparse/include/* ${CONDA_PREFIX}/include/ &&
ln -s ${CONDA_PREFIX}/lib/python3.9/site-packages/nvidia/cublas/include/* ${CONDA_PREFIX}/include/ &&
ln -s ${CONDA_PREFIX}/lib/python3.9/site-packages/nvidia/cusolver/include/* ${CONDA_PREFIX}/include/ &&
export CUDA_HOME=${CONDA_PREFIX}
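Optionally, verify that CUDA_HOME is set and that it roughly matches the CUDA version your PyTorch build expects (a small optional check, not part of the original instructions):

# Optional check: CUDA_HOME should be set and its version should match torch.version.cuda.
import os
import torch

print("CUDA_HOME         :", os.environ.get("CUDA_HOME", "<not set>"))
print("torch.version.cuda:", torch.version.cuda)  # e.g., '11.3' for the pinned torch==1.12.1+cu113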
First, set the following variables during installation (don't need to put in .bashrc):

MATTERPORT_TOKEN_ID=<FILL IN FROM YOUR ACCOUNT INFO IN MATTERPORT>
MATTERPORT_TOKEN_SECRET=<FILL IN FROM YOUR ACCOUNT INFO IN MATTERPORT>
DATA_DIR=</path/to/uiap-ogn/data>
# Link to the HM3D ObjectNav episodes dataset, listed here:
# https://github.com/facebookresearch/habitat-lab/blob/main/DATASETS.md#task-datasets
# From the above page, locate the link to the HM3D ObjectNav dataset.
# Verify that it is the same as the URL on the following line.
HM3D_OBJECTNAV=https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/hm3d/v1/objectnav_hm3d_v1.zip
Ensure that the correct conda environment is activated!!

# Download HM3D 3D scans (scenes_dataset)
python -m habitat_sim.utils.datasets_download \
--username $MATTERPORT_TOKEN_ID --password $MATTERPORT_TOKEN_SECRET \
--uids hm3d_train_v0.2 \
--data-path $DATA_DIR &&
python -m habitat_sim.utils.datasets_download \
--username $MATTERPORT_TOKEN_ID --password $MATTERPORT_TOKEN_SECRET \
--uids hm3d_val_v0.2 \
--data-path $DATA_DIR &&
# Download HM3D ObjectNav dataset episodes
wget $HM3D_OBJECTNAV &&
unzip objectnav_hm3d_v1.zip &&
mkdir -p $DATA_DIR/datasets/objectnav/hm3d &&
mv objectnav_hm3d_v1 $DATA_DIR/datasets/objectnav/hm3d/v1 &&
rm objectnav_hm3d_v1.zip
The weights for MobileSAM, GroundingDINO, YOLOv7, and PointNav must be saved to the data/ directory. The weights can be downloaded from the following links:

mobile_sam.pt: https://github.com/ChaoningZhang/MobileSAM
groundingdino_swint_ogc.pth: https://github.com/IDEA-Research/GroundingDINO
yolov7-e6e.pt: https://github.com/WongKinYiu/yolov7
pointnav_weights.pth: included inside the data subdirectory
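Optionally, you can verify that the episodes and weights ended up where the steps above place them. The sketch below assumes the data/ layout used in this README; adjust the paths if your DATA_DIR differs:

# Optional check that the episodes dataset and model weights are in place.
# Paths follow the steps above; adjust if your DATA_DIR is not data/.
from pathlib import Path

expected = [
    "data/datasets/objectnav/hm3d/v1",   # HM3D ObjectNav episodes (moved there above)
    "data/mobile_sam.pt",
    "data/groundingdino_swint_ogc.pth",
    "data/yolov7-e6e.pt",
    "data/pointnav_weights.pth",
]
for p in expected:
    status = "OK" if Path(p).exists() else "MISSING"
    print(f"{status:7s} {p}")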
To run evaluation, various models must be loaded in the background first. This only needs to be done once by running the following command:
./scripts/launch_vlm_servers.sh

(You may need to run chmod +x on this file first.)
This command creates a tmux session that loads the various models used by uiap-ogn and serves them through Flask. When you are done, be sure to kill the tmux session to free up your GPU.
Run the following to evaluate on the HM3D dataset:
python -m vlfm.run

To evaluate on MP3D, run the following:
python -m vlfm.run habitat.dataset.data_path=data/datasets/objectnav/mp3d/val/val.json.gz

If you use uiap-ogn for any academic work, please use the following BibTeX entry.
@inproceedings{bajpai2025ecmr,
  author    = {U. Bajpai and J. R\"uckin and C. Stachniss and M. Popovi\'c},
  title     = {{Uncertainty-Informed Active Perception for Open Vocabulary Object Goal Navigation}},
  booktitle = {European Conference on Mobile Robots (ECMR)},
  year      = {2025},
}
This project builds upon the VLFM codebase by Yokoyama et al. and Boston Dynamics AI Institute. Copyright notices are preserved in any code borrowed directly from the authors. We gratefully acknowledge their contribution. This project is released under the MIT License.
