Interactive crowd counting dataset and an MLLM-driven coarse-to-fine counting agent.
Runling Long1, Yunlong Wang1, Jia Wan1*, Xiang Deng1, Xinting Zhu2, Weili Guan1, Antoni B. Chan2, Liqiang Nie1
1 Harbin Institute of Technology, Shenzhen
2 City University of Hong Kong
* Corresponding author
- Paper: Paper Link
- Hugging Face Dataset: Dataset
- Code Repository: GitHub
- Updates
- Introduction
- Highlights
- Framework
- Project Structure
- Installation
- Usage
- Citation
- Acknowledgement
- License
- [04/2026] Initial release
This is the official implementation for Embodied Crowd Counting.
Occlusion is one of the fundamental challenges in crowd counting. Various data-driven approaches have been developed to address this issue, but their effectiveness remains limited, mainly because most existing crowd counting datasets on which these methods are trained rely on passive cameras, which restricts the ability to fully sense the environment. Recently, embodied navigation methods have shown significant potential for precise object detection in interactive scenes. These methods use active camera settings and thus hold promise for addressing this fundamental issue in crowd counting. However, most of them are designed for indoor navigation, and their performance on complex object distributions in large-scale scenes, such as crowds, is unknown. In addition, most existing embodied navigation datasets consist of indoor scenes with limited scale and object quantity, preventing their use for dense crowd analysis. Motivated by this, we propose a novel task, Embodied Crowd Counting (ECC), in which an agent actively explores a large-scale scene to count the people in it. We then build an interactive simulator, the Embodied Crowd Counting Dataset (ECCD), which supports large-scale scenes and large object quantities; crowds are generated from a prior probability distribution that approximates realistic crowd distributions. On top of this, we propose a zero-shot navigation method (ZECC) as a baseline. It combines an MLLM-driven coarse-to-fine navigation mechanism that enables active Z-axis exploration with a normal-line-based crowd distribution analysis method for fine-grained counting. Experimental results show that the proposed method achieves the best trade-off between counting accuracy and navigation cost.
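To make the crowd-generation idea concrete, the sketch below samples person positions on the ground plane from a mixture-of-Gaussians prior, a common way to approximate the clustered structure of real crowds. This is only an illustration of the concept, not the actual distribution or code used by ECCD, and every parameter in it is an assumption.

import numpy as np

def sample_crowd(n_people, centers, spreads, bounds, rng=None):
    """Illustrative only: sample 2D positions from a Gaussian-mixture prior, clipped to the scene bounds."""
    rng = np.random.default_rng(rng)
    k = rng.integers(0, len(centers), size=n_people)          # pick a cluster per person
    pos = np.array([rng.normal(centers[i], spreads[i]) for i in k])
    return np.clip(pos, bounds[0], bounds[1])                 # keep positions inside the scene

# Example: 500 people, three clusters in a 200 m x 200 m scene (all values illustrative).
positions = sample_crowd(
    n_people=500,
    centers=[(50, 50), (120, 80), (160, 160)],
    spreads=[10.0, 15.0, 8.0],
    bounds=((0, 0), (200, 200)),
    rng=0,
)
print(positions.shape)  # (500, 2)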
We present the dataset and the method implementation on this page.
- We provide the full pipeline of our method.
- The dataset is released to support further studies.
Figure 1. The proposed framework. First, ATE is proposed to estimate the global crowd distribution efficiently. Then, NLBN is proposed to generate fine observation points, alleviating crowd overlap. The final result is generated by aggregating all fine detections.
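The loop in Figure 1 can be summarized with the hypothetical sketch below. The helper names (coarse_explore, plan_fine_viewpoints, move_to, detect_people) are placeholders rather than this repository's API; the actual implementation lives in Methods/OurMethod.py and Explore/OurExplore.py.

# Hypothetical sketch of the coarse-to-fine loop in Figure 1.
# All method names are placeholders, not the repo's API.
def embodied_count(agent, scene):
    # Coarse stage (ATE): fly high and estimate the global crowd distribution.
    coarse_map = agent.coarse_explore(scene)

    # Fine stage (NLBN): choose observation points along the crowd's normal
    # lines so that overlapping people are separated in the image plane.
    viewpoints = agent.plan_fine_viewpoints(coarse_map)

    total = 0
    for vp in viewpoints:
        agent.move_to(vp)
        total += len(agent.detect_people())  # fine-grained detections at this view
    return total  # final count aggregated over all fine observations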
.
├── Config.yml
├── Main.py
├── Configs/
│ ├── CountConfig.yml
│ ├── FBEConfig.yml
│ ├── FBEWithDGConfig.yml
│ ├── OurMethodConfig.yml
│ └── settings.json
├── Agent/
│ └── Prompts.py
├── assests/
├── Count/
│ └── Count.py
├── Dataset/
├── Drone/
│ └── Control.py
├── Explore/
│ ├── DensityGuided.py
│ ├── DroneLift.py
│ ├── Explore.py
│ ├── Frontier.py
│ ├── OurExplore.py
│ ├── path_3D.py
│ └── Target.py
├── Log/
├── Methods/
│ ├── FBE.py
│ ├── FBEWithDG.py
│ ├── OurMethod.py
│ └── count.py
├── Others/
│ ├── DensityGuided/
│ ├── IntuitionMap/
│ └── ValueMap/
├── Perception/
│ ├── GeneralizedLoss.py
│ ├── GPT.py
│ └── GroundingDINO.py
├── Point_cloud/
│ ├── Map_element.py
│ └── Point_cloud.py
├── Record/
├── Simulator/
│ └── Simulator.py
├── utils/
│ ├── flight.py
│ ├── logger.py
│ ├── saver.py
│ └── video.py
├── Vision_models/
│ ├── GeneralizedLoss/
│ └── GroundingDINO/
├── requirements.txt
├── README.md
└── LICENSE
Note that this project currently supports Windows only.
git clone https://github.com/iLearn-Lab/NeurIPS25-Embodied-Crowd-Counting.git
cd NeurIPS25-Embodied-Crowd-Counting
pip install -r requirements.txt

cd Vision_models/GroundingDINO
pip install -e .

cd Vision_models/GroundingDINO
mkdir weights
cd weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
cd ..

Download the pretrained model from the Google Drive link in Vision_models/GeneralizedLoss/README.md and place it in Vision_models/GeneralizedLoss.
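Optionally, you can check that the GroundingDINO checkpoint loads before running the pipeline. This is a minimal sketch using the upstream GroundingDINO inference helper; the config path assumes the default layout of the GroundingDINO package and may need adjusting for this repository.

# Sanity check: load the GroundingDINO checkpoint downloaded above.
# Run from Vision_models/GroundingDINO.
from groundingdino.util.inference import load_model

model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",  # model config shipped with GroundingDINO
    "weights/groundingdino_swint_ogc.pth",              # checkpoint from the wget step
    device="cpu",                                       # use "cuda" if a GPU is available
)
print("GroundingDINO checkpoint loaded:", type(model).__name__)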
pip install -U huggingface_hub
huggingface-cli download iLearn-Lab/NeurIPS25-Embodied-Crowd-Counting `
--repo-type dataset `
--local-dir .\Dataset `
--local-dir-use-symlinks False

The Dataset/ directory should look like:
Dataset/
├── CITY/...
├── FACADES/...
├── HARBOUR/...
├── NEIGHBOUR/...
├── PARKING/...
└── STADIUM/...
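If you prefer to fetch the dataset from Python instead of the CLI, huggingface_hub offers an equivalent call:

# Python equivalent of the huggingface-cli download command above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="iLearn-Lab/NeurIPS25-Embodied-Crowd-Counting",
    repo_type="dataset",
    local_dir="Dataset",  # same layout as shown above: CITY/, FACADES/, ...
)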
- Configure the dataset path in Config.yml (a quick sanity-check sketch follows these steps).
- Configure the selected method parameters and counting parameters in Configs/.
- Run Main.py.
- Check results in Record/.
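Before launching Main.py, verifying that the configured dataset path actually contains the six scene folders can save a failed run. This is a sketch only: the key name dataset_path is an assumption, so substitute whatever field Config.yml actually uses.

# Pre-flight check (sketch): verify the dataset path in Config.yml points at
# the six scene folders. "dataset_path" is a placeholder key name.
import yaml
from pathlib import Path

with open("Config.yml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

dataset_root = Path(cfg["dataset_path"])  # hypothetical key; check Config.yml
scenes = ["CITY", "FACADES", "HARBOUR", "NEIGHBOUR", "PARKING", "STADIUM"]
missing = [s for s in scenes if not (dataset_root / s).is_dir()]
print("All scenes found." if not missing else f"Missing scenes: {missing}")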
@article{long2025embodied,
title={Embodied Crowd Counting},
author={Long, Runling and Wang, Yunlong and Wan, Jia and Deng, Xiang and Zhu, Xinting and Guan, Weili and Chan, Antoni B and Nie, Liqiang},
journal={arXiv preprint arXiv:2503.08367},
year={2025}
}

- Thanks to our supervisor and collaborators for valuable support.
This project is released under the Apache License 2.0.