Grounding DINO Reproduction and Evaluation

This project uses Grounding DINO (Swin-T backbone) to reproduce two tasks, open-vocabulary object detection (OVD) and visual grounding (VG), and evaluates both zero-shot on COCO val2017 and RefCOCO/+/g.

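Environment Setup

1. Create the conda environment (recommended)

Windows PowerShell 5.1 does not support && (PowerShell 7 and later do), so run the commands one at a time: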

conda env create -f environment.yml
conda activate cv

A verified alternative (miniconda base, Python 3.10.13):

pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

Note: numpy must be <2.0 (torch 2.1.2 is built against the NumPy 1.x ABI), and transformers must be <5.0 (5.x requires PyTorch >= 2.4).
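The two version bounds in the note can be checked before launching a long evaluation. The following is a minimal sketch and not part of the repository: the helper names (major, violations, installed_versions) are made up for illustration, and it compares major versions only.

```python
from importlib import metadata

# Upper bounds taken from the note above: numpy < 2.0, transformers < 5.0.
MAX_MAJOR = {"numpy": 2, "transformers": 5}


def major(version: str) -> int:
    """Return the leading integer of a version string such as '1.26.4'."""
    return int(version.split(".")[0])


def violations(installed: dict) -> list:
    """List packages whose major version is at or above the stated bound."""
    return [
        f"{name}=={ver} (need <{MAX_MAJOR[name]}.0)"
        for name, ver in installed.items()
        if name in MAX_MAJOR and major(ver) >= MAX_MAJOR[name]
    ]


def installed_versions() -> dict:
    """Collect versions of the constrained packages that are installed."""
    found = {}
    for name in MAX_MAJOR:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed, so there is nothing to check
    return found


if __name__ == "__main__":
    problems = violations(installed_versions())
    print("OK" if not problems else "\n".join(problems))
```

Run inside the activated environment, it prints OK when both bounds hold and otherwise lists the offending packages.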

2. Clone Grounding DINO

git clone https://github.com/IDEA-Research/GroundingDINO.git third_party/GroundingDINO

Compile the CUDA operators on Windows (requires the VS 2019/2022 C++ Build Tools)

pip install -e third_party/GroundingDINO

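Fallback: Hugging Face backend (no compilation required)

If the CUDA operators fail to compile, set model.backend to hf in configs/coco_ovd.yaml and configs/refcoco.yaml; the code then switches automatically to the transformers implementation: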

model:
  backend: hf
  hf_model_id: IDEA-Research/grounding-dino-tiny
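
One way such a switch might be wired up is sketched below. This is an assumption about the design, not the code in src/model_wrapper.py: resolve_backend and load_hf_model are hypothetical names, and falling back to gdino when the key is absent is a guess. AutoProcessor and AutoModelForZeroShotObjectDetection are the transformers classes that load Grounding DINO checkpoints.

```python
from typing import Optional, Tuple

# Backend names as listed in the project structure (gdino / hf).
VALID_BACKENDS = ("gdino", "hf")


def resolve_backend(cfg: dict) -> Tuple[str, Optional[str]]:
    """Read model.backend from a parsed YAML config and validate it.

    Returns (backend, hf_model_id); hf_model_id is None for the gdino backend.
    """
    model_cfg = cfg.get("model", {})
    backend = model_cfg.get("backend", "gdino")  # assumed default
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {VALID_BACKENDS}")
    if backend == "hf":
        model_id = model_cfg.get("hf_model_id")
        if not model_id:
            raise ValueError("backend 'hf' requires model.hf_model_id")
        return backend, model_id
    return backend, None


def load_hf_model(model_id: str):
    """Load the transformers implementation; imported lazily so the
    gdino path does not need transformers at import time."""
    from transformers import AutoModelForZeroShotObjectDetection, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)
    return processor, model
```

Validating the config up front means a typo in backend fails immediately instead of after the dataset has been loaded.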

3. Download the pretrained weights

python scripts/download_weights.py

Or download manually:

# Swin-T OGC weights
wget -P weights/ https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
# Matching config file
cp third_party/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py configs/
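
Where wget is unavailable (for example in a plain Windows shell), the same file can be fetched with the Python standard library. The sketch below is not the repository's scripts/download_weights.py; fetch is a hypothetical helper that skips the download when the file already exists and writes to a temporary .part file so that an interrupted transfer does not leave a truncated checkpoint behind.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL copied from the manual download command above.
WEIGHTS_URL = (
    "https://github.com/IDEA-Research/GroundingDINO/releases/download/"
    "v0.1.0-alpha/groundingdino_swint_ogc.pth"
)


def fetch(url: str, dest_dir: str = "weights") -> Path:
    """Download url into dest_dir unless a non-empty copy is already there."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if dest.exists() and dest.stat().st_size > 0:
        return dest  # already downloaded
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_suffix(dest.suffix + ".part")
    urlretrieve(url, str(tmp))  # blocking download to the temporary file
    tmp.replace(dest)           # rename only after the transfer completed
    return dest


if __name__ == "__main__":
    print(fetch(WEIGHTS_URL))
```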

Dataset Preparation

COCO val2017 (OVD evaluation)

data/coco/
├── val2017/               # ~1 GB
└── annotations/
    └── instances_val2017.json

Download: https://cocodataset.org/#download

COCO train2014 (image source for RefCOCO)

data/coco/
└── train2014/             # ~13 GB

RefCOCO / RefCOCO+ / RefCOCOg

data/refcoco/
├── refcoco/    {refs(unc).p,    instances.json}
├── refcoco+/   {refs(unc).p,    instances.json}
└── refcocog/   {refs(google).p, instances.json}

Reference: https://github.com/lichengunc/refer
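
Because the evaluation scripts expect the layout shown above, a quick check for missing files can save a failed run. This is a minimal sketch under the assumption that everything lives under a data/ root exactly as drawn; missing_paths is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Relative paths taken from the directory trees above.
EXPECTED = [
    "coco/val2017",
    "coco/annotations/instances_val2017.json",
    "coco/train2014",
    "refcoco/refcoco/refs(unc).p",
    "refcoco/refcoco/instances.json",
    "refcoco/refcoco+/refs(unc).p",
    "refcoco/refcoco+/instances.json",
    "refcoco/refcocog/refs(google).p",
    "refcoco/refcocog/instances.json",
]


def missing_paths(data_root: str = "data") -> list:
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:")
        for rel in missing:
            print("  data/" + rel)
    else:
        print("Dataset layout looks complete.")
```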

Running the Evaluation

OVD — COCO val2017

# Quick check on a subset (1000 images)
python src/ovd/eval_coco.py --config configs/coco_ovd.yaml --subset 1000

# Full evaluation (all 5000 images)
python src/ovd/eval_coco.py --config configs/coco_ovd.yaml

Results are saved to results/coco/predictions.json and results/coco/metrics.json.
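
Grounding DINO takes a text prompt rather than a fixed label set, so for OVD the COCO category names have to be turned into a caption first. The upstream GroundingDINO code joins lowercase names with " . " and ends the caption with a period. The sketch below follows that convention; build_caption is a hypothetical helper and may differ from what src/ovd/coco_prompt_builder.py actually does.

```python
def build_caption(category_names: list) -> str:
    """Join category names into one Grounding DINO style text prompt.

    Names are stripped and lowercased, separated by ' . ',
    and the caption is closed with a trailing ' .'.
    """
    cleaned = [name.strip().lower() for name in category_names if name.strip()]
    if not cleaned:
        raise ValueError("at least one category name is required")
    return " . ".join(cleaned) + " ."


if __name__ == "__main__":
    # A three-class example; the real COCO run would pass all 80 names.
    print(build_caption(["person", "Traffic Light", "dog"]))
```

The separator matters because the model maps predicted token spans back to phrases, so each category needs to be delimited unambiguously.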

Visual Grounding — RefCOCO/+/g

python src/grounding/eval_refcoco.py --config configs/refcoco.yaml --dataset refcoco
python src/grounding/eval_refcoco.py --config configs/refcoco.yaml --dataset refcoco+
python src/grounding/eval_refcoco.py --config configs/refcoco.yaml --dataset refcocog

Results are saved to results/refcoco/.

Main Results

Task	Dataset / split	Metric	This reproduction	Paper
OVD	COCO val2017	mAP	43.8	48.4
VG	RefCOCO val	Acc@0.5	22.47	89.19
VG	RefCOCO testB	Acc@0.5	28.97	85.89
VG	RefCOCO+ val	Acc@0.5	23.56	81.22
VG	RefCOCO+ testB	Acc@0.5	29.54	74.18
VG	RefCOCOg val	Acc@0.5	30.66	86.94

WSL gdino backend (full evaluation; see results/coco_gdino/ and results/refcoco_gdino/):

Task	Dataset / split	Metric	WSL gdino	Paper
OVD	COCO val2017	mAP	42.4	48.4
VG	RefCOCO val	Acc@0.5	50.72	89.19
VG	RefCOCO testB	Acc@0.5	45.00	85.89
VG	RefCOCO+ val	Acc@0.5	51.64	81.22
VG	RefCOCO+ testB	Acc@0.5	46.35	74.18
VG	RefCOCOg val	Acc@0.5	60.44	86.94

See reports/report.md for the full analysis (including a Windows HF vs. WSL gdino comparison table).
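
For reference, Acc@0.5 in the tables above is the percentage of referring expressions whose predicted box overlaps the ground-truth box with IoU of at least 0.5. The sketch below illustrates that definition under two assumptions: boxes are in [x1, y1, x2, y2] format (COCO-style [x, y, w, h] annotations would need converting first), and one box is predicted per expression. The function names are hypothetical and this is not the code in src/grounding/eval_refcoco.py.

```python
def iou_xyxy(a, b) -> float:
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def acc_at_iou(preds, gts, threshold: float = 0.5) -> float:
    """Percentage of (prediction, ground truth) pairs with IoU >= threshold."""
    if len(preds) != len(gts) or not gts:
        raise ValueError("preds and gts must be non-empty and the same length")
    hits = sum(iou_xyxy(p, g) >= threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    preds = [[0, 0, 10, 10], [0, 0, 10, 10]]
    gts = [[0, 0, 10, 10], [20, 20, 30, 30]]
    print(acc_at_iou(preds, gts))  # one exact match, one miss
```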

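High-Accuracy Reproduction Plan (OVD + VG)

Environment	Document
Cloud server, Ubuntu 22 + V100 (recommended)	docs/高精度复现计划_OVD_VG.md §0, §11
WSL	Same document; for operator compilation see §5

If the Cursor Plan panel does not show .cursor/plans, open the Markdown file above directly, or paste the Agent prompt from §11 at the start of the conversation.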

Notebooks

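File	Purpose
`notebooks/01_demo.ipynb`	Single-image inference demo; verifies the environment
`notebooks/02_qualitative.ipynb`	Qualitative visualization (10–20 images per dataset)
`notebooks/03_failure_analysis.ipynb`	Failure-case categorization and prompt ablation analysis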

Project Structure

cv-project/
├── environment.yml          # conda environment (recommended)
├── requirements.txt         # pip fallback
├── configs/
│   ├── coco_ovd.yaml        # OVD evaluation config
│   └── refcoco.yaml         # VG evaluation config
├── src/
│   ├── model_wrapper.py     # unified inference interface (gdino and hf backends)
│   ├── ovd/
│   │   ├── coco_prompt_builder.py
│   │   └── eval_coco.py
│   ├── grounding/
│   │   ├── refer_loader.py
│   │   └── eval_refcoco.py
│   ├── analysis/
│   │   ├── failure_miner.py
│   │   └── visualize.py
│   └── utils/
│       ├── io.py
│       ├── box_ops.py
│       └── logger.py
├── notebooks/
├── results/
├── reports/report.md
├── third_party/GroundingDINO/   # git clone (not committed)
├── weights/                     # pretrained weights (not committed)
└── data/                        # datasets (not committed)
