DIOR-RSVG

Description

Official implementation of the paper:

MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing

Concept and goal

Our framework is designed to retain object detection capabilities while providing users with essential information to simplify query formulation for their object of interest.

Architecture

We propose a unified framework that integrates object detection (OD) and visual grounding (VG) for remote sensing (RS) imagery. To support conventional OD and establish an intuitive prior for the VG task, we fine-tune an open-set object detector on referring expression data, framing this as a partially supervised OD task. In the first stage, we construct a graph representation of each image, comprising object queries, class embeddings, and proposal locations. Then, our task-aware architecture processes this graph to perform the VG task. The model consists of: (i) a multi-branch network that integrates spatial, visual, and categorical features to generate task-aware proposals, and (ii) an object reasoning network that assigns probabilities across proposals, followed by a soft selection mechanism for final referring object localization. Our model outperforms prior methods on the OPT-RSVG and DIOR-RSVG datasets (73.18 and 77.73 meanIoU, against 66.20 and 72.35 for the strongest baseline, LPVA) while retaining classical OD capabilities.
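To make the last step of the pipeline concrete, here is a minimal sketch of a graph node and a soft selection over proposals. It is not the released implementation: the `ProposalNode` class, the `soft_select` function, and the choice of a softmax-weighted average of box coordinates are all assumptions made for illustration, since the text above only states that probabilities are assigned across proposals and then softly combined.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class ProposalNode:
    """One graph node per detector proposal (hypothetical layout)."""
    query: List[float]            # object query feature (visual branch)
    class_embedding: List[float]  # embedding of the predicted category
    box: Box                      # proposal location (spatial branch)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_select(nodes: List[ProposalNode], scores: List[float]) -> Box:
    """Assumed soft selection: probability-weighted average of proposal boxes."""
    probs = softmax(scores)
    return tuple(
        sum(p * n.box[i] for p, n in zip(probs, nodes)) for i in range(4)
    )


# Toy example: two overlapping proposals score high, one distant proposal scores low.
nodes = [
    ProposalNode([0.1, 0.2], [1.0, 0.0], (10.0, 10.0, 50.0, 50.0)),
    ProposalNode([0.1, 0.3], [1.0, 0.0], (12.0, 8.0, 52.0, 48.0)),
    ProposalNode([0.9, 0.7], [0.0, 1.0], (200.0, 200.0, 240.0, 240.0)),
]
scores = [4.0, 4.0, -6.0]  # stand-in for the object reasoner's outputs
box = soft_select(nodes, scores)
print(box)
```

Under this assumption the final box stays differentiable with respect to the scores, which is one common reason to prefer a soft combination over a hard argmax during training.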

Quantitative Results

* OPT-RSVG

Methods	Venue	Visual Encoder	Language Encoder	Pr@0.5	Pr@0.6	Pr@0.7	Pr@0.8	Pr@0.9	meanIoU	cmuIoU
NMTree	ICCV'19	ResNet-101	BiLSTM	69.28	64.17	55.22	40.31	12.90	60.12	69.85
Ref-NMS	AAAI'21	ResNet-101	Bi-GRU	70.59	65.61	58.01	41.36	14.58	60.42	70.72
LBYL-Net	CVPR'21	DarkNet-53	BERT	70.22	65.39	58.65	37.54	9.46	60.57	70.28
TransVG	CVPR'21	ResNet-50	BERT	69.96	64.17	54.68	38.01	12.75	59.80	69.31
VLTVG	CVPR'22	ResNet-101	BERT	73.50	68.31	59.93	43.45	15.31	62.84	73.80
MGVLF	TGRS'23	ResNet-50	BERT	72.19	66.86	58.02	42.51	15.30	61.51	71.80
LPVA	TGRS'24	ResNet-50	BERT	78.03	73.32	62.22	49.60	25.61	66.20	76.30
MB-ORES (Ours)	-	Swin-T	BERT	83.81	81.54	76.40	63.82	36.01	73.18	79.29

* DIOR-RSVG

Methods	Venue	Visual Encoder	Language Encoder	Pr@0.5	Pr@0.6	Pr@0.7	Pr@0.8	Pr@0.9	meanIoU	cmuIoU
ReSC	ECCV'20	DarkNet-53	BERT	72.71	68.92	63.01	53.70	33.37	64.24	68.10
LBYL-Net	CVPR'21	DarkNet-53	BERT	73.78	69.22	65.56	47.89	15.69	65.92	76.37
TransVG	CVPR'21	ResNet-50	BERT	72.41	67.38	60.05	49.10	27.84	63.56	76.27
QRNet	CVPR'22	Swin	BERT	75.84	70.82	62.27	49.63	25.69	66.80	83.02
EarthGPT	TGRS'24	ViT	Llama-2	76.65	71.93	66.52	56.53	37.63	69.34	81.54
GeoGround	-	CLIP-ViT	Vicuna 1.5	77.73	-	-	-	-	-	-
VLTVG	CVPR'22	ResNet-101	BERT	75.79	72.22	66.33	55.17	33.11	66.32	77.85
MGVLF	TGRS'23	ResNet-50	BERT	76.78	72.68	66.74	56.42	35.07	68.04	78.41
LPVA	TGRS'24	ResNet-50	BERT	82.27	77.44	72.25	60.98	39.55	72.35	85.11
MB-ORES (Ours)	-	Swin-T	BERT	85.65	83.89	80.87	73.00	54.39	77.73	83.06
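
For readers unfamiliar with the metric columns, the sketch below shows how they are commonly computed in visual grounding benchmarks: precision at IoU thresholds 0.5 to 0.9, the mean of per-sample IoU, and cumulative IoU (total intersection area over total union area, labelled cmuIoU in the tables). The function names and the strict `>` comparison are assumptions for illustration, not the evaluation code used for the reported numbers.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_areas(pred: Box, gt: Box) -> Tuple[float, float]:
    """Return (intersection area, union area) for two axis-aligned boxes."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter, area_pred + area_gt - inter


def grounding_metrics(
    preds: List[Box],
    gts: List[Box],
    thresholds: Tuple[float, ...] = (0.5, 0.6, 0.7, 0.8, 0.9),
) -> Dict[str, float]:
    """Compute Pr@t, meanIoU and cumulative IoU, all as percentages."""
    pairs = [box_areas(p, g) for p, g in zip(preds, gts)]
    ious = [i / u if u > 0 else 0.0 for i, u in pairs]
    n = len(ious)
    # Assumption: a sample counts as correct when its IoU exceeds the threshold.
    metrics = {
        f"Pr@{t}": 100.0 * sum(iou > t for iou in ious) / n for t in thresholds
    }
    metrics["meanIoU"] = 100.0 * sum(ious) / n
    # Cumulative IoU pools areas over the whole test set before dividing.
    metrics["cmuIoU"] = 100.0 * sum(i for i, _ in pairs) / sum(u for _, u in pairs)
    return metrics


# Toy example: one perfect prediction and one half-shifted prediction.
preds = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
gts = [(0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 15.0, 10.0)]
print(grounding_metrics(preds, gts))
```

Because cumulative IoU pools areas across samples, large objects weigh more than small ones, so a method can lead on meanIoU without leading on cmuIoU.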

* Ablation Study

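Our multi-branch design significantly improves VG performance. Removing the multi-branch network (rows marked ×) lowers MeanIoU from 77.18 to 77.73 down to 73.50 to 73.93 on DIOR-RSVG, and from 72.15 to 73.18 down to 66.04 to 66.38 on OPT-RSVG.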

# Heads / # Layers	Multi-Branch	Object Reasoner	#Params.	DIOR-RSVG MeanIoU	DIOR-RSVG CmuIoU	OPT-RSVG MeanIoU	OPT-RSVG CmuIoU
(h, l/k)	(4,1)	(4,3)	6.38M	77.18	81.67	72.15	78.27
	(4,1)	(8,6)	11.13M	77.26	81.71	72.37	78.31
	(4,3)	(4,3)	7.97M	77.73	83.06	72.73	78.60
	(4,3)	(8,6)	12.70M	77.72	82.42	73.18	79.29
	×	(4,3)	5.13M	73.50	77.94	66.04	72.54
	×	(8,6)	9.87M	73.93	78.43	66.38	73.26

Qualitative Results

Referring Expression Comprehension (REC)

Visualization of referring objects for multiple referring expression queries per image.

DIOR-RSVG

OPT-RSVG

Object Detection (OD)

DIOR-RSVG

OPT-RSVG

Unification of REC and OD

Simultaneous object detection and referring object localization.

Bibtex

If you find this work useful in your research, please cite:

@article{radouane2025mboresmultibranchobjectreasoner,
      title={MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing}, 
      author={Karim Radouane and Hanane Azzag and Mustapha Lebbah},
      year={2025},
      eprint={2503.24219},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.24219}, 
}
