# CMIRNet: Cross-Modal Interactive Reasoning Network for Referring Image Segmentation

Official PyTorch implementation of the IEEE TCSVT 2024 paper "CMIRNet: Cross-Modal Interactive Reasoning Network for Referring Image Segmentation".

Mingzhu Xu<sup>1</sup>, Tianxiang Xiao<sup>1</sup>, Yutong Liu<sup>1</sup>, Haoyu Tang<sup>1*</sup>, Yupeng Hu<sup>1</sup>, Liqiang Nie<sup>2</sup>

<sup>1</sup> Shandong University &nbsp; <sup>2</sup> Harbin Institute of Technology (Shenzhen) &nbsp; <sup>*</sup> Corresponding author

- Paper: [IEEE Xplore](https://doi.org/10.1109/TCSVT.2024.3508752)
- Code: [GitHub](https://github.com/iLearn-Lab/CMIRNet)
## Table of Contents

- [Introduction](#introduction)
- [Highlights](#highlights)
- [Project Structure](#project-structure)
- [Installation](#installation)
- [Checkpoints / Models](#checkpoints--models)
- [Dataset / Benchmark](#dataset--benchmark)
- [Usage](#usage)
- [Citation](#citation)
- [License](#license)
## Introduction

This project is the official implementation of the paper "CMIRNet: Cross-Modal Interactive Reasoning Network for Referring Image Segmentation".

CMIRNet addresses the challenge of cross-modal alignment in referring image segmentation:

- **Problem addressed:** how to perform deep interaction and logical reasoning between visual features and natural-language descriptions.
- **Core idea:** a Cross-Modal Interactive Reasoning Network (CMIRNet) that leverages graph neural networks (GNNs) to strengthen semantic correlations across modalities (see the sketch after this list).
- **This repository provides:** complete training and testing code supporting both ResNet and Swin Transformer backbones.
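The paper's actual reasoning module is defined by the code in this repository; as a rough illustration of the general idea, the minimal sketch below builds a fully connected bipartite graph between visual-region nodes and word nodes and propagates language messages to the visual side with one attention step. All names (`CrossModalGraphLayer`, dimensions) are hypothetical and not taken from this repo.

```python
import torch
import torch.nn as nn

class CrossModalGraphLayer(nn.Module):
    """Hypothetical sketch: one message-passing step between visual-region
    nodes and word nodes. This is NOT the CMIRNet module, only an
    illustration of cross-modal graph reasoning in general."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # queries from visual nodes
        self.k = nn.Linear(dim, dim)   # keys from word nodes
        self.v = nn.Linear(dim, dim)   # messages carried by word nodes
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # vis: (B, N, dim) visual region features; txt: (B, L, dim) word features
        attn = torch.softmax(
            self.q(vis) @ self.k(txt).transpose(1, 2) / vis.size(-1) ** 0.5,
            dim=-1,
        )                              # (B, N, L) edge weights of the graph
        messages = attn @ self.v(txt)  # aggregate language messages per region
        return self.norm(vis + messages)  # residual update of visual nodes

# Toy usage with random features
layer = CrossModalGraphLayer(dim=256)
vis = torch.randn(2, 196, 256)  # e.g. a 14x14 feature map, flattened
txt = torch.randn(2, 20, 256)   # e.g. 20 word embeddings
out = layer(vis, txt)           # (2, 196, 256)
```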
## Highlights

- Proposes a cross-modal interactive reasoning mechanism to strengthen alignment between vision and language.
- Supports multiple backbone architectures (ResNet-50/101 and Swin Transformer Base/Large).
- Achieves strong performance on standard benchmarks (RefCOCO, RefCOCO+, RefCOCOg, RefCLEF).
## Project Structure

```
.
├── data/               # Images and annotation data
├── train_resnet.py     # Training script (ResNet backbone)
├── train_swin.py       # Training script (Swin Transformer backbone)
├── test_resnet.py      # Testing script (ResNet backbone)
├── test_swin.py        # Testing script (Swin Transformer backbone)
├── README.md
└── requirements.txt
```
## Installation

```bash
git clone https://github.com/iLearn-Lab/CMIRNet.git
cd CMIRNet
pip install -r requirements.txt
```
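Before launching training, it can help to confirm that PyTorch actually sees your GPU. A quick check, assuming a CUDA build of PyTorch was installed by `requirements.txt`:

```python
import torch

# Sanity-check the environment before training
print(torch.__version__)                  # installed PyTorch version
print(torch.cuda.is_available())          # True if a CUDA device is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the GPU behind cuda:0
```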
## Checkpoints / Models

Please download the pretrained classification weights to initialize the model:

- Download: Baidu Drive (password: `td6n`). A hedged example of loading such weights follows below.
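How the training scripts consume these weights is defined by the code itself; as a generic illustration, loading a classification checkpoint into a backbone usually looks like the following. The path and key handling here are assumptions, not this repo's exact logic.

```python
import torch
import torchvision

# Hypothetical example: initialize a ResNet-50 backbone from downloaded weights.
backbone = torchvision.models.resnet50()
state = torch.load("pretrained/resnet50.pth", map_location="cpu")  # assumed path
# Checkpoints sometimes wrap the weights under a key such as "state_dict".
if isinstance(state, dict) and "state_dict" in state:
    state = state["state_dict"]
missing, unexpected = backbone.load_state_dict(state, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```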
## Dataset / Benchmark

- Images: download the 2014 Train images from COCO and extract them to `./data/images/` (a quick layout check follows this list).
- Referring expressions: download RefCOCO, RefCOCO+, RefCOCOg, and RefCLEF from the official site.
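A quick way to confirm the data landed where the scripts expect it; the exact layout under `./data/` is an assumption based on the project structure above:

```python
from pathlib import Path

# Hypothetical sanity check: verify the expected data layout before training.
images = Path("data") / "images"
print(f"{images} exists: {images.is_dir()}")
if images.is_dir():
    n = sum(1 for _ in images.glob("**/*.jpg"))
    print(f"found {n} JPEG images")  # COCO train2014 contains 82,783 images
```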
## Usage

### Training

```bash
# ResNet
python train_resnet.py --model_id cmirnet_refcoco_res --device cuda:0
python train_resnet.py --model_id cmirnet_refcocop_res --device cuda:0 --dataset refcoco+
python train_resnet.py --model_id cmirnet_refcocog_res --device cuda:0 --dataset refcocog --splitBy umd

# Swin
python train_swin.py --model_id cmirnet_refcoco_swin --device cuda:0
python train_swin.py --model_id cmirnet_refcocop_swin --device cuda:0 --dataset refcoco+
python train_swin.py --model_id cmirnet_refcocog_swin --device cuda:0 --dataset refcocog --splitBy umd
```
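If you want to launch the three ResNet runs back-to-back, a small driver script can shell out to the commands above. The flags are exactly those listed; everything else here is just convenience scaffolding:

```python
import subprocess

# Convenience wrapper around the ResNet training commands listed above.
runs = [
    ["--model_id", "cmirnet_refcoco_res"],
    ["--model_id", "cmirnet_refcocop_res", "--dataset", "refcoco+"],
    ["--model_id", "cmirnet_refcocog_res", "--dataset", "refcocog", "--splitBy", "umd"],
]
for extra in runs:
    cmd = ["python", "train_resnet.py", "--device", "cuda:0", *extra]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # stop the loop if any run fails
```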
### Testing

Please ensure that the `--resume` argument points to the correct checkpoint path:

```bash
# ResNet
python test_resnet.py --device cuda:0 --resume path/to/weights

# Swin (note: include --window12)
python test_swin.py --device cuda:0 --resume path/to/weights --window12
```
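Referring-segmentation benchmarks are conventionally reported with overall IoU (oIoU) and mean IoU (mIoU). The snippet below shows the standard computation on binary masks; it is the conventional metric definition, not code taken from this repository's test scripts:

```python
import torch

def iou_stats(pred: torch.Tensor, gt: torch.Tensor):
    """pred, gt: boolean masks of shape (N, H, W) for N test samples."""
    inter = (pred & gt).flatten(1).sum(dim=1).float()  # per-sample intersection
    union = (pred | gt).flatten(1).sum(dim=1).float()  # per-sample union
    per_sample = inter / union.clamp(min=1)            # IoU of each sample
    overall = inter.sum() / union.sum().clamp(min=1)   # oIoU: pixel-weighted
    return overall.item(), per_sample.mean().item()    # (oIoU, mIoU)

pred = torch.rand(4, 64, 64) > 0.5  # stand-in predictions
gt = torch.rand(4, 64, 64) > 0.5    # stand-in ground truth
oiou, miou = iou_stats(pred, gt)
print(f"oIoU={oiou:.4f}, mIoU={miou:.4f}")
```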
## Citation

If you use this code or method in your research, please cite our paper:

```bibtex
@ARTICLE{CMIRNet2025TCSVT,
  author={Xu, Mingzhu and Xiao, Tianxiang and Liu, Yutong and Tang, Haoyu and Hu, Yupeng and Nie, Liqiang},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  title={CMIRNet: Cross-Modal Interactive Reasoning Network for Referring Image Segmentation},
  year={2025},
  volume={35},
  number={4},
  pages={3234-3249},
  doi={10.1109/TCSVT.2024.3508752}
}
```
## License

This project is released under the Apache License 2.0.
