CMIRNet: Cross-Modal Interactive Reasoning Network for Referring Image Segmentation

Official PyTorch implementation of the IEEE TCSVT 2025 paper "CMIRNet: Cross-Modal Interactive Reasoning Network for Referring Image Segmentation".

Authors

Mingzhu Xu¹, Tianxiang Xiao¹, Yutong Liu¹, Haoyu Tang¹*, Yupeng Hu¹, Liqiang Nie²

¹ Shandong University
² Harbin Institute of Technology (Shenzhen)
* Corresponding author

Links

  • Paper (IEEE Xplore): https://doi.org/10.1109/TCSVT.2024.3508752

Table of Contents

  • Introduction
  • Highlights
  • Project Structure
  • Installation
  • Checkpoints / Models
  • Dataset / Benchmark
  • Usage
  • Citation
  • License

Introduction

This project is the official implementation of the paper "CMIRNet: Cross-Modal Interactive Reasoning Network for Referring Image Segmentation".

CMIRNet aims to address the challenge of cross-modal alignment in the task of Referring Image Segmentation:

  • Problem Addressed: How to effectively perform deep interaction and logical reasoning between visual features and natural-language descriptions.
  • Core Idea: A Cross-Modal Interactive Reasoning Network (CMIRNet) is proposed, which leverages architectures such as Graph Neural Networks (GNNs) to strengthen semantic correlations across modalities (see the illustrative sketch after this list).
  • This Repository Provides: Complete training and testing code supporting both ResNet and Swin Transformer backbones.
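
To make the core idea concrete, the following is a minimal, illustrative PyTorch sketch of language-guided graph reasoning over visual features. It is not the paper's actual module: every layer name, shape, and operation here is a hypothetical stand-in for the mechanism described above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalGraphReasoning(nn.Module):
    """Illustrative sketch: weight visual nodes by a language query,
    then propagate information with one graph-convolution step."""
    def __init__(self, dim=256):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)  # projects the language feature
        self.k_proj = nn.Linear(dim, dim)  # projects the visual nodes
        self.gcn = nn.Linear(dim, dim)     # graph propagation weights

    def forward(self, vis, lang):
        # vis:  (B, N, C) visual region/pixel nodes
        # lang: (B, C)    pooled language feature
        q = self.q_proj(lang).unsqueeze(1)                                 # (B, 1, C)
        k = self.k_proj(vis)                                               # (B, N, C)
        attn = torch.softmax((k * q).sum(-1) / k.size(-1) ** 0.5, dim=-1)  # (B, N)
        nodes = vis * attn.unsqueeze(-1)      # emphasize language-relevant nodes
        adj = torch.softmax(nodes @ nodes.transpose(1, 2), dim=-1)         # (B, N, N)
        out = F.relu(self.gcn(adj @ nodes))   # one propagation step
        return out + vis                      # residual connection

# Quick check with dummy tensors:
# y = CrossModalGraphReasoning(256)(torch.randn(2, 196, 256), torch.randn(2, 256))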

Figure: the pipeline of CMIRNet.


Highlights

  • Proposes a Cross-Modal Interactive Reasoning mechanism to enhance alignment between vision and language.
  • Supports multiple backbone architectures (ResNet-50/101 and Swin Transformer Base/Large).
  • Achieves strong performance on benchmark datasets (RefCOCO, RefCOCO+, RefCOCOg, RefCLEF).

Project Structure

.
├── data/                  # Stores images and annotation data
├── train_resnet.py        # Training script based on ResNet
├── train_swin.py          # Training script based on Swin Transformer
├── test_resnet.py         # Testing script based on ResNet
├── test_swin.py           # Testing script based on Swin Transformer
├── README.md
└── requirements.txt

Installation

1. Clone the repository

git clone https://github.com/iLearn-Lab/CMIRNet.git
cd CMIRNet

2. Install dependencies

pip install -r requirements.txt
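
Optionally, verify that PyTorch sees your GPU before training (a quick sanity check, not part of the repository):

import torch
print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA device is usable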

Checkpoints / Models

1. Initialization Weights (for Training)

Please download the pretrained classification weights to initialize the model:

2. Trained CMIRNet Weights (for Testing)
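
However the weights are obtained, initializing a backbone from classification weights typically follows the pattern below. This is a generic sketch: the torchvision backbone and checkpoint path are hypothetical stand-ins, and the repository's training scripts contain the actual loading logic.

import torch
from torchvision.models import resnet50

backbone = resnet50()  # stand-in for the model's visual backbone
state = torch.load('pretrained/resnet50.pth', map_location='cpu')  # hypothetical path
state = state.get('model', state)  # some checkpoint files nest weights under 'model'
missing, unexpected = backbone.load_state_dict(state, strict=False)
print(f'{len(missing)} missing keys, {len(unexpected)} unexpected keys')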


Dataset / Benchmark

  1. Images: Download the 2014 Train images from COCO and extract them to ./data/images/.
  2. Referring Expressions: Download RefCOCO, RefCOCO+, RefCOCOg, and RefCLEF from the Official Site (a loading sketch follows this list).
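
These datasets are conventionally accessed through the REFER API from the refer toolkit (github.com/lichengunc/refer). Assuming that toolkit is installed and ./data is laid out as above, loading an expression-mask pair might look like this sketch:

from refer import REFER  # from the lichengunc/refer toolkit

# splitBy is 'unc' for RefCOCO/RefCOCO+, 'umd' or 'google' for RefCOCOg
refer = REFER('./data', dataset='refcoco', splitBy='unc')
ref_ids = refer.getRefIds(split='train')
ref = refer.loadRefs(ref_ids[0])[0]
print(ref['sentences'][0]['sent'])  # one referring expression for this object
mask = refer.getMask(ref)['mask']   # binary segmentation mask of the referent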

Usage

Training

# ResNet
python train_resnet.py --model_id cmirnet_refcoco_res --device cuda:0
python train_resnet.py --model_id cmirnet_refcocop_res --device cuda:0 --dataset refcoco+
python train_resnet.py --model_id cmirnet_refcocog_res --device cuda:0 --dataset refcocog --splitBy umd

# Swin
python train_swin.py --model_id cmirnet_refcoco_swin --device cuda:0
python train_swin.py --model_id cmirnet_refcocop_swin --device cuda:0 --dataset refcoco+
python train_swin.py --model_id cmirnet_refcocog_swin --device cuda:0 --dataset refcocog --splitBy umd

Testing

Please ensure that the --resume argument points to the correct checkpoint path:

# ResNet
python test_resnet.py --device cuda:0 --resume path/to/weights
# Swin (remember to pass --window12)
python test_swin.py --device cuda:0 --resume path/to/weights --window12
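
Referring image segmentation is commonly evaluated with overall/mean IoU between predicted and ground-truth masks. The helper below is a minimal reference implementation of the metric, not the repository's evaluation code:

import torch

def mask_iou(pred, gt):
    """IoU between two binary masks of the same shape ({0,1} or bool tensors)."""
    pred, gt = pred.bool(), gt.bool()
    inter = (pred & gt).sum().float()
    union = (pred | gt).sum().float()
    return (inter / union).item() if union > 0 else 0.0

# Example with dummy masks:
# iou = mask_iou(torch.rand(480, 640) > 0.5, torch.rand(480, 640) > 0.5)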

Citation

If you use this code or method in your research, please cite our paper:

@ARTICLE{CMIRNet2025TCSVT,
  author={Xu, Mingzhu and Xiao, Tianxiang and Liu, Yutong and Tang, Haoyu and Hu, Yupeng and Nie, Liqiang},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={CMIRNet: Cross-Modal Interactive Reasoning Network for Referring Image Segmentation}, 
  year={2025},
  volume={35},
  number={4},
  pages={3234-3249},
  doi={10.1109/TCSVT.2024.3508752}}

License

This project is released under the Apache License 2.0.
