This is the official PyTorch implementation of our paper:
QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Songhe Deng, Wei Zhuo, Jinheng Xie, Linlin Shen
Computer Vision Institute, Shenzhen University
ACM International Conference on Multimedia, 2023
[Paper] [arXiv]
- Python 3.7
- PyTorch 1.7.1
- torchvision 0.8.2
pip install -r requirements.txt
You can find the following files here.
| File | filename |
|---|---|
| FG & BG VQA results | voc_vqa_fg_blip.npy voc_vqa_bg_blip.npy |
| FG & BG VQA text features | voc_vqa_fg_blip_ViT-L-14_cache.npy voc_vqa_bg_blip_ViT-L-14_cache.npy |
| pre-trained baseline model | res50_cam.pth |
| QA-CLIMS model | res50_qa_clims.pth |
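Before training, it can help to verify the downloaded files are in the locations this README expects. Below is a small hypothetical helper (not part of the repo); the paths are taken from the placement steps in this README:

```python
from pathlib import Path

# Expected locations of the VOC artifacts, per this README.
EXPECTED_VOC_FILES = [
    "vqa/voc_vqa_fg_blip.npy",
    "vqa/voc_vqa_bg_blip.npy",
    "vqa/voc_vqa_fg_blip_ViT-L-14_cache.npy",
    "vqa/voc_vqa_bg_blip_ViT-L-14_cache.npy",
    "cam-baseline-voc12/res50_cam.pth",
]

def missing_files(paths=EXPECTED_VOC_FILES):
    """Return the subset of expected files that are not present yet."""
    return [p for p in paths if not Path(p).is_file()]

if __name__ == "__main__":
    for p in missing_files():
        print("missing:", p)
```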
You can download the VQA text features voc_vqa_fg_blip_ViT-L-14_cache.npy and voc_vqa_bg_blip_ViT-L-14_cache.npy above
and put them in vqa/.
Alternatively, you can generate them yourself:
To generate the VQA results, please follow third_party/README.
After that, run the following command to generate the VQA text features:
python gen_text_feats_cache.py voc \
--vqa_fg_file vqa/voc_vqa_fg_blip.npy \
--vqa_fg_cache_file vqa/voc_vqa_fg_blip_ViT-L-14_cache.npy \
--vqa_bg_file vqa/voc_vqa_bg_blip.npy \
--vqa_bg_cache_file vqa/voc_vqa_bg_blip_ViT-L-14_cache.npy \
--clip ViT-L/14
Please download the pre-trained baseline model res50_cam.pth above and put it at cam-baseline-voc12/res50_cam.pth.
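After generation, you can sanity-check the caches. This is a minimal sketch that assumes each `*_cache.npy` file stores a pickled Python dict mapping text strings to feature vectors; the actual layout is defined by gen_text_feats_cache.py, so adjust accordingly:

```python
import numpy as np

def load_cache(path):
    """Load an .npy cache saved with np.save(..., allow_pickle=True).

    Assumes the file holds a dict {text: feature_vector}; check
    gen_text_feats_cache.py for the real layout.
    """
    return np.load(path, allow_pickle=True).item()

if __name__ == "__main__":
    # Toy round-trip demo (768 is the CLIP ViT-L/14 text feature width).
    toy = {"a photo of a dog": np.zeros(768, dtype=np.float32)}
    np.save("toy_cache.npy", toy)
    cache = load_cache("toy_cache.npy")
    print(len(cache), "entries; first key:", next(iter(cache)))
```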
bash run_voc12_qa_clims.sh
bash run_voc12_sem_seg.sh
To train the final semantic segmentation model, please follow deeplab-pytorch or CLIMS.
You can find the following files here.
| File | filename |
|---|---|
| FG & BG VQA results | coco_vqa_fg_blip.npy coco_vqa_bg_blip.npy |
| FG & BG VQA text features | coco_vqa_fg_blip_ViT-L-14_cache.npy coco_vqa_bg_blip_ViT-L-14_cache.npy |
| pre-trained baseline model | res50_cam.pth |
| QA-CLIMS model | res50_qa_clims.pth |
Please place the downloaded coco_vqa_fg_blip_ViT-L-14_cache.npy and coco_vqa_bg_blip_ViT-L-14_cache.npy
in vqa/, and res50_cam.pth in cam-baseline-coco14/.
Then, run the following commands:
bash run_coco14_qa_clims.sh
bash run_coco14_sem_seg.sh
If you find this code useful for your research, please consider citing our paper:
@inproceedings{deng2023qa-clims,
title={QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation},
author={Deng, Songhe and Zhuo, Wei and Xie, Jinheng and Shen, Linlin},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
pages={5572--5583},
year={2023}
}
This repository is heavily based on CLIMS and IRNet; thanks for their great work!
