
[TPAMI2025] Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations


visionxiang/ZSCOS-CaMF

 
 


[TPAMI] Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations


Cheng Lei, Jie Fan, Xinran Li, Tianzhu Xiang, Ao Li, Ce Zhu, Le Zhang
University of Electronic Science and Technology of China; Space42, UAE

Paper  

Abstract: Camouflaged Object Segmentation (COS) faces significant challenges due to the scarcity of annotated data: meticulous pixel-level annotation is labor-intensive and costly, primarily because of the intricate boundaries between objects and backgrounds. Addressing the core question "Can COS be effectively achieved in a zero-shot manner without manual annotations for any camouflaged object?", we propose an affirmative solution. We examine the learned attention patterns for camouflaged objects and introduce a robust zero-shot COS framework. Our findings reveal that while transformer models for salient object segmentation (SOS) prioritize global features in their attention mechanisms, camouflaged object segmentation exhibits both global and local attention biases. Based on these findings, we design a framework that adapts to the inherent local pattern bias of COS while incorporating global attention patterns and a broad semantic feature space derived from SOS. This enables efficient zero-shot transfer for COS. Specifically, we incorporate a Masked Image Modeling (MIM) based image encoder optimized for Parameter-Efficient Fine-Tuning (PEFT), a Multimodal Large Language Model (M-LLM), and a Multi-scale Fine-grained Alignment (MFA) mechanism. The MIM encoder captures essential local features, while the PEFT module learns global and semantic representations from SOS datasets. To further enhance semantic granularity, we leverage the M-LLM to generate caption embeddings conditioned on visual cues, which are aligned with multi-scale visual features via MFA. This alignment enables precise interpretation of complex semantic contexts. Moreover, we introduce a learnable codebook to represent the M-LLM during inference, significantly reducing computational demands while maintaining performance.
Our framework demonstrates its versatility and efficacy through rigorous experimentation, achieving state-of-the-art performance in zero-shot COS with $F_{\beta}^w$ scores of 72.9% on CAMO and 71.7% on COD10K. By removing the M-LLM during inference, we achieve an inference speed comparable to that of traditional end-to-end models, reaching 18.1 FPS. Additionally, our method excels in polyp segmentation and underwater scene segmentation, outperforming strong baselines in both zero-shot and supervised settings, which suggests its potential for other segmentation tasks. The source code will be made available at https://github.com/AVC2-UESTC/ZSCOS-CaMF.



Install

For a fast setup, follow the Quick Start section; otherwise, use the detailed step-by-step instructions below.

PyTorch

The code requires python>=3.9 and pytorch>=2.0.0. Please follow the official PyTorch installation instructions to install both PyTorch and TorchVision. Installing both with CUDA support is strongly recommended.
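To confirm these minimums before going further, the stdlib-only sketch below checks the Python version, the installed torch and torchvision versions, and whether PyTorch is a CPU-only build. It is not part of the repository; the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and the version parsing is deliberately simple (it drops local tags such as `+cu118` and any pre-release or post-release suffix).

```python
import sys
from importlib import metadata

# Minimums stated in this README; "0" only checks that the package is installed.
REQUIRED = {"torch": "2.0.0", "torchvision": "0"}


def parse_version(text: str) -> tuple:
    """Turn '2.4.1+cu118' into (2, 4, 1); local tags and non-numeric suffixes are dropped."""
    public = text.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at segments like 'post1' that carry no leading number
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> list:
    """Return human-readable problems; an empty list means the minimums are met."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append(f"Python {sys.version.split()[0]} found; python>=3.9 is required")
    for dist, minimum in REQUIRED.items():
        try:
            found = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if not meets_minimum(found, minimum):
            problems.append(f"{dist} {found} found; {dist}>={minimum} is required")
    try:
        import torch  # optional: only inspected if it imports cleanly
        if torch.version.cuda is None:
            problems.append("PyTorch is a CPU-only build; a CUDA build is strongly recommended")
    except ImportError:
        pass  # already reported above as not installed
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment meets the stated minimums.")
```

Running it as a script prints either the list of problems or a single confirmation line.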

MMCV

Please install MMCV following the official MMCV installation instructions.

xFormers

Please install xFormers following the official xFormers installation instructions.

Other Dependencies

Please install the remaining dependencies listed in requirements.txt:

pip install -r requirements.txt

Model Weights

Pretrained Weights

You can download the pretrained weights eva02_L_pt_m38m_p14to16.pt from EVA02 or here.

Run the following command to convert the PyTorch weights to the format used in this repository.

python convert_pt_weights.py 

For training, put the converted weights in the model_weights folder.

Fine-tuned Weights

| Method  | Dataset     | Weights       | Configs |
|---------|-------------|---------------|---------|
| CAMF-ZS | DUTS        | camf_duts.pth | config  |
| CAMF-S  | CAMO+COD10K | camf_cod.pth  |         |

For testing, put the pretrained weights and fine-tuned weights in the model_weights folder.
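A misplaced checkpoint is easy to catch before running test.py. The hypothetical helpers below (not part of the repository) report which fine-tuned checkpoints from the weights table are missing from model_weights and list every .pt/.pth file that is present. The file name of the converted EVA02 checkpoint is not stated in this README, so the sketch only lists it rather than checking for a specific name; pass `expected=` to check for a single model.

```python
from pathlib import Path

# Fine-tuned checkpoint names from the weights table in this README.
# The converted EVA02 checkpoint name is not documented, so it is not checked here.
FINETUNED_WEIGHTS = ("camf_duts.pth", "camf_cod.pth")


def missing_weights(folder="model_weights", expected=FINETUNED_WEIGHTS):
    """Return the expected checkpoint names that are not present as files in `folder`."""
    root = Path(folder)
    return [name for name in expected if not (root / name).is_file()]


def list_checkpoints(folder="model_weights"):
    """Return sorted names of all .pt/.pth files in `folder` (empty if it does not exist)."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_file() and p.suffix in {".pt", ".pth"})


if __name__ == "__main__":
    print("Checkpoints found:", list_checkpoints() or "none")
    print("Missing fine-tuned weights:", missing_weights() or "none")
```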


Dataset

The following datasets are used in this paper:


Quick Start

Environment Setup

Make sure CUDA 11.8 is installed in your virtual environment. Linux is recommended.

Install PyTorch

pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu118

Install xFormers

pip install xformers==0.0.22 --index-url https://download.pytorch.org/whl/cu118

# test installation (optional)
python -m xformers.info

Install MMCV

pip install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.4/index.html

Other dependencies

pip install -r requirements.txt
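After running the Quick Start commands, a later pip install can silently replace one of the pinned packages. The sketch below (hypothetical, not shipped with the repository) compares the installed versions of the pinned distributions against the versions used in the commands above, ignoring local tags such as `+cu118`. The `lookup` parameter defaults to `importlib.metadata.version` and exists only so the check can be exercised without the packages installed.

```python
from importlib import metadata

# Versions pinned in the Quick Start commands of this README.
QUICK_START_PINS = {
    "torch": "2.4.1",
    "torchvision": "0.19.1",
    "torchaudio": "2.4.1",
    "xformers": "0.0.22",
    "mmcv": "2.2.0",
}


def compare_pins(pins=QUICK_START_PINS, lookup=metadata.version):
    """Return (package, pinned, installed) rows whose installed version differs from the pin."""
    mismatches = []
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        # CUDA wheels carry a local tag such as '+cu118'; compare only the public part.
        public = installed.split("+", 1)[0] if installed else None
        if public != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches


if __name__ == "__main__":
    for name, pinned, installed in compare_pins():
        print(f"{name}: pinned {pinned}, installed {installed or 'not installed'}")
```

An empty result means every pinned package matches; each printed row names a package to reinstall.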

Prepare Dataset

We follow the ADE20K dataset format. Organize your dataset files as follows:

./datasets/dataset_name/

├── images/
│   ├── training/       # Put training images here
│   └── validation/     # Put validation images here
└── annotations/
    ├── training/       # Put training segmentation maps here 
    └── validation/     # Put validation segmentation maps here 
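Before training or testing, it can help to confirm that the folders above exist and that every image has a matching segmentation map. The sketch below is a hypothetical checker, not part of the repository. It assumes, following the usual ADE20K convention, that each annotation is a .png file sharing its image's file stem; the accepted image extensions are also an assumption.

```python
from pathlib import Path

# Assumptions (not stated in this README): images use one of these extensions,
# and each segmentation map is a .png sharing its image's file stem, as in ADE20K.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}
MASK_SUFFIX = ".png"
SPLITS = ("training", "validation")


def check_layout(root):
    """Return a list of problems found under an ADE20K-style dataset root."""
    root = Path(root)
    problems = []
    for split in SPLITS:
        image_dir = root / "images" / split
        mask_dir = root / "annotations" / split
        for folder in (image_dir, mask_dir):
            if not folder.is_dir():
                problems.append(f"missing folder: {folder}")
        if not (image_dir.is_dir() and mask_dir.is_dir()):
            continue  # files cannot be paired without both folders
        images = {p.stem for p in image_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES}
        masks = {p.stem for p in mask_dir.iterdir() if p.suffix.lower() == MASK_SUFFIX}
        problems += [f"{split}: no annotation for image '{s}'" for s in sorted(images - masks)]
        problems += [f"{split}: no image for annotation '{s}'" for s in sorted(masks - images)]
    return problems


if __name__ == "__main__":
    import sys
    issues = check_layout(sys.argv[1] if len(sys.argv) > 1 else "./datasets/dataset_name")
    print("\n".join(issues) or "Layout looks consistent.")
```

Pass the dataset root as the first argument; a missing split folder is reported once per folder, and unpaired files are listed by stem.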

Test

Put the model weights into the model_weights folder, and run the following command to test the model.

python test.py

Train

Preparing

Debug

If you want to debug the code, check train_debug.py and test_debug.py.


Citation

If you find the code helpful in your research or work, please cite the following paper:

@article{lei2024towards,
  title={Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations},
  author={Lei, Cheng and Fan, Jie and Li, Xinran and Xiang, Tianzhu and Li, Ao and Zhu, Ce and Zhang, Le},
  journal={arXiv preprint arXiv:2410.16953},
  year={2024}
}

Acknowledgement

This project is based on MMCV, timm, EVA02, MAM, and EVP. We thank the authors for their valuable contributions.
