
(CVPR 2026) ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval

¹School of Software, Shandong University    ²School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
✉ Corresponding author


Official Repository: A Cone-based robust noise-unlearning network addressing the Noisy Triplet Correspondence (NTC) problem in Composed Image Retrieval (CIR).

📌 Introduction

ConeSep (Cone-based robust noisE-unlearning comPositional network) is our proposed robust learning framework for Composed Image Retrieval. Based on an in-depth analysis of the "Noisy Triplet Correspondence (NTC)" problem, ConeSep systematically resolves three critical challenges overlooked by existing methods: Modality Suppression, Negative Anchor Deficiency, and Unlearning Backlash. By utilizing geometric boundary estimation and optimal transport, ConeSep actively perceives, structurally models, and precisely "unlearns" noise.

⬆ Back to top

📢 News

  • [2026-04-03] 🚀 Released the full code of ConeSep!

  • [2026-02-21] 🔥 The paper "ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval" has been accepted by CVPR 2026!

⬆ Back to top

✨ Key Features

  • πŸ“ Geometric Fidelity Quantization (GFQ): Theoretically establishes and practically estimates a noise boundary utilizing the geometric separability of the cone space, quantifying sample fidelity to overcome "Modality Suppression".
  • πŸ›‘ Negative Boundary Learning (NBL): Resolves the "Negative Anchor Deficiency" by learning a "diagonal negative combination" for each query, serving as an explicit semantic opposite-anchor in the embedding space.
  • 🎯 Boundary-based Targeted Unlearning (BTU): Models the noisy correction process as an optimal transport (OT) problem to elegantly execute precise unlearning while completely avoiding "Unlearning Backlash" on clean samples.
  • πŸ›‘οΈ Unmatched NTC Robustness: Consistently surpasses State-of-the-Art (SOTA) robust CIR models (such as TME, HABIT, and INTENT) under severe Noise Triplet Correspondence (NTC) ratios (0%, 20%, 50%, 80%).

⬆ Back to top

πŸ—οΈ Architecture

ConeSep architecture

Figure 1. ConeSep consists of three logically progressive modules: (a) Geometric Fidelity Quantization (GFQ), (b) Negative Boundary Learning (NBL), and (c) Boundary-based Targeted Unlearning (BTU).

⬆ Back to top

πŸƒβ€β™‚οΈ Experiment-Results

CIR Task Performance

💡 Note for Fully-Supervised CIR Benchmarking:
🎯 The 0% noise setting in the tables below is equivalent to the traditional fully-supervised CIR paradigm. We highlight this 0% block to facilitate direct and fair comparisons for researchers working on conventional supervised methods.

FashionIQ:

Table 1. Performance comparison on FashionIQ validation set in terms of R@K (%). The best result under each noise ratio is highlighted in bold, while the second-best result is underlined.

CIRR:

Table 2. Performance comparison on the CIRR test set in terms of R@K (%) and Rsub@K (%). The best and second-best results are highlighted in bold and underlined, respectively.
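The tables report R@K, i.e., the percentage of queries whose ground-truth target appears among the top-K retrieved candidates. For readers new to the metric, here is a minimal sketch of how it is typically computed (illustrative only, not the repository's evaluation code):

```python
import numpy as np

def recall_at_k(sim, gt, ks=(1, 5, 10, 50)):
    """Recall@K (%) for retrieval.

    sim: (num_queries, num_gallery) similarity matrix.
    gt:  gt[i] is the gallery index of query i's ground-truth target.
    """
    order = np.argsort(-sim, axis=1)                         # best-first ranking
    ranks = np.argmax(order == np.asarray(gt)[:, None], axis=1)
    return {k: float(np.mean(ranks < k)) * 100 for k in ks}

# Sanity check: with an identity similarity matrix every ground truth ranks first.
scores = recall_at_k(np.eye(4), [0, 1, 2, 3], ks=(1, 5))
print(scores)  # {1: 100.0, 5: 100.0}
```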

⬆ Back to top



📦 Install

1. Clone the repository

```shell
git clone https://github.com/Lee-zixu/ConeSep
cd ConeSep
```

2. Setup Python Environment

The code is evaluated on Python 3.8.10 and CUDA 12.6. We recommend using Anaconda to create an isolated virtual environment:

```shell
conda create -n conesep python=3.8
conda activate conesep

# Install PyTorch (the evaluated environment uses Torch 2.1.0 built for CUDA 12.1)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
```

Note: Key dependencies include salesforce-lavis for the base architecture (BLIP-2).

⬆ Back to top


📂 Data Preparation

We evaluated our framework on two standard datasets: FashionIQ and CIRR. Please download the datasets first.

Click to expand: FashionIQ Dataset Directory Structure

Please follow the official instructions to download the FashionIQ dataset. Once downloaded, ensure the folder structure looks like this:

```
├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val | test].json
│   │   ├── cap.toptee.[train | val | test].json
│   │   ├── cap.shirt.[train | val | test].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   ├── split.shirt.[train | val | test].json
│   ├── dress
│   │   ├── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   ├── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   ├── toptee
│   │   ├── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]
```
Click to expand: CIRR Dataset Directory Structure

Please follow the official instructions to download the CIRR dataset. Once downloaded, ensure the folder structure looks like this:

```
├── CIRR
│   ├── train
│   │   ├── [0 | 1 | 2 | ...]
│   │   │   ├── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   ├── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   ├── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   ├── cirr
│   ├── captions
│   │   ├── cap.rc2.[train | val | test1].json
│   ├── image_splits
│   │   ├── split.rc2.[train | val | test1].json
```

⬆ Back to top


🚀 Quick Start

1. Training under Noisy Settings

In our implementation, we introduce the noise_ratio parameter to simulate varying degrees of Noisy Triplet Correspondence (NTC) interference. You can reproduce the experimental results from the paper by modifying the --noise_ratio parameter (the evaluated settings are 0.0, 0.2, 0.5, 0.8).
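To make the meaning of a noise_ratio of 0.5 concrete: NTC is typically simulated by re-pairing a random fraction of queries with wrong target images. The helper below is a hypothetical sketch of one such injection scheme; train.py's actual implementation may differ:

```python
import random

def inject_ntc(target_ids, noise_ratio, seed=0):
    """Corrupt a noise_ratio fraction of triplets by swapping their targets.

    Hypothetical helper for illustration only; the repository's injection
    logic in train.py may differ.
    """
    rng = random.Random(seed)
    n = len(target_ids)
    noisy = list(target_ids)
    chosen = rng.sample(range(n), int(n * noise_ratio))  # triplets to corrupt
    shuffled = chosen[:]
    rng.shuffle(shuffled)                                # permute their targets
    for i, j in zip(chosen, shuffled):
        noisy[i] = target_ids[j]
    return noisy

# Re-pair roughly half of 1000 dummy triplets with wrong targets.
ids = list(range(1000))
corrupted = inject_ntc(ids, noise_ratio=0.5)
```

Note that a random permutation can leave a few fixed points, so the realized corruption rate is marginally below the nominal ratio.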

Training on FashionIQ:

```shell
python train.py \
    --dataset fashioniq \
    --fashioniq_path "/path/to/FashionIQ/" \
    --model_dir "./checkpoints/fashioniq_noise0.2" \
    --noise_ratio 0.2 \
    --batch_size 256 \
    --num_epochs 20 \
    --lr 2e-5
```

Training on CIRR:

```shell
python train.py \
    --dataset cirr \
    --cirr_path "/path/to/CIRR/" \
    --model_dir "./checkpoints/cirr_noise0.5" \
    --noise_ratio 0.5 \
    --batch_size 256 \
    --num_epochs 20 \
    --lr 1e-5
```

💡 Tips:

  • Our model is based on the powerful BLIP-2 architecture; we highly recommend running training on GPUs with sufficient memory (e.g., NVIDIA A40 48 GB / V100 32 GB).

  • The best model weights and evaluation metrics generated during training are automatically saved as best_model.pt and metrics_best.json in your specified --model_dir.

2. Testing

To generate the prediction files on the CIRR dataset for submission to the CIRR Evaluation Server, run the following command:

```shell
python src/cirr_test_submission.py checkpoints/cirr_noise0.5/
```

(The script automatically writes the .json prediction file from the best checkpoint in the folder for online evaluation.)

⬆ Back to top


πŸ“ Project Structure

Our code is a deep customization of the LAVIS framework. The core implementations are centralized in the following files:

```
ConeSep/
├── lavis/
│   ├── models/
│   │   └── blip2_models/
│   │       └── ConeSep.py    # 🧠 Core model implementation: GFQ, NBL, BTU modules
├── train.py                  # 🚀 Training entry point: controls noise_ratio injection and optimal transport
├── datasets.py
├── test.py
├── utils.py
├── data_utils.py
├── cirr_test_submission.py   # Auxiliary script
├── datasets/                 # Dataset loading and processing logic
└── README.md
```

🤝 Acknowledgement

The implementation of this project references the LAVIS framework and the noise setting concepts from TME. We express our sincere gratitude to these open-source contributions!

⬆ Back to top

βœ‰οΈ Contact

For any questions, issues, or feedback, please open an issue on GitHub or reach out to us at lizixu.cs@gmail.com.

⬆ Back to top

🔗 Related Projects

Ecosystem & Other Works from our Team

  • TEMA (ACL'26): Paper | Web | Code
  • Air-Know (CVPR'26): Paper | Web | Code
  • HABIT (AAAI'26): Paper | Web | Code
  • ReTrack (AAAI'26): Paper | Web | Code
  • INTENT (AAAI'26): Paper | Web | Code
  • HUD (ACM MM'25): Paper | Web | Code
  • OFFSET (ACM MM'25): Paper | Web | Code
  • ENCODER (AAAI'25): Paper | Web | Code

πŸ“β­οΈ Citation

If you find our work or this code useful in your research, please consider leaving a Star ⭐️ or citing 📝 our paper 🥰. Your support is our greatest motivation!

```bibtex
@InProceedings{ConeSep,
    title={ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval},
    author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Zhang, Mingyu and Fu, Zhiheng and Nie, Liqiang},
    booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    year={2026}
}
```

⬆ Back to top

🫑 Support & Contributing

We welcome all forms of contributions! If you have any questions, ideas, or find a bug, please feel free to:

  • Open an Issue for discussions or bug reports.
  • Submit a Pull Request to improve the codebase.

⬆ Back to top
