(CVPR 2026) ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
† Corresponding author
Official repository of ConeSep, a cone-based robust noise-unlearning network addressing the Noisy Triplet Correspondence (NTC) problem in Composed Image Retrieval (CIR).
ConeSep (Cone-based robust noisE-unlearning comPositional network) is our proposed robust learning framework for Composed Image Retrieval. Based on an in-depth analysis of the "Noisy Triplet Correspondence (NTC)" problem, ConeSep systematically resolves three critical challenges overlooked by existing methods: Modality Suppression, Negative Anchor Deficiency, and Unlearning Backlash. By utilizing geometric boundary estimation and optimal transport, ConeSep actively perceives, structurally models, and precisely "unlearns" noise.
- [2026-04-03] Released the full code of ConeSep!
- [2026-02-21] The paper "ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval" has been accepted to CVPR 2026!
- Geometric Fidelity Quantization (GFQ): Theoretically establishes and practically estimates a noise boundary using the geometric separability of the cone space, quantifying sample fidelity to overcome "Modality Suppression".
- Negative Boundary Learning (NBL): Resolves the "Negative Anchor Deficiency" by learning a "diagonal negative combination" for each query, which serves as an explicit semantically opposite anchor in the embedding space.
- Boundary-based Targeted Unlearning (BTU): Models the noise-correction process as an optimal transport (OT) problem, executing precise unlearning while avoiding "Unlearning Backlash" on clean samples.
- Strong NTC Robustness: Consistently surpasses state-of-the-art (SOTA) robust CIR models (such as TME, HABIT, and INTENT) across Noisy Triplet Correspondence (NTC) ratios of 0%, 20%, 50%, and 80%.
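To make the optimal-transport view behind BTU concrete, here is a minimal, self-contained Sinkhorn iteration in NumPy. It is a toy sketch of entropy-regularized OT on a small cost matrix, not the actual BTU implementation from this repository:

```python
import numpy as np

def sinkhorn(cost, eps=0.05, n_iters=200):
    """Entropy-regularized optimal transport (Sinkhorn iterations).

    Returns a transport plan with uniform row/column marginals that
    concentrates mass on low-cost (i.e. well-matched) pairs."""
    n, m = cost.shape
    K = np.exp(-cost / eps)              # Gibbs kernel
    a = np.full(n, 1.0 / n)              # uniform source marginal
    b = np.full(m, 1.0 / m)              # uniform target marginal
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return (u[:, None] * K) * v[None, :]  # diag(u) @ K @ diag(v)

# Toy cost matrix: query i matches candidate i cheaply (cost 0),
# all mismatched pairs cost 1.
cost = np.ones((4, 4)) - np.eye(4)
plan = sinkhorn(cost)
# Mass concentrates on the diagonal, i.e. the clean correspondences.
```

In ConeSep the cost matrix would be derived from the learned embeddings; the uniform marginals and toy cost here are purely illustrative.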
Table 1. Performance comparison on the FashionIQ validation set in terms of R@K (%). The best result under each noise ratio is highlighted in bold, and the second-best is underlined.

Table 2. Performance comparison on the CIRR test set in terms of R@K (%) and Rsub@K (%). The best and second-best results are highlighted in bold and underlined, respectively.

Note for fully-supervised CIR benchmarking: the 0% noise setting in the tables below is equivalent to the traditional fully-supervised CIR paradigm. We highlight this 0% block to facilitate direct and fair comparisons for researchers working on conventional supervised methods.
- Introduction
- News
- Key Features
- Architecture
- Experiment Results
- Install
- Data Preparation
- Quick Start
- Project Structure
- Acknowledgement
- Contact
- Citation
- Support & Contributing
1. Clone the repository
git clone https://github.com/Lee-zixu/ConeSep
cd ConeSep

2. Set up the Python Environment
The code has been evaluated with Python 3.8.10 and CUDA 12.6. We recommend using Anaconda to create an isolated virtual environment:
conda create -n conesep python=3.8
conda activate conesep
# Install PyTorch (The evaluated environment uses Torch 2.1.0 with CUDA 12.1 compatibility)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
# Install core dependencies
pip install scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16

Note: key dependencies include salesforce-lavis for the base architecture (BLIP-2).
We evaluated our framework on two standard datasets: FashionIQ and CIRR. Please download the datasets first.
Click to expand: FashionIQ Dataset Directory Structure
Please follow the official instructions to download the FashionIQ dataset. Once downloaded, ensure the folder structure looks like this:
├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val | test].json
│   │   ├── cap.toptee.[train | val | test].json
│   │   └── cap.shirt.[train | val | test].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   └── split.shirt.[train | val | test].json
│   ├── dress
│   │   └── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   └── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   └── toptee
│       └── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]
Click to expand: CIRR Dataset Directory Structure
Please follow the official instructions to download the CIRR dataset. Once downloaded, ensure the folder structure looks like this:
├── CIRR
│   ├── train
│   │   └── [0 | 1 | 2 | ...]
│   │       └── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   └── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   └── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   └── cirr
│       ├── captions
│       │   └── cap.rc2.[train | val | test1].json
│       └── image_splits
│           └── split.rc2.[train | val | test1].json
In our implementation, we introduce the noise_ratio parameter to simulate varying degrees of NTC (Noisy Triplet Correspondence) interference. You can reproduce the experimental results from the paper by modifying the --noise_ratio parameter (the evaluated settings are 0.0, 0.2, 0.5, and 0.8).
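The exact injection logic lives in train.py; as a rough, hypothetical sketch of what simulating NTC at a given noise_ratio can look like (the function name and the triplet fields below are illustrative, not this repository's actual API):

```python
import random

def inject_ntc(triplets, noise_ratio, seed=0):
    """Simulate Noisy Triplet Correspondence (NTC).

    For a noise_ratio fraction of (reference, text, target) triplets,
    the target is replaced with the target of another triplet, producing
    a mismatched (noisy) correspondence. Returns the corrupted triplets
    and the set of corrupted indices."""
    rng = random.Random(seed)
    triplets = [dict(t) for t in triplets]            # do not mutate the input
    n_noisy = int(len(triplets) * noise_ratio)
    noisy_idx = rng.sample(range(len(triplets)), n_noisy)
    original_targets = [t["target"] for t in triplets]
    for i in noisy_idx:
        j = rng.choice([k for k in range(len(triplets)) if k != i])
        triplets[i]["target"] = original_targets[j]   # break the correspondence
    return triplets, set(noisy_idx)
```

With noise_ratio=0.0 this leaves every triplet intact, which matches the fully-supervised setting highlighted in the tables above.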
Training on FashionIQ:
python train.py \
--dataset fashioniq \
--fashioniq_path "/path/to/FashionIQ/" \
--model_dir "./checkpoints/fashioniq_noise0.2" \
--noise_ratio 0.2 \
--batch_size 256 \
--num_epochs 20 \
--lr 2e-5

Training on CIRR:
python train.py \
--dataset cirr \
--cirr_path "/path/to/CIRR/" \
--model_dir "./checkpoints/cirr_noise0.5" \
--noise_ratio 0.5 \
--batch_size 256 \
--num_epochs 20 \
--lr 1e-5

Tips:
> - Our model is based on the BLIP-2 architecture, so we highly recommend running training on GPUs with sufficient memory (e.g., NVIDIA A40 48G / V100 32G).
> - The best model weights and evaluation metrics produced during training are automatically saved as best_model.pt and metrics_best.json in your specified --model_dir.
To generate the prediction files on the CIRR dataset for submission to the CIRR Evaluation Server, run the following command:
python src/cirr_test_submission.py checkpoints/cirr_noise0.5/

(The script automatically writes the .json submission files for online evaluation, based on the best checkpoint found in the given folder.)
Our code is a deep customization of the LAVIS framework. The core implementations are centralized in the following files:
ConeSep/
├── lavis/
│   └── models/
│       └── blip2_models/
│           └── ConeSep.py        # Core model implementation: GFQ, NBL, and BTU modules
├── train.py                      # Training entry point: controls noise_ratio injection and optimal transport
├── datasets.py
├── test.py
├── utils.py
├── data_utils.py
├── cirr_test_submission.py       # Auxiliary script for CIRR test-set submission
├── datasets/                     # Dataset loading and processing logic
└── README.md
The implementation of this project references the LAVIS framework and the noise setting concepts from TME. We express our sincere gratitude to these open-source contributions!
For any questions, issues, or feedback, please open an issue on GitHub or reach out to us at lizixu.cs@gmail.com
Ecosystem & Other Works from our Team
- TEMA (ACL'26): Paper | Web | Code
- Air-Know (CVPR'26): Paper | Web | Code
- HABIT (AAAI'26): Paper | Web | Code
- ReTrack (AAAI'26): Paper | Web | Code
- INTENT (AAAI'26): Paper | Web | Code
- HUD (ACM MM'25): Paper | Web | Code
- OFFSET (ACM MM'25): Paper | Web | Code
- ENCODER (AAAI'25): Paper | Web | Code
If you find our work or this code useful in your research, please consider leaving a Star or citing our paper. Your support is our greatest motivation!
@InProceedings{ConeSep,
title={ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval},
author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Zhang, Mingyu and Fu, Zhiheng and Nie, Liqiang},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
  year={2026}
}

We welcome all forms of contributions! If you have any questions, ideas, or bug reports, please feel free to:
- Open an Issue for discussions or bug reports.
- Submit a Pull Request to improve the codebase.