COCO-Facet

This repository contains the code for the COCO-Facet benchmark for attribute-focused text-to-image retrieval (the "facets" of the images). The benchmark can be downloaded here. Please place the downloaded JSON files in the "benchmark" folder for evaluation.

Downloading the Images

The annotations come from MSCOCO 2017, COCO-Stuff, Visual7W, and VisDial, all of which annotate COCO images. Since these datasets re-index the images, we recommend downloading the images from MSCOCO_val2017, VisDial_val2018, and Visual7W.

Environment

conda create -n facet python=3.10
pip install -r VLM2Vec/requirements.txt
pip install flash-attn==2.7.4.post1 --no-build-isolation

Evaluation

Please first modify the dataset path and Hugging Face model path in the scripts. Then you can run the evaluations from inside the "VLM2Vec" folder.
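The paths are typically set near the top of each evaluation script. The variable names below are hypothetical placeholders, shown only to illustrate what needs editing; check each eval_*.sh for the actual names used in this repository:

```shell
# Hypothetical placeholder names -- see the top of each eval_*.sh
# for the variables this repository actually uses.
DATA_PATH=/path/to/benchmark         # folder containing the downloaded JSON files
IMAGE_PATH=/path/to/images           # downloaded COCO/VisDial/Visual7W images
MODEL_PATH=/path/to/hf/checkpoint    # local Hugging Face model path
```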

For CLIP-ViT-L/14-336px:

sh eval_b.sh

For VLM2Vec without any attribute-specific prompt:

sh eval_d.sh

For VLM2Vec with GPT prompts:

sh eval_f.sh

We have also included the human-written prompts in eval_f.py.

For the text-based retrieval:

sh eval_t_detailed.sh

For VLM2Vec with GPT-chosen prompts at test time:

sh eval_e.sh

We include the GPT responses under output/outputs_e so that they can be reused.

For VLM2Vec with linear approximated promptable embeddings:

sh eval_a.sh

Note that this step requires the embeddings produced by "eval_f.sh" and "eval_d.sh" in order to derive the matrix W.
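Conceptually, the linear approximation fits a matrix W that maps the unprompted embeddings to the prompt-conditioned ones. The sketch below illustrates such a least-squares fit on synthetic stand-in arrays; the shapes and data are placeholders, not the repository's actual embedding files (see eval_a.sh for the real pipeline):

```python
import numpy as np

# Synthetic stand-ins: D plays the role of the unprompted embeddings
# (from eval_d.sh) and F the prompted embeddings (from eval_f.sh),
# one row per query. Shapes are hypothetical.
rng = np.random.default_rng(0)
D = rng.standard_normal((1024, 256))
W_true = rng.standard_normal((256, 256))
F = D @ W_true  # pretend prompting acts as an (unknown) linear map

# Least-squares solve for W minimizing ||D @ W - F||_F.
W, *_ = np.linalg.lstsq(D, F, rcond=None)

# D @ W then approximates the promptable embeddings for new queries.
rel_err = np.linalg.norm(D @ W - F) / np.linalg.norm(F)
print(rel_err)
```

Because the synthetic relation is exactly linear, the fit recovers W up to numerical precision; on real embeddings the residual measures how well a single linear map captures the effect of prompting.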

We include the collators for other MLLM-based universal multimodal embedders in VLM2Vec/src/collator.py.

Dataset Construction

The dataset construction process is documented in the .ipynb notebooks in the "construction" folder.

Acknowledgment

This code is mainly based on the VLM2Vec repository.

Citation

If you find our code, data, or the paper useful, please cite the paper:

@article{li2025highlighting,
  title={Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval},
  author={Li, Siting and Gao, Xiang and Du, Simon Shaolei},
  journal={arXiv preprint arXiv:2505.15877},
  year={2025}
}
