This repository contains the code for COCO-Facet, a benchmark for attribute-focused text-to-image retrieval (the "facets" of the images). The benchmark can be downloaded here. Please place the downloaded JSON files in the "benchmark" folder for evaluation.
The annotations are drawn from MSCOCO 2017, COCO-Stuff, Visual7W, and VisDial, all of which describe COCO images. Since these datasets reindex the images, we recommend downloading the images from MSCOCO_val2017, VisDial_val2018, and Visual7W.
conda create -n facet python=3.10
pip install -r VLM2Vec/requirements.txt
pip install flash-attn==2.7.4.post1 --no-build-isolation
Please first modify the dataset path and the Hugging Face model path in the scripts. Then you can start the evaluation inside the "VLM2Vec" folder.
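For orientation, each evaluation script embeds the queries and the candidate images with the chosen model and ranks the images by embedding similarity. Below is a minimal, hypothetical sketch of that retrieval loop with CLIP-ViT-L/14-336px through Hugging Face transformers; it is not the repository's evaluation code, and the image paths and query text are placeholders.

```python
# A minimal sketch (not the repository's eval code): embed attribute-focused
# queries and candidate images with CLIP-ViT-L/14-336px, rank by cosine
# similarity, and take the top-K indices for Recall@K. Paths/queries are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14-336").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

queries = ["a photo where the umbrella is red"]    # attribute-focused query (placeholder)
image_paths = ["images/000000000139.jpg"]          # candidate pool (placeholder paths)

with torch.no_grad():
    text_inputs = processor(text=queries, return_tensors="pt", padding=True)
    q = model.get_text_features(**text_inputs)
    images = [Image.open(p).convert("RGB") for p in image_paths]
    image_inputs = processor(images=images, return_tensors="pt")
    v = model.get_image_features(**image_inputs)

# Cosine similarity between every query and every candidate image.
q = q / q.norm(dim=-1, keepdim=True)
v = v / v.norm(dim=-1, keepdim=True)
scores = q @ v.T                                    # (num_queries, num_images)
ranking = scores.argsort(dim=-1, descending=True)   # top-K indices give Recall@K
```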
For CLIP-ViT-L/14-336px:
sh eval_b.sh
For VLM2Vec without any attribute-specific prompt:
sh eval_d.sh
For VLM2Vec with GPT prompts:
sh eval_f.sh
We also include the human-written prompts in eval_f.py.
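To illustrate what an attribute-specific prompt does here, the sketch below simply prepends a prompt to the query text before it is embedded. The prompt wording and the embed_text helper are hypothetical; the prompts actually used are the ones in eval_f.py.

```python
# Hypothetical illustration of prompting a query before embedding it.
# The prompt text and embed_text() are placeholders; the real prompts live in eval_f.py.
def build_prompted_query(query: str, attribute_prompt: str) -> str:
    """Prepend an attribute-focused instruction to a retrieval query."""
    return f"{attribute_prompt} Query: {query}"

prompt = "Pay attention to the color of the objects mentioned in the query."  # placeholder
prompted_query = build_prompted_query("a photo where the umbrella is red", prompt)
# embedding = embed_text(prompted_query)  # embed with VLM2Vec or any other text encoder
```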
For text-based retrieval:
sh eval_t_detailed.sh
For VLM2Vec with GPT-chosen prompts at test time:
sh eval_e.sh
We have attached the GPT responses under output/outputs_e, which can be reused.
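The sketch below shows one way such test-time prompt selection can be requested from GPT; the candidate prompts, the model name, and the instruction wording are assumptions for illustration only, and the cached responses in output/outputs_e let you skip this API call.

```python
# A hedged sketch of choosing a prompt with GPT at test time. Candidate prompts,
# model name, and instruction wording are assumptions, not the repository's exact setup.
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

candidate_prompts = [
    "Focus on object colors.",
    "Focus on object counts.",
    "Focus on spatial relations between objects.",
]  # hypothetical attribute-specific prompts

query = "a photo where the umbrella is red"
instruction = (
    "Given the retrieval query below, reply with only the number of the prompt "
    "that best highlights the attribute the query asks about.\n"
    + "\n".join(f"{i}: {p}" for i, p in enumerate(candidate_prompts))
    + f"\nQuery: {query}"
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": instruction}],
)
# Sketch only: no error handling if the reply is not a bare index.
chosen_prompt = candidate_prompts[int(response.choices[0].message.content.strip())]
```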
For VLM2Vec with linearly approximated promptable embeddings:
sh eval_a.sh
Note that the embeddings produced by "eval_f.sh" and "eval_d.sh" are needed to derive the matrix W.
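As a rough sketch of how such a linear map can be derived, assume the unprompted query embeddings (from "eval_d.sh") and the prompted ones (from "eval_f.sh") are stacked into matrices; W can then be fit by ordinary least squares. The .npy paths below are hypothetical, and the exact objective used by eval_a.sh may differ.

```python
# A minimal sketch of fitting a linear map W from unprompted to prompted query
# embeddings by least squares. File paths are hypothetical placeholders.
import numpy as np

X = np.load("output/embeddings_d.npy")   # (N, d) unprompted embeddings from eval_d.sh (assumed path)
Y = np.load("output/embeddings_f.npy")   # (N, d) prompted embeddings from eval_f.sh (assumed path)

# Solve min_W ||X W - Y||_F^2 in closed form.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # W has shape (d, d)

# At test time, approximate a promptable embedding for a new query embedding x:
# y_hat = x @ W
```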
We include the collators for other MLLM-based universal multimodal embedders in VLM2Vec/src/collator.py.
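For readers unfamiliar with the pattern, a collator turns a list of raw (text, image) examples into a padded tensor batch for the embedder. The generic sketch below is not the code in VLM2Vec/src/collator.py; the processor and field names are placeholders.

```python
# A generic collator sketch (not VLM2Vec/src/collator.py): batch (text, image)
# examples for an MLLM-based embedder. Processor and field names are placeholders.
from dataclasses import dataclass

@dataclass
class SimpleMultimodalCollator:
    processor: object  # e.g., a Hugging Face AutoProcessor for the chosen MLLM

    def __call__(self, examples):
        texts = [ex["text"] for ex in examples]
        images = [ex["image"] for ex in examples]
        # The processor tokenizes the texts, preprocesses the images, and pads
        # everything so the batch can be fed to the embedding model.
        return self.processor(text=texts, images=images,
                              padding=True, return_tensors="pt")
```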
We provide the dataset construction process as .ipynb notebooks in the "construction" folder.
This code is mainly based on the VLM2Vec repository.
If you find our code, data, or paper useful, please cite:
@article{li2025highlighting,
  title={Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval},
  author={Li, Siting and Gao, Xiang and Du, Simon Shaolei},
  journal={arXiv preprint arXiv:2505.15877},
  year={2025}
}