<div align="center">

# MLCD-Seg

<a href="https://arxiv.org/pdf/2407.17331"><img src="https://img.shields.io/badge/arXiv-2407.17331-b31b1b" alt="arXiv"></a>
<a href='https://huggingface.co/DeepGlint-AI/MLCD-Seg'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-green'></a>
</div>

This repository studies the end-to-end application of multimodal large models to downstream tasks. At present, the segmentation component achieves strong results on referring expression segmentation.

## Example:

![output](https://github.com/user-attachments/assets/85c023a1-3e0c-4ea5-a764-1eb9ee0fbddf)

## RefCOCO Segmentation Evaluation Results:

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcocog)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcocog?p=multi-label-cluster-discrimination-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcoco-5)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-5?p=multi-label-cluster-discrimination-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcoco-3)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-3?p=multi-label-cluster-discrimination-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcocog-1)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcocog-1?p=multi-label-cluster-discrimination-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcoco)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco?p=multi-label-cluster-discrimination-for-visual)

| Dataset | Split | MLCD-seg-7B | EVF-SAM | GLaMM | VisionLLM v2 | LISA |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: |
| RefCOCOg | val | **79.7** | 78.2 | 74.2 | 73.3 | 67.9 |
| RefCOCOg | test | **80.5** | 78.3 | 74.9 | 74.8 | 70.6 |

## How to use:

If you just want to run inference with this code, refer to the sample below:
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_path = "DeepGlint-AI/MLCD-Seg"  # or use your local path
mlcd_seg = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True
).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
# Load the image you want to segment
seg_img = Image.open("asserts/example.jpg").convert("RGB")
seg_prompt = "Could you provide a segmentation mask for the right giraffe in this image?"
# force_seg=False lets the model decide whether the prompt calls for a mask
pred_mask = mlcd_seg.seg(seg_img, seg_prompt, tokenizer, force_seg=False)
```
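
`seg` returns the predicted mask. As a minimal follow-up sketch, assuming `pred_mask` comes back as a binary `(H, W)` `torch` tensor (the return type is an assumption, not documented above), you could save and visually inspect it:

```python
import numpy as np
import torch
from PIL import Image

# Assumption: pred_mask is a (H, W) tensor with values in {0, 1}
mask = (pred_mask > 0).to(torch.uint8).cpu().numpy() * 255
Image.fromarray(mask, mode="L").save("pred_mask.png")

# Optional: paint the masked pixels red on the original image for a quick check
overlay = np.array(seg_img).copy()
overlay[mask > 0] = (0.5 * overlay[mask > 0] + 0.5 * np.array([255, 0, 0])).astype(np.uint8)
Image.fromarray(overlay).save("pred_overlay.png")
```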

If you want to evaluate this model on a dataset (e.g. RefCOCO), pass `force_seg=True` so that a mask is always produced:
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_path = "DeepGlint-AI/MLCD-Seg"  # or use your local path
mlcd_seg = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True
).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
# Load the image you want to segment
seg_img = Image.open("asserts/example.jpg").convert("RGB")
seg_prompt = "Could you provide a segmentation mask for the right giraffe in this image?"
# force_seg=True always produces a mask, as benchmark evaluation requires
pred_mask = mlcd_seg.seg(seg_img, seg_prompt, tokenizer, force_seg=True)
```
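
For a full benchmark run, you would wrap the call above in a loop and accumulate intersection and union over the whole split. A sketch of the cIoU computation, assuming `seg` returns a binary mask tensor matching the ground-truth shape, and with `samples` as a hypothetical iterable of (image, expression, ground-truth mask) triples (the repository's actual data loading is not shown here):

```python
import torch

def cumulative_iou(model, tokenizer, samples):
    """cIoU over a RefCOCO-style split: total intersection / total union."""
    inter, union = 0.0, 0.0
    for image, expression, gt_mask in samples:  # hypothetical iterable of triples
        pred = model.seg(image, expression, tokenizer, force_seg=True)
        pred = (pred > 0).cpu()   # binarize the predicted mask
        gt = gt_mask.bool().cpu()
        inter += (pred & gt).sum().item()
        union += (pred | gt).sum().item()
    return inter / union
```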

## Installation

```bash
# Create and activate a conda environment
conda create -n mlcd_seg python=3.10
conda activate mlcd_seg

# Install dependencies
pip install -r requirements.txt
```
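
As an optional sanity check after installation (this only assumes the `torch` pinned in `requirements.txt`; the usage examples above call `.cuda()`, so CUDA should report as available):

```python
import torch

print(f"torch {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
```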

## Docker

```bash
# Build the PyTorch Docker image
docker build -t mlcd_seg .

# Run the Docker container with GPU support
docker run -it --rm --gpus all mlcd_seg bash
```

## Citations

```
@misc{mlcdseg_wukun,
  author = {Wu, Kun and Xie, Yin and Jie, Yu and Zhou, Xinyu and An, Xiang and Feng, Ziyong and Deng, Jiankang},
  title = {MLCD-Seg},
  year = {2025},
  url = {https://github.com/deepglint/MLCD_SEG},
}
@inproceedings{anxiang_2024_mlcd,
  title = {Multi-label Cluster Discrimination for Visual Representation Learning},
  author = {An, Xiang and Yang, Kaicheng and Dai, Xiangzi and Feng, Ziyong and Deng, Jiankang},
  booktitle = {ECCV},
  year = {2024}
}
```