Yang Liu1*, Yufei Yin2*, Chenchen Jing3, Muzhi Zhu1, Hao Chen1, Yuling Xi1, Bo Feng4, Hao Wang4, Shiyu Li4, Chunhua Shen1
1Zhejiang University, 2Hangzhou Dianzi University, 3Zhejiang University of Technology, 4Apple
In this work, we present COSINE, a unified open-world segmentation model that consolidates open-vocabulary segmentation and in-context segmentation with multi-modal prompts (e.g., text and image). COSINE exploits foundation models to extract representations for an input image and the corresponding multi-modal prompts, and employs a SegDecoder to align these representations, model their interactions, and produce masks specified by the input prompts at different granularities. In this way, COSINE overcomes the architectural discrepancies, divergent learning objectives, and distinct representation learning strategies of previous pipelines for open-vocabulary segmentation and in-context segmentation. Comprehensive experiments demonstrate that COSINE achieves significant performance improvements on both open-vocabulary and in-context segmentation tasks. Our exploratory analyses highlight that the synergistic collaboration between visual and textual prompts yields significantly better generalization than single-modality approaches.
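To make the described pipeline concrete, below is a minimal, hypothetical sketch of how frozen foundation-model features for an image and multi-modal prompts could be fused by a SegDecoder-style module to produce prompt-conditioned masks. All module names, dimensions, and the cross-attention fusion scheme here are illustrative assumptions, not the released implementation.

```python
# Hypothetical sketch of a COSINE-style prompt/image fusion step (not the official code).
import torch
import torch.nn as nn

class SegDecoder(nn.Module):
    """Aligns prompt embeddings with image features and predicts mask logits."""
    def __init__(self, dim=256, num_heads=8, num_layers=3):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.mask_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, prompt_tokens, image_tokens, h, w):
        # Cross-attend prompt queries to image tokens (alignment + interaction modeling).
        queries = self.decoder(tgt=prompt_tokens, memory=image_tokens)
        # Dot-product each prompt query with per-pixel features to obtain mask logits.
        pixel_feats = image_tokens.transpose(1, 2).reshape(-1, image_tokens.size(-1), h, w)
        mask_embed = self.mask_mlp(queries)                       # (B, Q, C)
        return torch.einsum("bqc,bchw->bqhw", mask_embed, pixel_feats)

# Random tensors stand in for frozen foundation-model features (image, text, visual prompts).
B, C, H, W = 1, 256, 32, 32
image_tokens = torch.randn(B, H * W, C)      # patch features of the query image
text_prompt = torch.randn(B, 2, C)           # e.g. embeddings of two category names
visual_prompt = torch.randn(B, 1, C)         # e.g. an in-context reference embedding

decoder = SegDecoder(dim=C)
prompts = torch.cat([text_prompt, visual_prompt], dim=1)   # unified multi-modal prompts
masks = decoder(prompts, image_tokens, H, W)                # (1, 3, 32, 32) mask logits
print(masks.shape)
```

In this sketch, text and visual prompts share one token sequence, so a single decoder handles both open-vocabulary and in-context queries; the real model's encoders, losses, and granularity handling are described in the paper.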
For academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.
