Adding OWLViT/OWLV2 as options for the visual grounding part #55

@skulshreshtha

🚀 Feature

Currently, the project uses GroundingDINO as its visual grounding model. It is the best-performing model on some current benchmark datasets for zero-shot object detection.
We could give users the flexibility to choose between different visual grounding models, such as OWLViT, OWLV2, or OFA.

Motivation & Examples

Tell us why the feature is useful.
Since this project is about text-guided segmentation, letting users choose the model used for the visual grounding stage of the pipeline seems like a natural addition.

Describe what the feature would look like, if it is implemented.
Best demonstrated using code examples in addition to words.

from PIL import Image
from lang_sam import LangSAM

# Proposed: optionally select the visual grounding model at initialization.
# The default would be 'groundingdino'; other options would be 'ofa', 'owlvit', and 'owlv2'.
model = LangSAM(model='groundingdino')
image_pil = Image.open("./assets/car.jpeg").convert("RGB")
text_prompt = "wheel"
masks, boxes, phrases, logits = model.predict(image_pil, text_prompt)
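
One way the proposed `model` argument could be wired up internally is a small name-to-factory registry. The following is only a sketch under stated assumptions: `Detection`, `register_grounder`, `build_grounder`, and `available_grounders` are hypothetical names, not part of the existing `lang_sam` API, and the factories are stubs that return no detections where a real backend would load its weights.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Detection:
    """Hypothetical container for one grounded box."""
    box: Box
    phrase: str
    score: float


# A grounder maps (image, text_prompt) to a list of detections.
Grounder = Callable[[Any, str], List[Detection]]

# Hypothetical registry: model name -> zero-argument factory returning a grounder.
_GROUNDERS: Dict[str, Callable[[], Grounder]] = {}


def register_grounder(name: str, factory: Callable[[], Grounder]) -> None:
    """Make a grounding backend selectable by name."""
    _GROUNDERS[name] = factory


def available_grounders() -> List[str]:
    """Return the registered model names in sorted order."""
    return sorted(_GROUNDERS)


def build_grounder(name: str = "groundingdino") -> Grounder:
    """Instantiate the backend registered under `name`."""
    if name not in _GROUNDERS:
        options = ", ".join(available_grounders())
        raise ValueError(
            f"Unknown visual grounding model {name!r}; choose one of: {options}"
        )
    return _GROUNDERS[name]()


def _stub_factory() -> Grounder:
    # Stub only: a real factory would load model weights and run inference.
    def detect(image: Any, text_prompt: str) -> List[Detection]:
        return []
    return detect


# Register the option names from the proposal, all backed by the same stub.
for _name in ("groundingdino", "ofa", "owlvit", "owlv2"):
    register_grounder(_name, _stub_factory)
```

With a registry like this, the constructor would only need to call `build_grounder(model)` and keep the rest of `predict` unchanged, as long as every backend returns boxes, phrases, and scores in the same format.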

Note

We only consider adding new features if they are relevant to this library.
Consider if this new feature deserves to be here or should be a new library.

Labels: enhancement (New feature or request)