🚀 Feature
Currently, the project uses GroundingDINO as its visual grounding model, which is the best-performing model on some benchmark datasets.

We can give the user the flexibility to choose between different visual grounding models, such as OFA, OWL-ViT, and OWLv2.
Motivation & Examples
Tell us why the feature is useful.
Since this project is about text-guided segmentation, the ability to choose which model the visual grounding pipeline uses seems like a natural addition.
Describe what the feature would look like, if it is implemented.
Best demonstrated using code examples in addition to words.
from PIL import Image
from lang_sam import LangSAM

# Proposed: select the visual grounding model at initialization.
# The default would be 'groundingdino'; other options are 'ofa', 'owlvit', and 'owlv2'.
model = LangSAM(model='groundingdino')

image_pil = Image.open("./assets/car.jpeg").convert("RGB")
text_prompt = "wheel"
masks, boxes, phrases, logits = model.predict(image_pil, text_prompt)
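One possible way to back the proposed `model` argument is a small name-to-factory registry. The sketch below is only an illustration of that idea: `GROUNDING_MODELS` and `build_grounding_model` are hypothetical names that do not exist in `lang_sam`, and the factories return placeholder strings where a real implementation would load the corresponding detector.

```python
from typing import Callable, Dict

# Hypothetical registry (not part of lang_sam): maps a model name to a factory.
# The factories return placeholder strings; real ones would load model weights.
GROUNDING_MODELS: Dict[str, Callable[[], str]] = {
    "groundingdino": lambda: "GroundingDINO detector",
    "ofa": lambda: "OFA detector",
    "owlvit": lambda: "OWL-ViT detector",
    "owlv2": lambda: "OWLv2 detector",
}


def build_grounding_model(name: str = "groundingdino") -> str:
    """Build the detector registered under `name` (case-insensitive)."""
    key = name.lower()
    if key not in GROUNDING_MODELS:
        options = ", ".join(sorted(GROUNDING_MODELS))
        raise ValueError(f"Unknown grounding model {name!r}; choose one of: {options}")
    return GROUNDING_MODELS[key]()
```

With a lookup like this, the constructor could pass its `model` argument straight through, and an unsupported name would fail early with a message listing the valid options.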
Note
We only consider adding new features if they are relevant to this library.
Consider if this new feature deserves to be here or should be a new library.