Description
Search before asking
- I have searched the Multimodal Maestro issues and found no similar feature requests.
Question
Hi, @skylargivens,
We currently have a good understanding of how to create positive samples for the Florence-2 model, using a format like this:
{
  "image": "IMG_20220316_144445_jpg.rf.a79f523e54855af2323f0cfdb9a4dedc.jpg",
  "prefix": "<OD>",
  "suffix": "5 of hearts<loc_54><loc_213><loc_291><loc_598>6 of hearts<loc_205><loc_251><loc_471><loc_670>7 of hearts<loc_363><loc_309><loc_688><loc_797>8 of hearts<loc_598><loc_395><loc_973><loc_974>"
}
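To make the format concrete, here is a minimal Python sketch that decodes a `suffix` like the one above into labeled pixel boxes. It assumes each label is followed by exactly four `<loc_N>` tokens in x1, y1, x2, y2 order, that each N is a bin index on a 1000-bin grid per axis, and that mapping an index to the centre of its bin is a reasonable inverse. `parse_od_suffix` is a hypothetical helper, not part of Multimodal Maestro or Florence-2.

```python
import re

# A class label followed by exactly four <loc_N> tokens (assumed x1, y1, x2, y2).
OD_PATTERN = re.compile(r"([^<]+)<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>")


def parse_od_suffix(suffix, width, height, bins=1000):
    """Convert an <OD> suffix string into a list of (label, (x1, y1, x2, y2)).

    Assumption: coordinates are quantized into `bins` bins per axis, and each
    bin index is mapped back to the centre of its bin in pixel units.
    """
    results = []
    for label, *locs in OD_PATTERN.findall(suffix):
        x1, y1, x2, y2 = (int(v) for v in locs)
        box = (
            (x1 + 0.5) * width / bins,
            (y1 + 0.5) * height / bins,
            (x2 + 0.5) * width / bins,
            (y2 + 0.5) * height / bins,
        )
        results.append((label.strip(), box))
    return results
```

With this reading, an empty `suffix` simply decodes to an empty list of boxes, which is one reason the location-tag question below matters.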
However, I'm unclear on how to design negative samples for training. My understanding is that negative samples help the model distinguish target objects from background and look-alike objects, which should reduce false positives. Some questions I have:
- Should negative samples use the same image but with incorrect object descriptions?
- Do we need to use completely unrelated images and descriptions?
- How do we handle the location tags for negative samples?
- What's the recommended ratio of positive to negative samples in the training set?
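For concreteness, here is how one candidate design could look in Python: a background image containing none of the target classes, written with the same `<OD>` prefix and an empty `suffix` (no labels, no location tags), plus a helper that mixes such negatives into the positive set at a configurable fraction. Whether Florence-2 or Multimodal Maestro handles an empty `suffix` this way is an assumption and part of what I'm asking; `make_negative_record`, `mix_samples`, and the `neg_ratio` value are hypothetical, not recommended settings.

```python
import random


def make_negative_record(image_name):
    """Hypothetical negative sample: an image with no target objects,
    so the <OD> target is an empty string (no labels, no <loc_N> tokens)."""
    return {"image": image_name, "prefix": "<OD>", "suffix": ""}


def mix_samples(positives, negatives, neg_ratio=0.2, seed=0):
    """Add enough negatives that they make up roughly `neg_ratio` of the
    combined set (capped by how many negatives exist), then shuffle.
    `neg_ratio` is a placeholder value, not a recommendation."""
    if not 0 <= neg_ratio < 1:
        raise ValueError("neg_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_neg / (n_pos + n_neg) = neg_ratio for n_neg.
    n_neg = round(len(positives) * neg_ratio / (1 - neg_ratio))
    n_neg = min(n_neg, len(negatives))
    mixed = list(positives) + rng.sample(list(negatives), n_neg)
    rng.shuffle(mixed)
    return mixed
```

For example, 8 positives with `neg_ratio=0.2` yields 2 negatives (10 records total). The ratio is exposed as a parameter precisely because the right value is the open question.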
Any guidance or best practices for creating effective negative samples would be greatly appreciated. This will help ensure we're training the Florence-2 model well for object detection tasks.
Additional
If there are any existing resources, documentation, or examples specifically for Florence-2 negative sample creation, please point me in that direction. Also, if there are any tools or scripts the team recommends for generating or augmenting negative samples, that information would be very helpful.