Hi everyone,
I’m struggling to understand how you prompted the multimodal LLMs for the super-categories in the paper.
Specifically, did you prompt the model to locate all 122 classes in the “Industrial” category at once, or did you prompt it for each subset separately and then aggregate the mAP across all subsets of a super-category?
For example, for the Construction Equipment subset, did you prompt it like:
"Locate every instance that belongs to the following categories: Bulldozer, Concrete-Mixer, Dump-Truck, Excavator, Lifting-Equipment, Piling-Machine, Tower-Crane. Report bbox coordinates in JSON format"
Or did you prompt it to detect all classes in the Industrial category at once (i.e., not limited to the specific classes in the subset being evaluated)?
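To make the two options concrete, here is a minimal sketch of what I mean. Everything in it is my own assumption rather than anything taken from the paper or its code: the `SUBSETS` dictionary, the helper names, and in particular the unweighted mean used to aggregate per-subset mAP are all hypothetical. Only the Construction Equipment class names and the prompt wording come from my example above.

```python
# Hypothetical sketch of the two prompting strategies being asked about.
# Names and structure are assumptions, not the authors' actual code.

SUBSETS = {
    "Construction Equipment": [
        "Bulldozer", "Concrete-Mixer", "Dump-Truck", "Excavator",
        "Lifting-Equipment", "Piling-Machine", "Tower-Crane",
    ],
    # The other Industrial subsets would be listed here.
}

TEMPLATE = (
    "Locate every instance that belongs to the following categories: "
    "{classes}. Report bbox coordinates in JSON format"
)


def build_prompt(class_names):
    """Fill the prompt template with a comma-separated class list."""
    return TEMPLATE.format(classes=", ".join(class_names))


def per_subset_prompts(subsets):
    """Option A: one prompt per subset, limited to that subset's classes."""
    return {name: build_prompt(classes) for name, classes in subsets.items()}


def super_category_prompt(subsets):
    """Option B: a single prompt listing every class in the super-category."""
    all_classes = sorted({c for classes in subsets.values() for c in classes})
    return build_prompt(all_classes)


def mean_over_subsets(map_by_subset):
    """Option A aggregation (assumed): unweighted mean of per-subset mAP."""
    return sum(map_by_subset.values()) / len(map_by_subset)
```

With option A, each subset's images would be evaluated only against that subset's prompt, and the super-category score would come from something like `mean_over_subsets`. With option B, every image in the Industrial category would get the same long prompt from `super_category_prompt`.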
Thanks for clarifying!