
According to this paragraph, the region-based queries are supervised by a mini-map down-sampled from the ground truth. If I understand correctly, all the queries then receive the same supervision. If so, how do they learn to correspond to different regions? Wouldn't they all learn the same thing and correspond to the same region?
Can you kindly explain more about this?