Hi, @linyq17
I found a minor bug in how the learnable queries are extracted. The code uses the special token <|object_ref_start|> as a placeholder for the learnable queries, and then takes the last num_queries tokens of the hidden states as the query features:
hidden_states = checkpoint_output[-1]
learnable_query_features = hidden_states[:,-input_query.shape[0]:,:]
However, the last token of the input is not the <|object_ref_start|> placeholder, so this tail slice also picks up at least one non-placeholder position:
ipdb> inputs.input_ids[0,-1]
tensor(198)
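One possible way to check and work around this is to locate the placeholder positions explicitly instead of assuming they sit at the very end. The snippet below is only a sketch in plain Python: the helper names and the placeholder id 9 are made up for illustration, and the real id would have to come from the tokenizer.

```python
def placeholder_positions(token_ids, placeholder_id):
    """Return the indices in token_ids that hold the placeholder token."""
    return [i for i, tok in enumerate(token_ids) if tok == placeholder_id]


def is_tail_block(positions, seq_len):
    """True if positions are exactly the last len(positions) indices."""
    n = len(positions)
    return n > 0 and positions == list(range(seq_len - n, seq_len))


# Hypothetical ids for illustration only: 9 stands in for
# <|object_ref_start|>; 198 is the trailing token reported above.
PLACEHOLDER_ID = 9
example_ids = [1, 2, 3, 9, 9, 9, 198]

positions = placeholder_positions(example_ids, PLACEHOLDER_ID)
tail_ok = is_tail_block(positions, len(example_ids))

print(positions)  # indices of the placeholder tokens
print(tail_ok)    # whether a plain tail slice would be safe
```

With the real tensors, the same list could be built from inputs.input_ids[0].tolist() and used to index the hidden states, e.g. hidden_states[:, positions, :], assuming every sample in the batch places the placeholders at the same positions.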