Handling Reduced Visual Tokens During Inference in LLMs

I’m not sure whether the LLM can handle a different number of visual tokens at inference than it was given during training. If N visual tokens are discarded, is there a step that adjusts the input dimensions before the remaining tokens are fed into the LLM?
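
To make the question concrete, here is a minimal sketch of what I mean by discarding N visual tokens, in plain Python with made-up sizes. The helper names `drop_visual_tokens` and `shape` and all the numbers are hypothetical and not taken from any particular model. As far as I can tell, dropping tokens only shortens the sequence and leaves each token's embedding width unchanged, so the question is really whether the LLM needs any extra step to cope with the shorter sequence.

```python
import random


def drop_visual_tokens(tokens, keep_indices):
    """Keep only the visual tokens at keep_indices, preserving their order."""
    return [tokens[i] for i in sorted(keep_indices)]


def shape(tokens):
    """Return (sequence_length, hidden_dim) for a list of equal-length vectors."""
    return (len(tokens), len(tokens[0]) if tokens else 0)


# Hypothetical sizes, chosen only for illustration.
num_visual, num_text, hidden_dim, n_dropped = 16, 4, 8, 6

rng = random.Random(0)
visual = [[rng.random() for _ in range(hidden_dim)] for _ in range(num_visual)]
text = [[rng.random() for _ in range(hidden_dim)] for _ in range(num_text)]

# Discard N visual tokens (picked at random here; a real method would
# presumably use some importance score).
keep = rng.sample(range(num_visual), num_visual - n_dropped)
pruned = drop_visual_tokens(visual, keep)

# The LLM input is the visual tokens followed by the text tokens.
full_input = visual + text
pruned_input = pruned + text

print(shape(full_input))    # (20, 8)
print(shape(pruned_input))  # (14, 8)
```

In this sketch only the first number (sequence length) changes, from 20 to 14, while the second (embedding width per token) stays at 8.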