I’m not sure whether the LLM can handle a number of visual tokens different from the number used during training. If N visual tokens are discarded, is there a step that adjusts the dimensions before the remaining tokens are fed into the LLM?
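To make the question concrete, here is a minimal sketch of what I mean by discarding tokens. All sizes and names (`NUM_VISUAL`, `HIDDEN`, `N_DROP`, `drop_visual_tokens`) are hypothetical and not taken from this repository. As far as I can tell, dropping tokens only shortens the sequence axis, while the per-token embedding width stays the same:

```python
import random


def drop_visual_tokens(tokens, keep_indices):
    """Return only the tokens at keep_indices, preserving their original order."""
    return [tokens[i] for i in sorted(keep_indices)]


# Hypothetical sizes, chosen only for illustration.
NUM_VISUAL = 576  # visual tokens produced by the vision encoder / projector
HIDDEN = 8        # embedding width the LLM expects (kept tiny for readability)
N_DROP = 400      # number of visual tokens discarded

random.seed(0)

# One embedding vector of width HIDDEN per visual token.
visual = [[random.random() for _ in range(HIDDEN)] for _ in range(NUM_VISUAL)]

# Randomly choose which tokens survive; a real method would use some score.
keep = random.sample(range(NUM_VISUAL), NUM_VISUAL - N_DROP)
pruned = drop_visual_tokens(visual, keep)

print(len(visual), len(visual[0]))  # 576 8
print(len(pruned), len(pruned[0]))  # 176 8
```

So the shape goes from (576, 8) to (176, 8) in this toy case. What I can't tell from the sketch is whether the LLM still behaves well with a shorter visual sequence than it saw during training.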