Hi, great work!
May I ask if there are any recommended prompt templates for using InternVL2.0 to collect image–text descriptions?
Additionally, could you please clarify which InternVL2.0 model size (e.g., 2B or 8B) was used in your experiments?
Thanks in advance!