Hello @anxiangsir 🤗
I'm Yiming, a student at NTU. Recently I've been working with RICE-ViT and trying to reproduce the baseline built on Qwen2.5-7B-Instruct. I ran into a couple of questions and would really appreciate your help:
About reproducing ViT-L-14-336px results
I used rice-vit-large-patch14-560 and modified the crop_size and shortest_edge in preprocessor_config to 336, attempting to match the ViT-L-14-336px setup. Is this the correct way to reproduce the 336px version? If not, where can I find a checkpoint specifically trained for ViT-L-14-336px?
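For reference, this is a minimal sketch of the edit I made, assuming the file uses the standard HuggingFace image-processor fields (`crop_size`, `size.shortest_edge`); the 560-px defaults below are placeholders standing in for the actual contents of preprocessor_config.json:

```python
import json

# Placeholder for the relevant fields of preprocessor_config.json
# shipped with rice-vit-large-patch14-560 (assumed structure).
config = {
    "crop_size": {"height": 560, "width": 560},
    "size": {"shortest_edge": 560},
}

# Override both fields to match the ViT-L-14-336px input resolution.
target = 336
config["crop_size"] = {"height": target, "width": target}
config["size"]["shortest_edge"] = target

print(json.dumps(config, indent=2))
```

My understanding is that this only changes the preprocessing resolution; the positional embeddings would still be those trained at 560 px unless the model interpolates them, which is part of why I'm unsure this is the right way to reproduce the 336px setup.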
Which MLCDVisionModel to use
I noticed that there are two versions of MLCDVisionModel:
- one in LLaVA-NEXT,
- another in transformers.

For RICE-ViT, I used the version from Transformers. Is this the correct choice?
Thanks a lot for your time! 🙏