Some problems when using RICE-ViT #8

@zhangym0213


Hello @anxiangsir 🤗

I'm Yiming, a student at NTU. I've recently been working with RICE-ViT and trying to reproduce the baseline built on Qwen2.5-7B-Instruct. I ran into a couple of questions and would really appreciate your help:

About reproducing ViT-L-14-336px results

I used rice-vit-large-patch14-560 and modified crop_size and shortest_edge in preprocessor_config to 336, attempting to match the ViT-L-14-336px setup. Is this the correct way to reproduce the 336px version? If not, where can I find the checkpoint specifically trained for ViT-L-14-336px?
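For reference, the override described above amounts to roughly the following. This is only a minimal sketch: the helper name `override_resolution` is hypothetical, and the key layout and values in `original` are assumptions based on CLIP-style preprocessor configs, not copied from the actual rice-vit-large-patch14-560 file.

```python
import copy
import json


def override_resolution(config: dict, resolution: int = 336) -> dict:
    """Return a copy of a CLIP-style preprocessor config set to `resolution`."""
    updated = copy.deepcopy(config)
    # crop_size may be a bare int or a {"height", "width"} dict, depending on the export.
    if isinstance(updated.get("crop_size"), dict):
        updated["crop_size"] = {"height": resolution, "width": resolution}
    else:
        updated["crop_size"] = resolution
    # size is assumed to carry "shortest_edge"; other keys in the dict are left untouched.
    if isinstance(updated.get("size"), dict):
        updated["size"] = {**updated["size"], "shortest_edge": resolution}
    else:
        updated["size"] = {"shortest_edge": resolution}
    return updated


# Hypothetical excerpt of the 560px preprocessor config (values assumed for illustration).
original = {
    "crop_size": {"height": 560, "width": 560},
    "size": {"shortest_edge": 560},
    "do_center_crop": True,
}
patched = override_resolution(original)
print(json.dumps(patched, indent=2))
```

Note that this only changes the input resolution fed to the 560px checkpoint; it does not by itself show that the result matches a model trained at 336px, which is what I'm asking about.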

Which MLCDVisionModel to use

I noticed that there are two versions of MLCDVisionModel.

For RICE-ViT, I used the version from Transformers.
Is this the correct choice?

Thanks a lot for your time! 🙏
