Hi, thanks for your great work exploring BEIT as an alternative to CLIP.
I find the idea well motivated in the paper, but I am struggling to reproduce the BEIT3 results in my own independent training codebase.
So far I can match or surpass the CLIP results, and adding CLIP_Image in Late Concat is beneficial.
However, BEIT3 still underperforms CLIP, so I'm wondering whether I am missing something.
For your BEIT experiments, what exactly do you mean by Late Concat, Early(L1-L12), and Early(L1-L24)? I can't find any reference to these in the code, nor in the beit or torchscale repos. If you could share a code sample, it would really help clarify the setup.
Thank you for your time