Skip to content

BEIT-3-Large - Layer fusion #38

@MarcoForte

Description

@MarcoForte

Hi thanks for your great work, exploring BEIT as an alternative to CLIP.

I find it very well motivated in the paper, but I struggle to reproduce the BEIT3 results in my independent training codebase.
So far I can match / surpass clip results, and the addition of CLIP_Image in Late Concat is beneficial.

However, so far BEIT3 underperforms clip. So I'm wondering if I am missing something.

For your BEIT experiments, what do you mean by Late Concat and Early(L1-L12), Early(L1-L24)? I can't find reference to this in the code, and neither in the beit repo or torchscale repo. If you could share a code sample you would really help to articulate your point

Thank you for your time

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions