Add BEiT3 #2489
|
@brianhou0208 on this one, I have wanted to have these models but was hoping to avoid another separate ViT impl. I feel this could be adapted to vision_transformer.py, eva.py, or the existing beit.py with some slight mods, I think it's
Aside from just code duplication concerns, I'm trying to figure out a way to have all existing ViT models adaptable to the NaFlex ViTs, and I need to constrain the number of architectures I support transformations for ... so vision_transformer.py and eva.py will probably be the highest priorities to support. |
|
Hi @rwightman, thanks for reviewing this PR. I agree with your suggestions, but I'm not quite sure how to proceed with the integration. |
BEiT-3 (CVPR 2023) is a multimodal model. Although it does not stand out on ImageNet, it achieves impressive results in other domains, and its pretraining lets it deliver strong performance on downstream tasks.
Model Issue & Request
Result (ImageNet)
https://github.com/microsoft/unilm/tree/master/beit3#fine-tuning-on-imagenet-1k-image-classification
Note
The performance reported in the paper is based on the Giant model, and the authors do not plan to release its weights.
microsoft/unilm#1031, microsoft/unilm#1382, microsoft/unilm#1435
Test code
Reference