Hello,
First, congratulations on your excellent work!
I used the ViT-H-14-CLIPA-336-laion2B model from Hugging Face with CLIP Benchmark and obtained higher scores on imagenet1k than those reported in your arXiv paper.
Here are the results I obtained:

| Metric | Precision | Recall | F1-Score | Support |
| --- | --- | --- | --- | --- |
| Accuracy | | | 0.864 | 10000 |
| Macro Avg | 0.871 | 0.864 | 0.862 | 10000 |
| Weighted Avg | 0.871 | 0.864 | 0.862 | 10000 |

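
For reference, the summary rows above follow the usual classification-report layout. Below is a minimal sketch of how such rows can be computed from predictions, assuming the standard per-class definitions of precision, recall, and F1. The `summarize` helper and the toy labels are hypothetical and are not taken from the actual run or from CLIP Benchmark's code.

```python
from collections import Counter


def summarize(y_true, y_pred):
    """Return accuracy plus macro and weighted (precision, recall, F1).

    Hypothetical helper for illustration only; not CLIP Benchmark's code.
    """
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)        # number of true samples per class
    predicted = Counter(y_pred)      # number of predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    per_class = {}
    for c in labels:
        prec = tp[c] / predicted[c] if predicted[c] else 0.0
        rec = tp[c] / support[c] if support[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[c] = (prec, rec, f1)

    n = len(y_true)
    # Macro average: every class counts equally.
    macro = tuple(
        sum(per_class[c][i] for c in labels) / len(labels) for i in range(3)
    )
    # Weighted average: every class counts in proportion to its support.
    weighted = tuple(
        sum(per_class[c][i] * support[c] for c in labels) / n for i in range(3)
    )
    return {"accuracy": sum(tp.values()) / n, "macro": macro, "weighted": weighted}


# Toy example with two classes (made-up labels).
report = summarize([0, 0, 1, 1], [0, 1, 1, 1])
print(report)
```

Under these definitions, weighted-average recall equals accuracy by construction, and the macro and weighted rows coincide when every class has the same support, which is consistent with the identical rows in the table.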
Do you know what might explain the difference from the numbers reported in the paper?
Best regards,
Reuben