Hello,
Just a note: the VTAB-1k datasets were extracted from the original train, val and test sets and saved in a lossy JPEG format, which corrupts the image data and may affect fine-tuning performance for some models. For example, with CLIP/SigLIP, even zero-shot classification on the provided CIFAR100 test set shows a 20% drop in accuracy compared with the original images.
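
For anyone who wants to see the effect in isolation, here is a minimal sketch of how a JPEG round trip changes pixel values on a CIFAR-sized image while a lossless format such as PNG does not. It assumes Pillow and NumPy are installed. The random 32x32 image, the helper name `roundtrip_max_error` and the `quality=95` setting are my own illustrative choices, not the settings used to build the VTAB-1k files.

```python
import io

import numpy as np
from PIL import Image


def roundtrip_max_error(pixels: np.ndarray, fmt: str, **save_kwargs) -> int:
    """Encode an RGB uint8 array in memory, decode it, return the max per-pixel error."""
    buf = io.BytesIO()
    Image.fromarray(pixels).save(buf, format=fmt, **save_kwargs)
    buf.seek(0)
    decoded = np.asarray(Image.open(buf).convert("RGB"))
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = decoded.astype(np.int16) - pixels.astype(np.int16)
    return int(np.abs(diff).max())


# Hypothetical stand-in for a 32x32 CIFAR image (random noise, not real data).
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

png_error = roundtrip_max_error(pixels, "PNG")                # PNG is lossless
jpeg_error = roundtrip_max_error(pixels, "JPEG", quality=95)  # JPEG is lossy

print("max abs pixel error after PNG round trip: ", png_error)
print("max abs pixel error after JPEG round trip:", jpeg_error)
```

Random noise is close to a worst case for JPEG, so the error on real photos will usually be smaller. Running the same comparison between an original CIFAR100 test image and its VTAB-1k counterpart would show how large the difference is in practice.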