For model comparison, linear probes should be trained using each dataset's pre-defined train/val/test splits. Scripts for ingesting premade datasets would make model comparison more efficient and make it easier to try models on new benchmarks.
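As a rough illustration of what such an ingestion script could look like, here is a minimal sketch that groups samples by the split recorded in a CSV manifest instead of re-splitting the data. The manifest layout (columns `sample_id`, `label`, `split`), the split names, and the function name `load_predefined_splits` are all assumptions made for this example, not an existing format or API.

```python
import csv
import io
from collections import defaultdict

# Assumed split names; a real dataset may use different ones (e.g. "valid").
EXPECTED_SPLITS = ("train", "val", "test")


def load_predefined_splits(manifest):
    """Group rows of a CSV manifest by their predefined split.

    `manifest` is any iterable of CSV lines (an open file, io.StringIO, ...)
    with the hypothetical columns: sample_id, label, split.
    """
    splits = defaultdict(list)
    seen = set()  # sample ids already read, to catch duplicates and leakage
    for row in csv.DictReader(manifest):
        sample_id, label, split = row["sample_id"], row["label"], row["split"]
        if split not in EXPECTED_SPLITS:
            raise ValueError(f"unknown split {split!r} for sample {sample_id!r}")
        if sample_id in seen:
            raise ValueError(f"sample {sample_id!r} appears more than once")
        seen.add(sample_id)
        splits[split].append((sample_id, label))
    # Always return all three splits, even if one is empty.
    return {name: splits[name] for name in EXPECTED_SPLITS}


# Tiny in-memory manifest standing in for a real file on disk.
demo = io.StringIO(
    "sample_id,label,split\n"
    "img_001,cat,train\n"
    "img_002,dog,train\n"
    "img_003,cat,val\n"
    "img_004,dog,test\n"
)
splits = load_predefined_splits(demo)
```

With something like this, the probe would be fit on `splits["train"]`, tuned on `splits["val"]`, and reported on `splits["test"]`, so every model sees the same partition. Rejecting duplicate sample ids is one simple guard against a sample leaking across splits.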