FastViT is an efficient hybrid ViT architecture that attains state-of-the-art accuracy to latency tradeoff.
We provide training and evaluation code of FastVit, along with pretrained models and configuration files for image classification on the imagenet dataset.
Single node 8 A100 GPU training of FastVit-T8 model can be done using below command:
export CFG_FILE=projects/fastvit/classification/fastvit_t8_in1k.yaml
corenet-train --common.config-file $CFG_FILE --common.results-loc classification_resultsNote: Do not forget to change the training and validation dataset locations in configuration files.
We evaluate the model on a single GPU using following command:
export CFG_FILE=projects/fastvit/classification/fastvit_t8_in1k.yaml
export MODEL_WEIGHTS="https://docs-assets.developer.apple.com/ml-research/models/corenet/v0.1.0/fastvit/imagenet-1k/fastvit-t8.pt"
export DATASET_PATH="/mnt/vision_datasets/imagenet/validation/" # change to the ImageNet validation path
CUDA_VISIBLE_DEVICES=0 corenet-eval --common.config-file $CFG_FILE --model.classification.pretrained $MODEL_WEIGHTS --common.override-kwargs dataset.root_val=$DATASET_PATHThis should give
top1=76.284 || top5=93.244
If you find the work useful, please cite following papers:
@inproceedings{vasufastvit2023,
author = {Pavan Kumar Anasosalu Vasu and James Gabriel and Jeff Zhu and Oncel Tuzel and Anurag Ranjan},
title = {FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year = {2023}
}
@inproceedings{mehta2022cvnets,
author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad},
title = {CVNets: High Performance Library for Computer Vision},
year = {2022},
booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
series = {MM '22}
}