Hot dog classifier that uses a fine-tuned ViT model. The model is too large to host on GitHub.

dmytroyelchaninov/is_hot_dog

The script fine-tunes a Vision Transformer (ViT) to detect whether an image contains a hot dog, reaching 97% accuracy, while CNNs such as VGG and MobileNet topped out at around 85% in the same training time. ViT does well here because self-attention lets every image patch attend to every other patch, so the model sees the whole image at once and can pick up context that CNNs, which build features from local neighborhoods, tend to miss. The key idea is to treat an image as a sequence of patches and use transformer attention to extract global features. That extra awareness is why ViT comes out ahead on tasks like this.
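To make the "images as sequences" idea concrete, here is a minimal sketch in plain Python of how an image can be cut into fixed-size patches and flattened into a sequence. The function name `image_to_patches` and the toy 4x4 image are illustrative assumptions, not code from this repository. A real ViT would also project each patch to an embedding and add position information.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (a list of rows) into a row-major
    sequence of flattened patch_size x patch_size patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch row by row into a single vector
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


# Toy 4x4 single-channel "image" with pixel values 0..15 (hypothetical data)
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
toy_patches = image_to_patches(toy_image, patch_size=2)
print(len(toy_patches), toy_patches[0])
```

Each element of `toy_patches` plays the role of one token. Attention then relates every token to every other token, which is where the global view comes from.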

For training, I used Google Colab because it provides access to CUDA-capable GPUs, which greatly speed up fine-tuning of the Vision Transformer (ViT). ViT models are computationally demanding, largely because of their self-attention mechanism, so training on a GPU is much faster than training on a CPU.
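A rough way to see why self-attention is expensive: the number of attention scores per head grows with the square of the number of tokens. The sketch below assumes a 16x16 patch size and one extra classification token, which is a common ViT setup. The README does not state which configuration was actually used, so the numbers and the helper names `vit_token_count` and `attention_scores_per_head` are purely illustrative.

```python
def vit_token_count(image_size, patch_size, extra_tokens=1):
    """Tokens for a square image: one per patch plus extra tokens
    (assumed here to be a single classification token)."""
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + extra_tokens


def attention_scores_per_head(num_tokens):
    """Self-attention compares every token with every token."""
    return num_tokens * num_tokens


# Assumed example resolutions, not taken from the repository
for size in (224, 384):
    tokens = vit_token_count(size, patch_size=16)
    print(size, tokens, attention_scores_per_head(tokens))
```

Under these assumptions, going from a 224-pixel to a 384-pixel input roughly triples the token count and increases the attention scores per head by almost nine times, which is the kind of workload GPUs handle far better than CPUs.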
