DINOv2 is a self-supervised vision transformer (ViT) from Meta that learns general-purpose image representations without labeled data. It builds on DINO with architectural and training enhancements, and its features perform strongly across a range of computer vision tasks, including image classification.
For detailed instructions on using our DINOv2 implementation, visit its model page in our documentation.