How does it compare to the DINOv2 model?

This is very interesting work. As far as I know, many people currently use DINOv2 to extract features from reference images. Have you run any comparative experiments against DINOv2? If so, how do the results compare?