Commit d9c1e11

Apply suggestions from code review
1 parent 86a7d8e commit d9c1e11

1 file changed: +2 -2 lines changed

chapters/en/unit3/vision-transformers/vision-transformers-for-image-classification.mdx

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ To summarize, in Vision transformer, images are reorganized as 2D grids of patch
 The main idea can be found at the picture below:
 ![Vision Transformer](https://huggingface.co/datasets/hf-vision/course-assets/blob/main/Screenshot%20from%202024-12-27%2014-25-49.png)
 
-But there is a problem! There are some advantages of using Convolutional Neural Network (CNN)s is that they are designed with some assumptions. They are described in the following section.
+But there is a catch! The Convolutional Neural Networks (CNN) are designed with an assumption missing in the VT. This assumption is based on how we perceive the objects in the images as humans. It is described in the following section.
 
 ## What are the differences between CNNs and Vision Transformers?
 
@@ -31,7 +31,7 @@ inductive biases with massive ammount of data!
 
 ### But how can everyone get access to massive datasets?
 
-It's not feasible for everyone to train a Vision Transformer on millions of images to get good performance. Instead, one can use open-sourced models from places such as the [Hugging Face Hub](https://huggingface.co/models?sort=trending).
+It's not feasible for everyone to train a Vision Transformer on millions of images to get good performance. Instead, one can use openly available model weights from places such as the [Hugging Face Hub](https://huggingface.co/models?sort=trending).
 
 What do you do with the pre-trained model? You can apply transfer learning and fine-tune it!
 