Apply suggestions from code review

sezan92 · web-flow · commit d9c1e11fa3d8 · 2025-02-25T18:00:13.000+09:00
diff --git a/chapters/en/unit3/vision-transformers/vision-transformers-for-image-classification.mdx b/chapters/en/unit3/vision-transformers/vision-transformers-for-image-classification.mdx
@@ -11,7 +11,7 @@ To summarize, in Vision transformer, images are reorganized as 2D grids of patch
 The main idea can be found at the picture below: 
 ![Vision Transformer](https://huggingface.co/datasets/hf-vision/course-assets/blob/main/Screenshot%20from%202024-12-27%2014-25-49.png)
 
-But there is a problem! There are some advantages of using Convolutional Neural Network (CNN)s is that they are designed with some assumptions. They are described in the following section.
+But there is a catch! Convolutional Neural Networks (CNNs) are designed with an assumption that is missing in the Vision Transformer (ViT). This assumption is based on how we, as humans, perceive objects in images. It is described in the following section.
 
 ## What are the differences between CNNs and Vision Transformers? 
 
@@ -31,7 +31,7 @@ inductive biases with massive ammount of data!
 
 ### But how can everyone get access to massive datasets?
 
-It's not feasible for everyone to train a Vision Transformer on millions of images to get good performance. Instead, one can use open-sourced models from places such as the [Hugging Face Hub](https://huggingface.co/models?sort=trending).
+It's not feasible for everyone to train a Vision Transformer on millions of images to get good performance. Instead, one can use openly available model weights from places such as the [Hugging Face Hub](https://huggingface.co/models?sort=trending).
 
 What do you do with the pre-trained model? You can apply transfer learning and fine-tune it!