Commit 4a8189d
Edit readme
1 parent 128a12b

2 files changed
Lines changed: 22 additions & 18 deletions

08-multimodal-multimodel/README.md

@@ -3,17 +3,17 @@
 Let's learn about two concepts you'll hear a lot in modern AI — **multimodal** and **multi-model**.
 They sound almost the same, so let's not get confused!
 
-![Multimodal and Multi-model AI](../images/multimodal-multimodel.png)
+![Multimodal and Multi-model AI](../images/sketchnote-multimodal-multimodel.png)
 
 ## 🧿 Multimodal
 
 ### What Is Multimodality?
 
-**Multimodality** means an AI model can understand and generate *multiple types of data* and combine them to reason about the world.
+**Multimodality** means an AI system can understand and generate *multiple types of data* and combine them to reason about the world.
 
-### Multimodal model
+### Multimodal Model
 
-When you hear a **multimodal model**, it means a *single AI model* that works across:
+A **multimodal model** is a *single AI model* that works across:
 
 - 📝 Text
 - 🖼️ Images
@@ -22,8 +22,7 @@ When you hear a **multimodal model**, it means a *single AI model* that works ac
 - 💻 Code
 etc.
 
-Think of it as **one brain with multiple senses** — all working together.
-The model can read an image, interpret the text inside it, listen to accompanying audio, and respond in natural language, all in one flow.
+Think of it as **one brain with multiple senses** — all working together. The model can read an image, interpret the text inside it, listen to accompanying audio, and respond in natural language, all in one flow.
 
 ### When multimodal model approaches shine
 - Simple, end-to-end tasks
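The "one brain with multiple senses" idea can be made concrete with a short sketch. This is a minimal Python illustration built on stated assumptions. `HypotheticalMultimodalModel`, `Part`, and `MultimodalRequest` are made-up stand-ins and do not belong to any real SDK. The point is the shape of the call: a single request carries several typed parts (here an image and a text question), and one model receives all of them together. A real multimodal model would reason over the parts jointly. This stub only reports which modalities it received.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical types for illustration only; not a real SDK.
SUPPORTED_MODALITIES = {"text", "image", "audio", "video", "code"}


@dataclass
class Part:
    """One piece of input, tagged with its modality."""
    modality: str
    content: Union[str, bytes]


@dataclass
class MultimodalRequest:
    """A single request that can mix several modalities."""
    parts: List[Part] = field(default_factory=list)


class HypotheticalMultimodalModel:
    """Stand-in for ONE model that accepts every part in a single call."""

    def generate(self, request: MultimodalRequest) -> str:
        kinds = [part.modality for part in request.parts]
        unsupported = [k for k in kinds if k not in SUPPORTED_MODALITIES]
        if unsupported:
            raise ValueError(f"unsupported modalities: {unsupported}")
        # A real multimodal model would fuse all parts here; this stub
        # only reports what it received, to show the shape of the call.
        return "one model received: " + ", ".join(kinds)


if __name__ == "__main__":
    request = MultimodalRequest(parts=[
        Part("image", b"<photo of the menu>"),
        Part("text", "Can you suggest gluten-free meals from this menu?"),
    ])
    print(HypotheticalMultimodalModel().generate(request))
    # prints: one model received: image, text
```

In the multi-model approach, by contrast, each modality would go to its own specialized model, and application code would combine the results.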
@@ -33,9 +32,9 @@ The model can read an image, interpret the text inside it, listen to accompanyin
 
 ## 👯 Multi-model
 
-## What Is the Multi-model Approach?
+### What Is the Multi-model Approach?
 
-A **multimodel** system uses **multiple specialized AI models**, each designed for a specific task.
+A **multimodel** system uses *multiple specialized AI models*, each designed for a specific task.
 
 Examples:
 - 👁️ A vision model for image understanding
@@ -69,7 +68,7 @@ This is like having a **team of experts**, each doing what they're best at.
 - ❌ Requires more engineering
 - ❌ More points of failure
 
-## 🧜‍♀️ Hybrid Approaches (Often the Sweet Spot)
+**Hybrid Approaches** (Often the Sweet Spot 🧜‍♀️ )
 
 In many real applications, you'll mix both:
 
@@ -79,19 +78,23 @@ In many real applications, you'll mix both:
 This gives you a balance of accuracy, cost efficiency, and flexibility.
 
 
-## 🧪 Example in the video: Multi-model Approach
+## 🧪 Example in the video
 
-**Scenario:**
-You're traveling in Japan, sitting at a restaurant for lunch.
-You take a photo of the menu and ask your app:
+**👩 User scenario:**
 
-> "Can you suggest gluten-free meals from this menu?"
+A user is traveling (in this case, in Japan) and is sitting at a local diner for lunch. There's no menu in English, so they take a photo of the menu, upload it to the AI-powered app, and ask:
+
+> 👩 "Can you suggest gluten-free meals from this menu?"
 
 ![Diner in Japan](../images/japan-diner.png)
 
-**How the app handles it:**
+The app suggests *Yasai-itame teishoku* (stir-fried vegetable set) from the menu.
+
+**📱 App scenario:**
+
+The app needs to handle multimodality.
 
-The app orchestrates three specialized models, each doing what it's best at:
+In this case, the app orchestrates three specialized models, each doing what it's best at:
 
 1. **OCR Model** extracts the text from the menu image — including Japanese characters, prices, dish names, and descriptions.
 1. **Translation Model** translates the extracted text into English (or the user's preferred language) with high linguistic accuracy.
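The orchestration described above (OCR first, then translation, then a third model whose description falls outside this diff hunk) can be sketched as a simple pipeline of calls. This is a minimal Python sketch, and every part of it is an assumption. Each "model" is a hypothetical stub. The menu lines, prices, and translation table are invented for illustration. `dietary_model` is a placeholder for the third stage, since this diff does not show what that model actually does. Its keyword check is deliberately naive and is not a real gluten analysis.

```python
from typing import Dict, List

# Hypothetical stubs for three specialized models. In a real app each would
# be a separate model or service; here they are plain functions so the
# orchestration logic runs on its own. All data below is made up.


def ocr_model(image: bytes) -> List[str]:
    """Stub OCR model: pretends the menu photo contains these lines."""
    return ["野菜炒め定食 900円", "天ぷらうどん 1000円"]


# Toy lookup table standing in for a translation model.
FAKE_TRANSLATIONS: Dict[str, str] = {
    "野菜炒め定食": "Yasai-itame teishoku (stir-fried vegetable set)",
    "天ぷらうどん": "Tempura udon (wheat noodles with tempura)",
}


def translation_model(lines: List[str], target_lang: str = "en") -> List[str]:
    """Stub translation model: translates the dish name and keeps the price.

    The stub only knows English, so target_lang is accepted but ignored.
    """
    translated = []
    for line in lines:
        name, price = line.rsplit(" ", 1)
        translated.append(f"{FAKE_TRANSLATIONS.get(name, name)} {price}")
    return translated


# Naive keyword list; a real dietary check would need ingredient data.
GLUTEN_KEYWORDS = ("udon", "tempura", "wheat")


def dietary_model(dishes: List[str]) -> List[str]:
    """Hypothetical third model: drops dishes that mention a gluten keyword."""
    return [d for d in dishes if not any(k in d.lower() for k in GLUTEN_KEYWORDS)]


def suggest_gluten_free(image: bytes) -> List[str]:
    """Orchestrator: each specialized model handles one step, in order."""
    japanese_lines = ocr_model(image)                  # 1. image -> Japanese text
    english_lines = translation_model(japanese_lines)  # 2. Japanese -> English
    return dietary_model(english_lines)                # 3. filter for the request


if __name__ == "__main__":
    for dish in suggest_gluten_free(b"<photo of the menu>"):
        print(dish)
    # prints: Yasai-itame teishoku (stir-fried vegetable set) 900円
```

The trade-offs listed in the README show up directly in this sketch. Each stage can be swapped out or tuned on its own. In exchange, the orchestrator has to manage three separate calls, and a failure in any one of them breaks the whole answer.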
@@ -107,6 +110,7 @@ The app orchestrates three specialized models, each doing what it's best at:
 
 Watch the video, **Multimodal and Multi-model AI** on YouTube:
 
-[![YouTube: Multimodal and Multi-model AI](https://img.youtube.com/vi/zkZYeYvBy60/0.jpg)]([https://www.youtube.com/watch?v=zkZYeYvBy60](https://www.youtube.com/watch?v=zkZYeYvBy60))
+[![YouTube: Multimodal and Multi-model AI](https://img.youtube.com/vi/zkZYeYvBy60/0.jpg)](https://www.youtube.com/watch?v=zkZYeYvBy60)
+https://youtu.be/zkZYeYvBy60
 
-[Subscribe us!](https://www.youtube.com/channel/UCV_6HOhwxYLXAGd-JOqKPoQ?sub_confirmation=1)
+[**Subscribe to us!**](https://www.youtube.com/channel/UCV_6HOhwxYLXAGd-JOqKPoQ?sub_confirmation=1)
2.26 MB