Commit 89018fe

Merge pull request #1113 from VikrantSingh01/fix/docs-lessons-19-21-typos-grammar
docs: fix typos, grammar, and code bugs in Lessons 19-21
2 parents 6778476 + 1c55a8c commit 89018fe

3 files changed

Lines changed: 37 additions & 38 deletions

19-slm/README.md

Lines changed: 21 additions & 22 deletions
Original file line number | Diff line number | Diff line change
@@ -32,10 +32,10 @@ In this lesson, we hope to introduce the knowledge of SLM and combine it with Mi
3232

3333
By the end of this lesson, you should be able to answer the following questions:
3434

35-
- What is SLM
36-
- What is the difference about SLM and LLM
37-
- What is Microsoft Phi-3/3.5 Family
38-
- How to inference Microsoft Phi-3/3.5 Family
35+
- What is SLM?
36+
- What is the difference between SLM and LLM?
37+
- What is the Microsoft Phi-3/3.5 Family?
38+
- How to run inference with the Microsoft Phi-3/3.5 Family?
3939

4040
Ready? Let's get started.
4141

@@ -74,7 +74,7 @@ The reduced size of SLMs affords them a significant advantage in terms of infere
7474

7575
In summary, while both LLMs and SLMs share a foundational basis in machine learning, they differ significantly in terms of model size, resource requirements, contextual understanding, susceptibility to bias, and inference speed. These distinctions reflect their respective suitability for different use cases, with LLMs being more versatile but resource-heavy, and SLMs offering more domain-specific efficiency with reduced computational demands.
7676

77-
***NoteIn this chapter, we will introduce SLM using Microsoft Phi-3 / 3.5 as an example.***
77+
***Note: In this lesson, we will introduce SLM using Microsoft Phi-3 / 3.5 as an example.***
7878

7979
## Introduce Phi-3 / Phi-3.5 Family
8080

@@ -86,7 +86,7 @@ Mainly for text generation, chat completion, and content information extraction,
8686

8787
**Phi-3-mini**
8888

89-
The 3.8B language model is available on Microsoft Azure AI Studio, Hugging Face, and Ollama. Phi-3 models significantly outperform language models of equal and larger sizes on key benchmarks (see benchmark numbers below, higher numbers are better). Phi-3-mini outperforms models twice its size, while Phi-3-small and Phi-3-medium outperform larger models, including GPT-3.5
89+
The 3.8B language model is available on Microsoft Azure AI Studio, Hugging Face, and Ollama. Phi-3 models significantly outperform language models of equal and larger sizes on key benchmarks (see benchmark numbers below, higher numbers are better). Phi-3-mini outperforms models twice its size, while Phi-3-small and Phi-3-medium outperform larger models, including GPT-3.5.
9090

9191
**Phi-3-small & medium**
9292

@@ -96,8 +96,7 @@ The Phi-3-medium with 14B parameters continues this trend and outperforms the Ge
9696

9797
**Phi-3.5-mini**
9898

99-
We can think of it as an upgrade of Phi-3-mini. While the parameters remain unchanged, it improves the ability to support multiple languages(
100-
Support 20+ languages:Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian) ​​and adds stronger support for long context.
99+
We can think of it as an upgrade of Phi-3-mini. While the parameters remain unchanged, it improves the ability to support multiple languages (support 20+ languages: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian) ​​and adds stronger support for long context.
101100

102101
Phi-3.5-mini with 3.8B parameters outperforms language models of the same size and is on par with models twice its size.
103102

@@ -115,14 +114,14 @@ Phi-3-vision, with only 4.2B parameters, continues this trend and outperforms la
115114

116115
Phi-3.5-Vision is also an upgrade of Phi-3-Vision, adding support for multiple images. You can think of it as an improvement in vision, not only can you see pictures, but also videos.
117116

118-
Phi-3.5-vision outperforms larger models such as Claude-3.5 Sonnet and Gemini 1.5 Flash across OCR, table and chart understanding tasks and on par on general visual knowledge reasoning tasks.Support multi-frame input, i.e., perform reasoning on multiple input images
117+
Phi-3.5-vision outperforms larger models such as Claude-3.5 Sonnet and Gemini 1.5 Flash across OCR, table and chart understanding tasks and on par on general visual knowledge reasoning tasks. Support multi-frame input, i.e., perform reasoning on multiple input images
119118

120119

121120
### Phi-3.5-MoE
122121

123122
***Mixture of Experts(MoE)*** enables models to be pretrained with far less compute, which means you can dramatically scale up the model or dataset size with the same compute budget as a dense model. In particular, a MoE model should achieve the same quality as its dense counterpart much faster during pretraining.
124123

125-
Phi-3.5-MoE comprises 16x3.8B expert modules.Phi-3.5-MoE with only 6.6B active parameters achieves a similar level of reasoning, language understanding, and math as much larger models
124+
Phi-3.5-MoE comprises 16x3.8B expert modules. Phi-3.5-MoE with only 6.6B active parameters achieves a similar level of reasoning, language understanding, and math as much larger models
126125

127126
We can use the Phi-3/3.5 Family model based on different scenarios. Unlike LLM, you can deploy Phi-3/3.5-mini or Phi-3/3.5-Vision on edge devices.
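
To make the sparse-activation idea in the Mixture of Experts description above concrete, here is a minimal sketch of top-k expert routing in plain Python. It is not Phi-3.5-MoE's implementation: the 16 experts mirror the "16x" in the text, but the top-2 choice, the toy hidden size, the random weights, and every name used here (`moe_forward`, `matvec`, `softmax`) are assumptions made purely for illustration.

```python
import math
import random

NUM_EXPERTS = 16  # mirrors the "16x" in the text
TOP_K = 2         # assumption: two experts are active per token
DIM = 4           # toy hidden size, purely illustrative


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rng, rows, cols):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def moe_forward(x, router, experts, top_k=TOP_K):
    """Send x to the top_k highest-scoring experts and mix their outputs."""
    scores = matvec(router, x)  # one routing score per expert
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])  # renormalise over the chosen experts
    output = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        expert_out = matvec(experts[idx], x)  # only the chosen experts are evaluated
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen


rng = random.Random(0)
router = random_matrix(rng, NUM_EXPERTS, DIM)
experts = [random_matrix(rng, DIM, DIM) for _ in range(NUM_EXPERTS)]
token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_forward(token, router, experts)
print(f"experts used: {sorted(chosen)} ({len(chosen)} of {NUM_EXPERTS})")
```

Only `TOP_K` of the 16 expert matrices are multiplied for each token, which is the sense in which a MoE model's active parameter count can be much smaller than its total parameter count.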
128127

@@ -133,13 +132,13 @@ We hope to use Phi-3/3.5 in different scenarios. Next, we will use Phi-3/3.5 bas
133132

134133
![phi3](./img/phi3.png?WT.mc_id=academic-105485-koreyst)
135134

136-
### Inference difference Cloud's API
135+
### Inference via Cloud APIs
137136

138137
**GitHub Models**
139138

140139
GitHub Models is the most direct way. You can quickly access the Phi-3/3.5-Instruct model through GitHub Models. Combined with the Azure AI Inference SDK / OpenAI SDK, you can access the API through code to complete the Phi-3/3.5-Instruct call. You can also test different effects through Playground.
141140

142-
- Demo:Comparison of the effects of Phi-3-mini and Phi-3.5-mini in Chinese scenarios
141+
- Demo: Comparison of the effects of Phi-3-mini and Phi-3.5-mini in Chinese scenarios
143142

144143
![phi3](./img/gh1.png?WT.mc_id=academic-105485-koreyst)
145144

@@ -153,7 +152,7 @@ Or if we want to use the vision and MoE models, you can use Azure AI Studio to c
153152

154153
**NVIDIA NIM**
155154

156-
In addition to the cloud-based Model Catalog solutions provided by Azure and GitHub, you can also use [Nivida NIM](https://developer.nvidia.com/nim?WT.mc_id=academic-105485-koreyst) to complete related calls. You can visit NIVIDA NIM to complete the API calls of the Phi-3/3.5 Family. NVIDIA NIM (NVIDIA Inference Microservices) is a set of accelerated inference microservices designed to help developers deploy AI models efficiently across various environments, including clouds, data centers, and workstations.
155+
In addition to the cloud-based Model Catalog solutions provided by Azure and GitHub, you can also use [NVIDIA NIM](https://developer.nvidia.com/nim?WT.mc_id=academic-105485-koreyst) to complete related calls. You can visit NVIDIA NIM to complete the API calls of the Phi-3/3.5 Family. NVIDIA NIM (NVIDIA Inference Microservices) is a set of accelerated inference microservices designed to help developers deploy AI models efficiently across various environments, including clouds, data centers, and workstations.
157156

158157
Here are some key features of NVIDIA NIM:
159158

@@ -165,10 +164,10 @@ Here are some key features of NVIDIA NIM:
165164

166165
NIM is part of NVIDIA AI Enterprise, which aims to simplify the deployment and operationalization of AI models, ensuring they run efficiently on NVIDIA GPUs.
167166

168-
- Demo: Using Nividia NIM to call Phi-3.5-Vision-API [[Click this link](./python/Phi-3-Vision-Nividia-NIM.ipynb?WT.mc_id=academic-105485-koreyst)]
167+
- Demo: Using NVIDIA NIM to call Phi-3.5-Vision-API [[Click this link](./python/Phi-3-Vision-Nividia-NIM.ipynb?WT.mc_id=academic-105485-koreyst)]
169168

170169

171-
### Inference Phi-3/3.5 in local env
170+
### Running Phi-3/3.5 Locally
172171
Inference in relation to Phi-3, or any language model like GPT-3, refers to the process of generating responses or predictions based on the input it receives. When you provide a prompt or question to Phi-3, it uses its trained neural network to infer the most likely and relevant response by analyzing patterns and relationships in the data it was trained on.
173172

174173
**Hugging Face Transformer**
@@ -185,17 +184,17 @@ Hugging Face Transformers is a powerful library designed for natural language pr
185184
5. **Community and Resources**: Hugging Face has a vibrant community and extensive documentation, tutorials, and guides to help users get started and make the most of the library.
186185
[official documentation](https://huggingface.co/docs/transformers/index?WT.mc_id=academic-105485-koreyst) or their [GitHub repository](https://github.com/huggingface/transformers?WT.mc_id=academic-105485-koreyst).
187186

188-
This is the most commonly used method, but it also requires GPU acceleration. After all, scenes such as Vision and MoE require a lot of calculations, which will be very limited in the CPU if they are not quantized.
187+
This is the most commonly used method, but it also requires GPU acceleration. After all, scenarios such as Vision and MoE require a lot of calculations, which will be very slow on CPU if they are not quantized.
189188

190189

191-
- Demo:Using Transformer to call Phi-3.5-Instuct [Click this link](./python/phi35-instruct-demo.ipynb?WT.mc_id=academic-105485-koreyst)
190+
- Demo: Using Transformer to call Phi-3.5-Instruct [Click this link](./python/phi35-instruct-demo.ipynb?WT.mc_id=academic-105485-koreyst)
192191

193-
- Demo:Using Transformer to call Phi-3.5-Vision[Click this link](./python/phi35-vision-demo.ipynb?WT.mc_id=academic-105485-koreyst)
192+
- Demo: Using Transformer to call Phi-3.5-Vision [Click this link](./python/phi35-vision-demo.ipynb?WT.mc_id=academic-105485-koreyst)
194193

195-
- Demo:Using Transformer to call Phi-3.5-MoE[Click this link](./python/phi35_moe_demo.ipynb?WT.mc_id=academic-105485-koreyst)
194+
- Demo: Using Transformer to call Phi-3.5-MoE [Click this link](./python/phi35_moe_demo.ipynb?WT.mc_id=academic-105485-koreyst)
196195

197196
**Ollama**
198-
[Ollama](https://ollama.com/?WT.mc_id=academic-105485-koreyst) is a platform designed to make it easier to run large language models (LLMs) locally on your machine. It supports various models like Llama 3.1, Phi 3, Mistral, and Gemma 2, among others. The platform simplifies the process by bundling model weights, configuration, and data into a single package, making it more accessible for users to customize and create their own models. Ollama is available for macOS, Linux, and Windows. It’s a great tool if you’re looking to experiment with or deploy LLMs without relying on cloud services. Ollama is the most direct way, you just need to execute the following statement.
197+
[Ollama](https://ollama.com/?WT.mc_id=academic-105485-koreyst) is a platform designed to make it easier to run large language models (LLMs) locally on your machine. It supports various models like Llama 3.1, Phi 3, Mistral, and Gemma 2, among others. The platform simplifies the process by bundling model weights, configuration, and data into a single package, making it more accessible for users to customize and create their own models. Ollama is available for macOS, Linux, and Windows. It’s a great tool if you’re looking to experiment with or deploy LLMs without relying on cloud services. Ollama is the most direct way, you just need to execute the following command.
199198

200199

201200
```bash
@@ -210,7 +209,7 @@ ollama run phi3.5
210209
[ONNX Runtime](https://github.com/microsoft/onnxruntime-genai?WT.mc_id=academic-105485-koreyst) is a cross-platform inference and training machine-learning accelerator. ONNX Runtime for Generative AI (GENAI) is a powerful tool that helps you run generative AI models efficiently across various platforms.
211210

212211
## What is ONNX Runtime?
213-
ONNX Runtime is an open-source project that enables high-performance inference of machine learning models. It supports models in the Open Neural Network Exchange (ONNX) format, which is a standard for representing machine learning models.ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms
212+
ONNX Runtime is an open-source project that enables high-performance inference of machine learning models. It supports models in the Open Neural Network Exchange (ONNX) format, which is a standard for representing machine learning models.ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms.
214213

215214
## What is Generative AI?
216215
Generative AI refers to AI systems that can generate new content, such as text, images, or music, based on the data they have been trained on. Examples include language models like GPT-3 and image generation models like Stable Diffusion. ONNX Runtime for GenAI library provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management.
@@ -302,7 +301,7 @@ while not generator.is_done():
302301

303302
new_token = generator.get_next_tokens()[0]
304303

305-
code += tokenizer_stream.decode(new_token)
304+
output = tokenizer_stream.decode(new_token)
306305

307306
print(tokenizer_stream.decode(new_token), end='', flush=True)
308307

20-mistral/README.md

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -5,14 +5,14 @@
55
This lesson will cover:
66
- Exploring the different Mistral Models
77
- Understanding the use-cases and scenarios for each model
8-
- Code samples show the unique features of each model.
8+
- Exploring code samples that show the unique features of each model.
99

1010
## The Mistral Models
1111

1212
In this lesson, we will explore 3 different Mistral models:
1313
**Mistral Large**, **Mistral Small** and **Mistral Nemo**.
1414

15-
Each of these models is available free on the Github Model marketplace. The code in this notebook will be using these models to run the code. Here are more details on using Github Models to [prototype with AI models](https://docs.github.com/en/github-models/prototyping-with-ai-models?WT.mc_id=academic-105485-koreyst).
15+
Each of these models is available free on the GitHub Model marketplace. The code in this notebook will be using these models to run the code. Here are more details on using GitHub Models to [prototype with AI models](https://docs.github.com/en/github-models/prototyping-with-ai-models?WT.mc_id=academic-105485-koreyst).
1616

1717

1818
## Mistral Large 2 (2407)
@@ -92,7 +92,7 @@ d = text_embeddings.shape[1]
9292
index = faiss.IndexFlatL2(d)
9393
index.add(text_embeddings)
9494

95-
question = "저자가 대학에 오기 전에 주로 했던 두 가지 일은 무엇이었나요?"
95+
question = "저자가 대학에 오기 전에 주로 했던 두 가지 일은 무엇이었나요?"
9696

9797
question_embedding = embed_client.embed(
9898
input=[question],
@@ -214,7 +214,7 @@ It is viewed as an upgrade to the earlier open source LLM from Mistral, Mistral
214214

215215
Some other features of the NeMo model are:
216216

217-
- *More efficient tokenization:* This model using the Tekken tokenizer over the more commonly used tiktoken. This allows for better performance over more languages and code.
217+
- *More efficient tokenization:* This model uses the Tekken tokenizer over the more commonly used tiktoken. This allows for better performance over more languages and code.
218218

219219
- *Finetuning:* The base model is available for finetuning. This allows for more flexibility for use-cases where finetuning may be needed.
220220

@@ -225,7 +225,7 @@ Some other features of the NeMo model are:
225225

226226
In this sample, we will look at how Mistral NeMo handles tokenization compared to Mistral Large.
227227

228-
Both samples take the same prompt but you should see that NeMo returns back less tokens vs Mistral Large.
228+
Both samples take the same prompt but you should see that NeMo returns fewer tokens than Mistral Large.
229229

230230
```bash
231231
pip install mistral-common
@@ -245,7 +245,7 @@ from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
245245

246246
# Load Mistral tokenizer
247247

248-
model_name = "open-mistral-nemo "
248+
model_name = "open-mistral-nemo"
249249

250250
tokenizer = MistralTokenizer.from_model(model_name)
251251

@@ -267,7 +267,7 @@ tokenized = tokenizer.encode_chat_completion(
267267
"format": {
268268
"type": "string",
269269
"enum": ["celsius", "fahrenheit"],
270-
"description": "The temperature unit to use. Infer this from the users location.",
270+
"description": "The temperature unit to use. Infer this from the user's location.",
271271
},
272272
},
273273
"required": ["location", "format"],
@@ -323,7 +323,7 @@ tokenized = tokenizer.encode_chat_completion(
323323
"format": {
324324
"type": "string",
325325
"enum": ["celsius", "fahrenheit"],
326-
"description": "The temperature unit to use. Infer this from the users location.",
326+
"description": "The temperature unit to use. Infer this from the user's location.",
327327
},
328328
},
329329
"required": ["location", "format"],
@@ -343,6 +343,6 @@ tokens, text = tokenized.tokens, tokenized.text
343343
print(len(tokens))
344344
```
345345

346-
## Learning does not stop here, continue the Journey
346+
## Learning does not stop here, continue the journey
347347

348348
After completing this lesson, check out our [Generative AI Learning collection](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) to continue leveling up your Generative AI knowledge!
