You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: 19-slm/README.md
+21-22Lines changed: 21 additions & 22 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -32,10 +32,10 @@ In this lesson, we hope to introduce the knowledge of SLM and combine it with Mi
32
32
33
33
By the end of this lesson, you should be able to answer the following questions:
34
34
35
-
- What is SLM
36
-
- What is the difference about SLM and LLM
37
-
- What is Microsoft Phi-3/3.5 Family
38
-
- How to inference Microsoft Phi-3/3.5 Family
35
+
- What is SLM?
36
+
- What is the difference between SLM and LLM?
37
+
- What is the Microsoft Phi-3/3.5 Family?
38
+
- How to run inference with the Microsoft Phi-3/3.5 Family?
39
39
40
40
Ready? Let's get started.
41
41
@@ -74,7 +74,7 @@ The reduced size of SLMs affords them a significant advantage in terms of infere
74
74
75
75
In summary, while both LLMs and SLMs share a foundational basis in machine learning, they differ significantly in terms of model size, resource requirements, contextual understanding, susceptibility to bias, and inference speed. These distinctions reflect their respective suitability for different use cases, with LLMs being more versatile but resource-heavy, and SLMs offering more domain-specific efficiency with reduced computational demands.
76
76
77
-
***Note:In this chapter, we will introduce SLM using Microsoft Phi-3 / 3.5 as an example.***
77
+
***Note: In this lesson, we will introduce SLM using Microsoft Phi-3 / 3.5 as an example.***
78
78
79
79
## Introduce Phi-3 / Phi-3.5 Family
80
80
@@ -86,7 +86,7 @@ Mainly for text generation, chat completion, and content information extraction,
86
86
87
87
**Phi-3-mini**
88
88
89
-
The 3.8B language model is available on Microsoft Azure AI Studio, Hugging Face, and Ollama. Phi-3 models significantly outperform language models of equal and larger sizes on key benchmarks (see benchmark numbers below, higher numbers are better). Phi-3-mini outperforms models twice its size, while Phi-3-small and Phi-3-medium outperform larger models, including GPT-3.5
89
+
The 3.8B language model is available on Microsoft Azure AI Studio, Hugging Face, and Ollama. Phi-3 models significantly outperform language models of equal and larger sizes on key benchmarks (see benchmark numbers below, higher numbers are better). Phi-3-mini outperforms models twice its size, while Phi-3-small and Phi-3-medium outperform larger models, including GPT-3.5.
90
90
91
91
**Phi-3-small & medium**
92
92
@@ -96,8 +96,7 @@ The Phi-3-medium with 14B parameters continues this trend and outperforms the Ge
96
96
97
97
**Phi-3.5-mini**
98
98
99
-
We can think of it as an upgrade of Phi-3-mini. While the parameters remain unchanged, it improves the ability to support multiple languages(
100
-
Support 20+ languages:Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian) and adds stronger support for long context.
99
+
We can think of it as an upgrade of Phi-3-mini. While the parameters remain unchanged, it improves the ability to support multiple languages (support 20+ languages: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian) and adds stronger support for long context.
101
100
102
101
Phi-3.5-mini with 3.8B parameters outperforms language models of the same size and is on par with models twice its size.
103
102
@@ -115,14 +114,14 @@ Phi-3-vision, with only 4.2B parameters, continues this trend and outperforms la
115
114
116
115
Phi-3.5-Vision is also an upgrade of Phi-3-Vision, adding support for multiple images. You can think of it as an improvement in vision, not only can you see pictures, but also videos.
117
116
118
-
Phi-3.5-vision outperforms larger models such as Claude-3.5 Sonnet and Gemini 1.5 Flash across OCR, table and chart understanding tasks and on par on general visual knowledge reasoning tasks.Support multi-frame input, i.e., perform reasoning on multiple input images
117
+
Phi-3.5-vision outperforms larger models such as Claude-3.5 Sonnet and Gemini 1.5 Flash across OCR, table and chart understanding tasks and on par on general visual knowledge reasoning tasks.Support multi-frame input, i.e., perform reasoning on multiple input images
119
118
120
119
121
120
### Phi-3.5-MoE
122
121
123
122
***Mixture of Experts(MoE)*** enables models to be pretrained with far less compute, which means you can dramatically scale up the model or dataset size with the same compute budget as a dense model. In particular, a MoE model should achieve the same quality as its dense counterpart much faster during pretraining.
124
123
125
-
Phi-3.5-MoE comprises 16x3.8B expert modules.Phi-3.5-MoE with only 6.6B active parameters achieves a similar level of reasoning, language understanding, and math as much larger models
124
+
Phi-3.5-MoE comprises 16x3.8B expert modules.Phi-3.5-MoE with only 6.6B active parameters achieves a similar level of reasoning, language understanding, and math as much larger models
126
125
127
126
We can use the Phi-3/3.5 Family model based on different scenarios. Unlike LLM, you can deploy Phi-3/3.5-mini or Phi-3/3.5-Vision on edge devices.
128
127
@@ -133,13 +132,13 @@ We hope to use Phi-3/3.5 in different scenarios. Next, we will use Phi-3/3.5 bas
GitHub Models is the most direct way. You can quickly access the Phi-3/3.5-Instruct model through GitHub Models. Combined with the Azure AI Inference SDK / OpenAI SDK, you can access the API through code to complete the Phi-3/3.5-Instruct call. You can also test different effects through Playground.
141
140
142
-
- Demo:Comparison of the effects of Phi-3-mini and Phi-3.5-mini in Chinese scenarios
141
+
- Demo:Comparison of the effects of Phi-3-mini and Phi-3.5-mini in Chinese scenarios
@@ -153,7 +152,7 @@ Or if we want to use the vision and MoE models, you can use Azure AI Studio to c
153
152
154
153
**NVIDIA NIM**
155
154
156
-
In addition to the cloud-based Model Catalog solutions provided by Azure and GitHub, you can also use [Nivida NIM](https://developer.nvidia.com/nim?WT.mc_id=academic-105485-koreyst) to complete related calls. You can visit NIVIDA NIM to complete the API calls of the Phi-3/3.5 Family. NVIDIA NIM (NVIDIA Inference Microservices) is a set of accelerated inference microservices designed to help developers deploy AI models efficiently across various environments, including clouds, data centers, and workstations.
155
+
In addition to the cloud-based Model Catalog solutions provided by Azure and GitHub, you can also use [NVIDIA NIM](https://developer.nvidia.com/nim?WT.mc_id=academic-105485-koreyst) to complete related calls. You can visit NVIDIA NIM to complete the API calls of the Phi-3/3.5 Family. NVIDIA NIM (NVIDIA Inference Microservices) is a set of accelerated inference microservices designed to help developers deploy AI models efficiently across various environments, including clouds, data centers, and workstations.
157
156
158
157
Here are some key features of NVIDIA NIM:
159
158
@@ -165,10 +164,10 @@ Here are some key features of NVIDIA NIM:
165
164
166
165
NIM is part of NVIDIA AI Enterprise, which aims to simplify the deployment and operationalization of AI models, ensuring they run efficiently on NVIDIA GPUs.
167
166
168
-
- Demo: Using Nividia NIM to call Phi-3.5-Vision-API [[Click this link](./python/Phi-3-Vision-Nividia-NIM.ipynb?WT.mc_id=academic-105485-koreyst)]
167
+
- Demo: Using NVIDIA NIM to call Phi-3.5-Vision-API [[Click this link](./python/Phi-3-Vision-Nividia-NIM.ipynb?WT.mc_id=academic-105485-koreyst)]
169
168
170
169
171
-
### Inference Phi-3/3.5 in local env
170
+
### Running Phi-3/3.5 Locally
172
171
Inference in relation to Phi-3, or any language model like GPT-3, refers to the process of generating responses or predictions based on the input it receives. When you provide a prompt or question to Phi-3, it uses its trained neural network to infer the most likely and relevant response by analyzing patterns and relationships in the data it was trained on.
173
172
174
173
**Hugging Face Transformer**
@@ -185,17 +184,17 @@ Hugging Face Transformers is a powerful library designed for natural language pr
185
184
5.**Community and Resources**: Hugging Face has a vibrant community and extensive documentation, tutorials, and guides to help users get started and make the most of the library.
186
185
[official documentation](https://huggingface.co/docs/transformers/index?WT.mc_id=academic-105485-koreyst) or their [GitHub repository](https://github.com/huggingface/transformers?WT.mc_id=academic-105485-koreyst).
187
186
188
-
This is the most commonly used method, but it also requires GPU acceleration. After all, scenes such as Vision and MoE require a lot of calculations, which will be very limited in the CPU if they are not quantized.
187
+
This is the most commonly used method, but it also requires GPU acceleration. After all, scenarios such as Vision and MoE require a lot of calculations, which will be very slow on CPU if they are not quantized.
189
188
190
189
191
-
- Demo:Using Transformer to call Phi-3.5-Instuct[Click this link](./python/phi35-instruct-demo.ipynb?WT.mc_id=academic-105485-koreyst)
190
+
- Demo:Using Transformer to call Phi-3.5-Instruct[Click this link](./python/phi35-instruct-demo.ipynb?WT.mc_id=academic-105485-koreyst)
192
191
193
-
- Demo:Using Transformer to call Phi-3.5-Vision[Click this link](./python/phi35-vision-demo.ipynb?WT.mc_id=academic-105485-koreyst)
192
+
- Demo:Using Transformer to call Phi-3.5-Vision[Click this link](./python/phi35-vision-demo.ipynb?WT.mc_id=academic-105485-koreyst)
194
193
195
-
- Demo:Using Transformer to call Phi-3.5-MoE[Click this link](./python/phi35_moe_demo.ipynb?WT.mc_id=academic-105485-koreyst)
194
+
- Demo:Using Transformer to call Phi-3.5-MoE[Click this link](./python/phi35_moe_demo.ipynb?WT.mc_id=academic-105485-koreyst)
196
195
197
196
**Ollama**
198
-
[Ollama](https://ollama.com/?WT.mc_id=academic-105485-koreyst) is a platform designed to make it easier to run large language models (LLMs) locally on your machine. It supports various models like Llama 3.1, Phi 3, Mistral, and Gemma 2, among others. The platform simplifies the process by bundling model weights, configuration, and data into a single package, making it more accessible for users to customize and create their own models. Ollama is available for macOS, Linux, and Windows. It’s a great tool if you’re looking to experiment with or deploy LLMs without relying on cloud services. Ollama is the most direct way, you just need to execute the following statement.
197
+
[Ollama](https://ollama.com/?WT.mc_id=academic-105485-koreyst) is a platform designed to make it easier to run large language models (LLMs) locally on your machine. It supports various models like Llama 3.1, Phi 3, Mistral, and Gemma 2, among others. The platform simplifies the process by bundling model weights, configuration, and data into a single package, making it more accessible for users to customize and create their own models. Ollama is available for macOS, Linux, and Windows. It’s a great tool if you’re looking to experiment with or deploy LLMs without relying on cloud services. Ollama is the most direct way, you just need to execute the following command.
199
198
200
199
201
200
```bash
@@ -210,7 +209,7 @@ ollama run phi3.5
210
209
[ONNX Runtime](https://github.com/microsoft/onnxruntime-genai?WT.mc_id=academic-105485-koreyst) is a cross-platform inference and training machine-learning accelerator. ONNX Runtime for Generative AI (GENAI) is a powerful tool that helps you run generative AI models efficiently across various platforms.
211
210
212
211
## What is ONNX Runtime?
213
-
ONNX Runtime is an open-source project that enables high-performance inference of machine learning models. It supports models in the Open Neural Network Exchange (ONNX) format, which is a standard for representing machine learning models.ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms
212
+
ONNX Runtime is an open-source project that enables high-performance inference of machine learning models. It supports models in the Open Neural Network Exchange (ONNX) format, which is a standard for representing machine learning models.ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms.
214
213
215
214
## What is Generative AI?
216
215
Generative AI refers to AI systems that can generate new content, such as text, images, or music, based on the data they have been trained on. Examples include language models like GPT-3 and image generation models like Stable Diffusion. ONNX Runtime for GenAI library provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management.
@@ -302,7 +301,7 @@ while not generator.is_done():
Copy file name to clipboardExpand all lines: 20-mistral/README.md
+9-9Lines changed: 9 additions & 9 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -5,14 +5,14 @@
5
5
This lesson will cover:
6
6
- Exploring the different Mistral Models
7
7
- Understanding the use-cases and scenarios for each model
8
-
-Code samples show the unique features of each model.
8
+
-Exploring code samples that show the unique features of each model.
9
9
10
10
## The Mistral Models
11
11
12
12
In this lesson, we will explore 3 different Mistral models:
13
13
**Mistral Large**, **Mistral Small** and **Mistral Nemo**.
14
14
15
-
Each of these models is available free on the Github Model marketplace. The code in this notebook will be using these models to run the code. Here are more details on using Github Models to [prototype with AI models](https://docs.github.com/en/github-models/prototyping-with-ai-models?WT.mc_id=academic-105485-koreyst).
15
+
Each of these models is available free on the GitHub Model marketplace. The code in this notebook will be using these models to run the code. Here are more details on using GitHub Models to [prototype with AI models](https://docs.github.com/en/github-models/prototyping-with-ai-models?WT.mc_id=academic-105485-koreyst).
16
16
17
17
18
18
## Mistral Large 2 (2407)
@@ -92,7 +92,7 @@ d = text_embeddings.shape[1]
92
92
index = faiss.IndexFlatL2(d)
93
93
index.add(text_embeddings)
94
94
95
-
question ="저자가 대학에 오기 전에 주로 했던 두 가지 일은 무엇이었나요??"
95
+
question ="저자가 대학에 오기 전에 주로 했던 두 가지 일은 무엇이었나요?"
96
96
97
97
question_embedding = embed_client.embed(
98
98
input=[question],
@@ -214,7 +214,7 @@ It is viewed as an upgrade to the earlier open source LLM from Mistral, Mistral
214
214
215
215
Some other features of the NeMo model are:
216
216
217
-
-*More efficient tokenization:* This model using the Tekken tokenizer over the more commonly used tiktoken. This allows for better performance over more languages and code.
217
+
-*More efficient tokenization:* This model uses the Tekken tokenizer over the more commonly used tiktoken. This allows for better performance over more languages and code.
218
218
219
219
-*Finetuning:* The base model is available for finetuning. This allows for more flexibility for use-cases where finetuning may be needed.
220
220
@@ -225,7 +225,7 @@ Some other features of the NeMo model are:
225
225
226
226
In this sample, we will look at how Mistral NeMo handles tokenization compared to Mistral Large.
227
227
228
-
Both samples take the same prompt but you should see that NeMo returns back less tokens vs Mistral Large.
228
+
Both samples take the same prompt but you should see that NeMo returns fewer tokens than Mistral Large.
229
229
230
230
```bash
231
231
pip install mistral-common
@@ -245,7 +245,7 @@ from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
"description": "The temperature unit to use. Infer this from the users location.",
326
+
"description": "The temperature unit to use. Infer this from the user's location.",
327
327
},
328
328
},
329
329
"required": ["location", "format"],
@@ -343,6 +343,6 @@ tokens, text = tokenized.tokens, tokenized.text
343
343
print(len(tokens))
344
344
```
345
345
346
-
## Learning does not stop here, continue the Journey
346
+
## Learning does not stop here, continue the journey
347
347
348
348
After completing this lesson, check out our [Generative AI Learning collection](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) to continue leveling up your Generative AI knowledge!
0 commit comments