# Summary[[summary]]

<CourseFloatingBanner
    chapter={1}
    classNames="absolute z-10 right-0 top-0"
/>

In this chapter, you've been introduced to the fundamentals of Transformer models and Large Language Models (LLMs), and to how they are reshaping AI and many fields beyond it.

## Key concepts covered

### Natural Language Processing and LLMs

We explored what NLP is and how Large Language Models have transformed the field. You learned that:
- NLP encompasses a wide range of tasks, from classification to generation
- LLMs are powerful models trained on massive amounts of text data
- These models can perform multiple tasks within a single architecture
- Despite their capabilities, LLMs have limitations, including hallucinations and bias

### Transformer capabilities

You saw how the `pipeline()` function from 🤗 Transformers makes it easy to use pre-trained models for various tasks (a short example follows this list):
- Text classification, token classification, and question answering
- Text generation and summarization
- Translation and other sequence-to-sequence tasks
- Speech recognition and image classification
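
For instance, here is a minimal sketch of what two of these tasks look like in code. The checkpoint names are only illustrative; any compatible model from the Hub can be substituted:

```python
from transformers import pipeline

# Sentiment analysis: with no model specified, the pipeline picks a default checkpoint.
classifier = pipeline("sentiment-analysis")
print(classifier("I love how easy this makes working with Transformers!"))

# Text generation with an explicitly chosen checkpoint (gpt2 is just an example).
generator = pipeline("text-generation", model="gpt2")
print(generator("In this course, we will teach you how to", max_new_tokens=20))
```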

### Transformer architecture

We discussed how Transformer models work at a high level, including:
- The importance of the attention mechanism (a short sketch follows this list)
- How transfer learning enables models to adapt to specific tasks
- The three main architectural variants: encoder-only, decoder-only, and encoder-decoder
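
To make the attention bullet concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a Transformer layer. It is deliberately simplified: a single head, no learned projections, and no masking:

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    # Similarity score between each query and every key, scaled by sqrt(d_k).
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    # Softmax over the keys turns scores into weights that sum to 1 for each query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of the value vectors.
    return weights @ values

# Toy self-attention over 3 tokens with 4-dimensional representations.
rng = np.random.default_rng(seed=0)
tokens = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(tokens, tokens, tokens).shape)  # (3, 4)
```

Real Transformer layers add learned projection matrices, multiple heads, and (for decoders) causal masking on top of this basic operation.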

### Model architectures and their applications

A key aspect of this chapter was understanding which architecture to use for different tasks. The table below summarizes the main options, and a short code sketch follows it:

| Model           | Examples                     | Tasks                                                                             |
|-----------------|------------------------------|-----------------------------------------------------------------------------------|
| Encoder-only    | BERT, DistilBERT, ModernBERT | Sentence classification, named entity recognition, extractive question answering  |
| Decoder-only    | GPT, LLaMA, Gemma, SmolLM    | Text generation, conversational AI, creative writing                              |
| Encoder-decoder | BART, T5, Marian, mBART      | Summarization, translation, generative question answering                         |
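
As a rough illustration of how this table maps to code, the sketch below loads one well-known checkpoint per family through `pipeline()`. The specific model names are just common examples, not required choices:

```python
from transformers import pipeline

# Encoder-only: a DistilBERT checkpoint fine-tuned for sentence classification.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Decoder-only: GPT-2 for open-ended text generation.
generator = pipeline("text-generation", model="gpt2")

# Encoder-decoder: BART fine-tuned for summarization.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
```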

### Modern LLM developments

You also learned about recent developments in the field:
- How LLMs have grown in size and capability over time
- The concept of scaling laws and how they guide model development (a reference formula follows this list)
- Specialized attention mechanisms that help models process longer sequences
- The two-phase training approach of pretraining followed by instruction tuning
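
For reference on the scaling-laws bullet, one commonly cited parameterization (from the Chinchilla paper, Hoffmann et al., 2022) models pretraining loss as a function of the parameter count *N* and the number of training tokens *D*:

$$
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$

Here *E* is the irreducible loss and *A*, *B*, *α*, and *β* are constants fitted from experiments; increasing either the model size or the amount of training data predictably lowers the loss, which is what guides decisions about how large to make a model and how much data to train it on.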

### Practical applications

Throughout the chapter, you've seen how these models can be applied to real-world problems:
- Using the Hugging Face Hub to find and use pre-trained models (a short sketch follows this list)
- Leveraging the Inference API to test models directly in your browser
- Understanding which models are best suited for specific tasks
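
As a small sketch of the programmatic route (the Hub's web search and hosted inference widget cover the same ground with no code), assuming the `huggingface_hub` package is installed; the checkpoint name is only illustrative:

```python
from huggingface_hub import InferenceClient, list_models

# List a few of the most-downloaded text-classification models on the Hub.
for model in list_models(filter="text-classification", sort="downloads", direction=-1, limit=5):
    print(model.id)

# Query a model through the hosted Inference API instead of downloading it
# (a free Hugging Face token may be required for some models).
client = InferenceClient()
print(client.text_classification(
    "Transformers are amazing!",
    model="distilbert-base-uncased-finetuned-sst-2-english",
))
```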

## Looking ahead

Now that you have a solid understanding of what Transformer models are and how they work at a high level, you're ready to dive deeper into using them effectively. In the next chapters, you'll learn how to:

- Use the Transformers library to load and fine-tune models
- Process different types of data for model input
- Adapt pre-trained models to your specific tasks
- Deploy models for practical applications

The foundation you've built in this chapter will serve you well as you explore more advanced topics and techniques in the coming sections.