Improve and re-release chapter 2 #911

Open · wants to merge 5 commits into base: main
2 changes: 2 additions & 0 deletions chapters/en/_toctree.yml
@@ -46,6 +46,8 @@
- local: chapter2/7
  title: Basic usage completed!
- local: chapter2/8
  title: Inference with transformers and beyond
- local: chapter2/9
  title: End-of-chapter quiz
  quiz: 2

2 changes: 1 addition & 1 deletion chapters/en/chapter2/1.mdx
@@ -10,7 +10,7 @@ As you saw in [Chapter 1](/course/chapter1), Transformer models are usually very
The 🤗 Transformers library was created to solve this problem. Its goal is to provide a single API through which any Transformer model can be loaded, trained, and saved. The library's main features are:

- **Ease of use**: Downloading, loading, and using a state-of-the-art NLP model for inference can be done in just two lines of code, as shown in the sketch after this list.
- **Flexibility**: At their core, all models are simple PyTorch `nn.Module` or TensorFlow `tf.keras.Model` classes and can be handled like any other models in their respective machine learning (ML) frameworks.
- **Flexibility**: At their core, all models are simple PyTorch `nn.Module` classes and can be handled like any other model in the PyTorch machine learning (ML) framework.
- **Simplicity**: Hardly any abstractions are made across the library. The "All in one file" is a core concept: a model's forward pass is entirely defined in a single file, so that the code itself is understandable and hackable.
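
As a quick illustration of the first point, here is a minimal sketch using the `sentiment-analysis` pipeline covered in [Chapter 1](/course/chapter1):

```python
from transformers import pipeline

# Two lines: build a sentiment-analysis pipeline, then run it on a sentence
classifier = pipeline("sentiment-analysis")
print(classifier("I've been waiting for a HuggingFace course my whole life."))
```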

This last feature (simplicity) makes 🤗 Transformers quite different from other ML libraries. The models are not built on modules
122 changes: 2 additions & 120 deletions chapters/en/chapter2/2.mdx
@@ -2,35 +2,14 @@

# Behind the pipeline[[behind-the-pipeline]]

{#if fw === 'pt'}

<CourseFloatingBanner chapter={2}
classNames="absolute z-10 right-0 top-0"
notebooks={[
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter2/section2_pt.ipynb"},
{label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section2_pt.ipynb"},
]} />

{:else}

<CourseFloatingBanner chapter={2}
classNames="absolute z-10 right-0 top-0"
notebooks={[
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter2/section2_tf.ipynb"},
{label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section2_tf.ipynb"},
]} />

{/if}

<Tip>
This is the first section where the content is slightly different depending on whether you use PyTorch or TensorFlow. Toggle the switch on top of the title to select the platform you prefer!
</Tip>

{#if fw === 'pt'}
<Youtube id="1pedAIvTWXk"/>
{:else}
<Youtube id="wVN12smEvqg"/>
{/if}

Let's start with a complete example, taking a look at what happened behind the scenes when we executed the following code in [Chapter 1](/course/chapter1):

@@ -83,11 +62,10 @@ tokenizer = AutoTokenizer.from_pretrained(checkpoint)

Once we have the tokenizer, we can directly pass our sentences to it and we'll get back a dictionary that's ready to feed to our model! The only thing left to do is to convert the list of input IDs to tensors.

You can use 🤗 Transformers without having to worry about which ML framework is used as a backend; it might be PyTorch or TensorFlow, or Flax for some models. However, Transformer models only accept *tensors* as input. If this is your first time hearing about tensors, you can think of them as NumPy arrays instead. A NumPy array can be a scalar (0D), a vector (1D), a matrix (2D), or have more dimensions. It's effectively a tensor; other ML frameworks' tensors behave similarly, and are usually as simple to instantiate as NumPy arrays.
You can use 🤗 Transformers without having to worry about which ML framework is used as a backend; it might be PyTorch, or Flax for some models. However, Transformer models only accept *tensors* as input. If this is your first time hearing about tensors, you can think of them as NumPy arrays. A NumPy array can be a scalar (0D), a vector (1D), a matrix (2D), or have more dimensions. It's effectively a tensor; other ML frameworks' tensors behave similarly, and are usually as simple to instantiate as NumPy arrays.
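
As a quick aside (a minimal sketch, not part of the course code), the same 2D data can be viewed as a NumPy array or a PyTorch tensor:

```python
import numpy as np
import torch

# A 2x3 matrix is a 2D tensor, whatever the framework
arr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
tens = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(arr.shape)   # (2, 3)
print(tens.shape)  # torch.Size([2, 3])
```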

To specify the type of tensors we want to get back (PyTorch, TensorFlow, or plain NumPy), we use the `return_tensors` argument:
To specify the type of tensors we want to get back (PyTorch or plain NumPy), we use the `return_tensors` argument:

{#if fw === 'pt'}
```python
raw_inputs = [
"I've been waiting for a HuggingFace course my whole life.",
Expand All @@ -96,21 +74,9 @@ raw_inputs = [
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)
```
{:else}
```python
raw_inputs = [
"I've been waiting for a HuggingFace course my whole life.",
"I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="tf")
print(inputs)
```
{/if}

Don't worry about padding and truncation just yet; we'll explain those later. The main things to remember here are that you can pass one sentence or a list of sentences, and that you can specify the type of tensors you want to get back (if no type is passed, you will get a list of lists as a result).
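
For instance, calling the tokenizer on a single sentence with no `return_tensors` argument gives back plain Python lists (a quick sketch, assuming the `tokenizer` defined above is in scope):

```python
# No return_tensors: the IDs come back as a plain list of integers
print(tokenizer("I hate this so much!")["input_ids"])
# [101, 1045, 5223, 2023, 2061, 2172, 999, 102]
```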

{#if fw === 'pt'}

Here's what the results look like as PyTorch tensors:

```python out
{
    'input_ids': tensor([
        [  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,  2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,     0,     0,     0,     0,     0,     0]
    ]),
    'attention_mask': tensor([
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    ])
}
```
{:else}

Here's what the results look like as TensorFlow tensors:

```python out
{
    'input_ids': <tf.Tensor: shape=(2, 16), dtype=int32, numpy=
    array([
        [  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,  2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,     0,     0,     0,     0,     0,     0]
    ], dtype=int32)>,
    'attention_mask': <tf.Tensor: shape=(2, 16), dtype=int32, numpy=
    array([
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    ], dtype=int32)>
}
```
{/if}

The output itself is a dictionary containing two keys, `input_ids` and `attention_mask`. `input_ids` contains two rows of integers (one for each sentence) that are the unique identifiers of the tokens in each sentence. We'll explain what the `attention_mask` is later in this chapter.
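
To see what those IDs correspond to, you can map them back to token strings (a small sketch, assuming the `tokenizer` and `inputs` from the snippets above are in scope):

```python
# Convert the first sentence's IDs back into token strings
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
print(tokens)
# ['[CLS]', 'i', "'", 've', 'been', 'waiting', ...]
```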

## Going through the model[[going-through-the-model]]

{#if fw === 'pt'}
We can download our pretrained model the same way we did with our tokenizer. 🤗 Transformers provides an `AutoModel` class which also has a `from_pretrained()` method:

```python
from transformers import AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)
```
{:else}
We can download our pretrained model the same way we did with our tokenizer. 🤗 Transformers provides an `TFAutoModel` class which also has a `from_pretrained` method:

```python
from transformers import TFAutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = TFAutoModel.from_pretrained(checkpoint)
```
{/if}

In this code snippet, we have downloaded the same checkpoint we used in our pipeline before (it should actually have been cached already) and instantiated a model with it.

@@ -189,7 +125,6 @@ It is said to be "high dimensional" because of the last value. The hidden size c

We can see this if we feed the inputs we preprocessed to our model:

{#if fw === 'pt'}
```python
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```

```python out
torch.Size([2, 16, 768])
```
{:else}
```py
outputs = model(inputs)
print(outputs.last_hidden_state.shape)
```

```python out
(2, 16, 768)
```
{/if}

Note that the outputs of 🤗 Transformers models behave like `namedtuple`s or dictionaries. You can access the elements by attributes (like we did) or by key (`outputs["last_hidden_state"]`), or even by index if you know exactly where the thing you are looking for is (`outputs[0]`).
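
For example, these three ways of reading the hidden states all return the same tensor (a sketch, assuming `outputs` from the previous snippet is available):

```python
import torch

by_attribute = outputs.last_hidden_state
by_key = outputs["last_hidden_state"]
by_index = outputs[0]
# All three point at the same values
print(torch.equal(by_attribute, by_key) and torch.equal(by_key, by_index))  # True
```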

@@ -235,7 +160,6 @@ There are many different architectures available in 🤗 Transformers, with each
- `*ForTokenClassification`
- and others 🤗

{#if fw === 'pt'}
For our example, we will need a model with a sequence classification head (to be able to classify the sentences as positive or negative). So, we won't actually use the `AutoModel` class, but `AutoModelForSequenceClassification`:

```python
from transformers import AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)
```
{:else}
For our example, we will need a model with a sequence classification head (to be able to classify the sentences as positive or negative). So, we won't actually use the `TFAutoModel` class, but `TFAutoModelForSequenceClassification`:

```python
from transformers import TFAutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(inputs)
```
{/if}

Now if we look at the shape of our outputs, the dimensionality will be much lower: the model head takes as input the high-dimensional vectors we saw before, and outputs vectors containing two values (one per label):

```python
print(outputs.logits.shape)
```

{#if fw === 'pt'}
```python out
torch.Size([2, 2])
```
{:else}
```python out
(2, 2)
```
{/if}

Since we have just two sentences and two labels, the result we get from our model is of shape 2 x 2.

@@ -283,49 +190,24 @@

The values we get as output from our model don't necessarily make sense by themselves. Let's take a look:

```python
print(outputs.logits)
```

{#if fw === 'pt'}
```python out
tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward>)
```
{:else}
```python out
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-1.5606991,  1.6122842],
       [ 4.169231 , -3.3464472]], dtype=float32)>
```
{/if}

Our model predicted `[-1.5607, 1.6123]` for the first sentence and `[ 4.1692, -3.3464]` for the second one. Those are not probabilities but *logits*, the raw, unnormalized scores output by the last layer of the model. To be converted to probabilities, they need to go through a [SoftMax](https://en.wikipedia.org/wiki/Softmax_function) layer (all 🤗 Transformers models output the logits, as the loss function for training will generally fuse the last activation function, such as SoftMax, with the actual loss function, such as cross entropy):

{#if fw === 'pt'}
```py
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)
```
{:else}
```py
import tensorflow as tf

predictions = tf.math.softmax(outputs.logits, axis=-1)
print(predictions)
```
{/if}

{#if fw === 'pt'}
```python out
tensor([[4.0195e-02, 9.5980e-01],
        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward>)
```
{:else}
```python out
tf.Tensor(
[[4.01951671e-02 9.59804833e-01]
 [9.9945587e-01 5.4418424e-04]], shape=(2, 2), dtype=float32)
```
{/if}

Now we can see that the model predicted `[0.0402, 0.9598]` for the first sentence and `[0.9995, 0.0005]` for the second one. These are recognizable probability scores.
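
To map those probabilities to human-readable labels, you can inspect the model configuration (a short sketch, assuming the `model` loaded above is in scope):

```python
# id2label tells you which column corresponds to which label
print(model.config.id2label)
# {0: 'NEGATIVE', 1: 'POSITIVE'}
```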
