Improve and re-release chapter 2 #911

Open · wants to merge 5 commits into base: main
2 changes: 2 additions & 0 deletions chapters/en/_toctree.yml
@@ -46,6 +46,8 @@
- local: chapter2/7
  title: Basic usage completed!
- local: chapter2/8
  title: Inference with transformers and beyond
- local: chapter2/9
  title: End-of-chapter quiz
  quiz: 2

2 changes: 1 addition & 1 deletion chapters/en/chapter2/1.mdx
@@ -10,7 +10,7 @@ As you saw in [Chapter 1](/course/chapter1), Transformer models are usually very
The 🤗 Transformers library was created to solve this problem. Its goal is to provide a single API through which any Transformer model can be loaded, trained, and saved. The library's main features are:

- **Ease of use**: Downloading, loading, and using a state-of-the-art NLP model for inference can be done in just two lines of code, as shown in the sketch after this list.
- **Flexibility**: At their core, all models are simple PyTorch `nn.Module` or TensorFlow `tf.keras.Model` classes and can be handled like any other models in their respective machine learning (ML) frameworks.
- **Flexibility**: At their core, all models are simple PyTorch `nn.Module` classes and can be handled like any other model in the PyTorch machine learning (ML) framework.
- **Simplicity**: Hardly any abstractions are made across the library. The "All in one file" is a core concept: a model's forward pass is entirely defined in a single file, so that the code itself is understandable and hackable.
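
As a quick illustration of the first point, here is a minimal sketch using the `sentiment-analysis` pipeline covered in [Chapter 1](/course/chapter1):

```python
from transformers import pipeline

# Two lines: build a sentiment-analysis pipeline, then run it on a sentence
classifier = pipeline("sentiment-analysis")
print(classifier("I've been waiting for a HuggingFace course my whole life."))
```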

This last feature (simplicity) makes 🤗 Transformers quite different from other ML libraries. The models are not built on modules
122 changes: 2 additions & 120 deletions chapters/en/chapter2/2.mdx
@@ -2,35 +2,14 @@

# Behind the pipeline[[behind-the-pipeline]]

{#if fw === 'pt'}

<CourseFloatingBanner chapter={2}
classNames="absolute z-10 right-0 top-0"
notebooks={[
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter2/section2_pt.ipynb"},
{label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section2_pt.ipynb"},
]} />

{:else}

<CourseFloatingBanner chapter={2}
classNames="absolute z-10 right-0 top-0"
notebooks={[
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter2/section2_tf.ipynb"},
{label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section2_tf.ipynb"},
]} />

{/if}

<Tip>
This is the first section where the content is slightly different depending on whether you use PyTorch or TensorFlow. Toggle the switch on top of the title to select the platform you prefer!
</Tip>

{#if fw === 'pt'}
<Youtube id="1pedAIvTWXk"/>
{:else}
<Youtube id="wVN12smEvqg"/>
{/if}

Let's start with a complete example, taking a look at what happened behind the scenes when we executed the following code in [Chapter 1](/course/chapter1):

@@ -83,11 +62,10 @@ tokenizer = AutoTokenizer.from_pretrained(checkpoint)

Once we have the tokenizer, we can directly pass our sentences to it and we'll get back a dictionary that's ready to feed to our model! The only thing left to do is to convert the list of input IDs to tensors.

You can use 🤗 Transformers without having to worry about which ML framework is used as a backend; it might be PyTorch or TensorFlow, or Flax for some models. However, Transformer models only accept *tensors* as input. If this is your first time hearing about tensors, you can think of them as NumPy arrays instead. A NumPy array can be a scalar (0D), a vector (1D), a matrix (2D), or have more dimensions. It's effectively a tensor; other ML frameworks' tensors behave similarly, and are usually as simple to instantiate as NumPy arrays.
You can use 🤗 Transformers without having to worry about which ML framework is used as a backend; it might be PyTorch, or Flax for some models. However, Transformer models only accept *tensors* as input. If this is your first time hearing about tensors, you can think of them as NumPy arrays. A NumPy array can be a scalar (0D), a vector (1D), a matrix (2D), or have more dimensions. It's effectively a tensor; other ML frameworks' tensors behave similarly, and are usually as simple to instantiate as NumPy arrays.
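
As a quick aside (a minimal sketch, not part of the course code), the same 2D data can be viewed as a NumPy array or a PyTorch tensor:

```python
import numpy as np
import torch

# A 2x3 matrix is a 2D tensor, whatever the framework
arr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
tens = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(arr.shape)   # (2, 3)
print(tens.shape)  # torch.Size([2, 3])
```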

To specify the type of tensors we want to get back (PyTorch, TensorFlow, or plain NumPy), we use the `return_tensors` argument:
To specify the type of tensors we want to get back (PyTorch or plain NumPy), we use the `return_tensors` argument:

{#if fw === 'pt'}
```python
raw_inputs = [
"I've been waiting for a HuggingFace course my whole life.",
Expand All @@ -96,21 +74,9 @@ raw_inputs = [
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)
```
{:else}
```python
raw_inputs = [
"I've been waiting for a HuggingFace course my whole life.",
"I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="tf")
print(inputs)
```
{/if}

Don't worry about padding and truncation just yet; we'll explain those later. The main things to remember here are that you can pass one sentence or a list of sentences, and that you can specify the type of tensors you want to get back (if no type is passed, you will get a list of lists as a result).
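
For instance, calling the tokenizer on a single sentence with no `return_tensors` argument gives back plain Python lists (a quick sketch, assuming the `tokenizer` defined above is in scope):

```python
# No return_tensors: the IDs come back as a plain list of integers
print(tokenizer("I hate this so much!")["input_ids"])
# [101, 1045, 5223, 2023, 2061, 2172, 999, 102]
```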

{#if fw === 'pt'}

Here's what the results look like as PyTorch tensors:

```python out
{
    'input_ids': tensor([
        [  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,  2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,     0,     0,     0,     0,     0,     0]
    ]),
    'attention_mask': tensor([
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    ])
}
```
{:else}

Here's what the results look like as TensorFlow tensors:

```python out
{
    'input_ids': <tf.Tensor: shape=(2, 16), dtype=int32, numpy=
    array([
        [  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,  2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,     0,     0,     0,     0,     0,     0]
    ], dtype=int32)>,
    'attention_mask': <tf.Tensor: shape=(2, 16), dtype=int32, numpy=
    array([
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    ], dtype=int32)>
}
```
{/if}

The output itself is a dictionary containing two keys, `input_ids` and `attention_mask`. `input_ids` contains two rows of integers (one for each sentence) that are the unique identifiers of the tokens in each sentence. We'll explain what the `attention_mask` is later in this chapter.
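
To see what those IDs correspond to, you can map them back to token strings (a small sketch, assuming the `tokenizer` and `inputs` from the snippets above are in scope):

```python
# Convert the first sentence's IDs back into token strings
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
print(tokens)
# ['[CLS]', 'i', "'", 've', 'been', 'waiting', ...]
```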

## Going through the model[[going-through-the-model]]

{#if fw === 'pt'}
We can download our pretrained model the same way we did with our tokenizer. 🤗 Transformers provides an `AutoModel` class which also has a `from_pretrained()` method:

```python
from transformers import AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)
```
{:else}
We can download our pretrained model the same way we did with our tokenizer. 🤗 Transformers provides an `TFAutoModel` class which also has a `from_pretrained` method:

```python
from transformers import TFAutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = TFAutoModel.from_pretrained(checkpoint)
```
{/if}

In this code snippet, we have downloaded the same checkpoint we used in our pipeline before (it should actually have been cached already) and instantiated a model with it.

@@ -189,7 +125,6 @@ It is said to be "high dimensional" because of the last value. The hidden size c

We can see this if we feed the inputs we preprocessed to our model:

{#if fw === 'pt'}
```python
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```

```python out
torch.Size([2, 16, 768])
```
{:else}
```py
outputs = model(inputs)
print(outputs.last_hidden_state.shape)
```

```python out
(2, 16, 768)
```
{/if}

Note that the outputs of 🤗 Transformers models behave like `namedtuple`s or dictionaries. You can access the elements by attributes (like we did) or by key (`outputs["last_hidden_state"]`), or even by index if you know exactly where the thing you are looking for is (`outputs[0]`).
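
For example, these three ways of reading the hidden states all return the same tensor (a sketch, assuming `outputs` from the previous snippet is available):

```python
import torch

by_attribute = outputs.last_hidden_state
by_key = outputs["last_hidden_state"]
by_index = outputs[0]
# All three point at the same values
print(torch.equal(by_attribute, by_key) and torch.equal(by_key, by_index))  # True
```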

@@ -235,7 +160,6 @@ There are many different architectures available in 🤗 Transformers, with each
- `*ForTokenClassification`
- and others 🤗

{#if fw === 'pt'}
For our example, we will need a model with a sequence classification head (to be able to classify the sentences as positive or negative). So, we won't actually use the `AutoModel` class, but `AutoModelForSequenceClassification`:

```python
from transformers import AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)
```
{:else}
For our example, we will need a model with a sequence classification head (to be able to classify the sentences as positive or negative). So, we won't actually use the `TFAutoModel` class, but `TFAutoModelForSequenceClassification`:

```python
from transformers import TFAutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(inputs)
```
{/if}

Now if we look at the shape of our outputs, the dimensionality will be much lower: the model head takes as input the high-dimensional vectors we saw before, and outputs vectors containing two values (one per label):

```python
print(outputs.logits.shape)
```

{#if fw === 'pt'}
```python out
torch.Size([2, 2])
```
{:else}
```python out
(2, 2)
```
{/if}

Since we have just two sentences and two labels, the result we get from our model is of shape 2 x 2.

@@ -283,49 +190,24 @@

The values we get as output from our model don't necessarily make sense by themselves. Let's take a look:

```python
print(outputs.logits)
```

{#if fw === 'pt'}
```python out
tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward>)
```
{:else}
```python out
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-1.5606991,  1.6122842],
       [ 4.169231 , -3.3464472]], dtype=float32)>
```
{/if}

Our model predicted `[-1.5607, 1.6123]` for the first sentence and `[ 4.1692, -3.3464]` for the second one. Those are not probabilities but *logits*, the raw, unnormalized scores output by the last layer of the model. To be converted to probabilities, they need to go through a [SoftMax](https://en.wikipedia.org/wiki/Softmax_function) layer (all 🤗 Transformers models output the logits, as the loss function for training will generally fuse the last activation function, such as SoftMax, with the actual loss function, such as cross entropy):

{#if fw === 'pt'}
```py
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)
```
{:else}
```py
import tensorflow as tf

predictions = tf.math.softmax(outputs.logits, axis=-1)
print(predictions)
```
{/if}

{#if fw === 'pt'}
```python out
tensor([[4.0195e-02, 9.5980e-01],
        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward>)
```
{:else}
```python out
tf.Tensor(
[[4.01951671e-02 9.59804833e-01]
 [9.9945587e-01 5.4418424e-04]], shape=(2, 2), dtype=float32)
```
{/if}

Now we can see that the model predicted `[0.0402, 0.9598]` for the first sentence and `[0.9995, 0.0005]` for the second one. These are recognizable probability scores.
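
To map those probabilities to human-readable labels, you can inspect the model configuration (a short sketch, assuming the `model` loaded above is in scope):

```python
# id2label tells you which column corresponds to which label
print(model.config.id2label)
# {0: 'NEGATIVE', 1: 'POSITIVE'}
```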
