
Commit 7268359

Merge pull request #895 from arjunaskykok/fix_tokenizer_param_processing_class
replace deprecated parameter tokenizer with processing_class in chapt…
2 parents e510a17 + f583964

File tree

1 file changed (+4 −4 lines changed)
  • chapters/en/chapter3


chapters/en/chapter3/3.mdx

+4 −4
@@ -58,7 +58,7 @@ model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_label
 
 You will notice that unlike in [Chapter 2](/course/chapter2), you get a warning after instantiating this pretrained model. This is because BERT has not been pretrained on classifying pairs of sentences, so the head of the pretrained model has been discarded and a new head suitable for sequence classification has been added instead. The warnings indicate that some weights were not used (the ones corresponding to the dropped pretraining head) and that some others were randomly initialized (the ones for the new head). It concludes by encouraging you to train the model, which is exactly what we are going to do now.
 
-Once we have our model, we can define a `Trainer` by passing it all the objects constructed up to now — the `model`, the `training_args`, the training and validation datasets, our `data_collator`, and our `tokenizer`:
+Once we have our model, we can define a `Trainer` by passing it all the objects constructed up to now — the `model`, the `training_args`, the training and validation datasets, our `data_collator`, and our `processing_class` (e.g., a tokenizer, feature extractor, or processor):
 
 ```py
 from transformers import Trainer
@@ -69,11 +69,11 @@ trainer = Trainer(
     train_dataset=tokenized_datasets["train"],
     eval_dataset=tokenized_datasets["validation"],
     data_collator=data_collator,
-    tokenizer=tokenizer,
+    processing_class=tokenizer,
 )
 ```
 
-Note that when you pass the `tokenizer` as we did here, the default `data_collator` used by the `Trainer` will be a `DataCollatorWithPadding` as defined previously, so you can skip the line `data_collator=data_collator` in this call. It was still important to show you this part of the processing in section 2!
+Note that when you pass a tokenizer as the `processing_class`, as we did here, the default `data_collator` used by the `Trainer` will be a `DataCollatorWithPadding` if the `processing_class` is a tokenizer or feature extractor, so you can skip the line `data_collator=data_collator` in this call. It was still important to show you this part of the processing in section 2!
 
 To fine-tune the model on our dataset, we just have to call the `train()` method of our `Trainer`:
 
@@ -147,7 +147,7 @@ trainer = Trainer(
     train_dataset=tokenized_datasets["train"],
     eval_dataset=tokenized_datasets["validation"],
     data_collator=data_collator,
-    tokenizer=tokenizer,
+    processing_class=tokenizer,
     compute_metrics=compute_metrics,
 )
 ```
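For readers following the change, here is a minimal sketch of the updated call after this commit. It assumes the `checkpoint`, `training_args`, `tokenized_datasets`, and `compute_metrics` objects defined earlier in the chapter. Because a tokenizer passed as `processing_class` makes `DataCollatorWithPadding` the default collator (as the updated note in the diff explains), the explicit `data_collator` argument is omitted here:

```py
# Sketch of the Trainer setup after this commit. `tokenized_datasets` and
# `compute_metrics` are assumed to exist as defined earlier in chapter 3.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
training_args = TrainingArguments("test-trainer")

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    processing_class=tokenizer,  # was `tokenizer=tokenizer` before this change
    # data_collator omitted: with a tokenizer as processing_class, Trainer
    # defaults to DataCollatorWithPadding
    compute_metrics=compute_metrics,
)
trainer.train()  # fine-tune, as the chapter does next
```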
