@@ -218,6 +218,52 @@ positional embeddings.
Absolute positional embeddings are selected in the range ``[0, config.max_position_embeddings - 1]``. Some models
use other types of positional embeddings, such as sinusoidal position embeddings or relative position embeddings.

+ .. _labels:
+
+ Labels
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ The labels are an optional argument which can be passed so that the model computes the loss itself. They should be
+ the expected predictions of the model: the model will use its standard loss function to compute the loss between its
+ predictions and the expected values (the labels).
+
+ The labels differ depending on the model head, for example:
+
+ - For sequence classification models (e.g., :class:`~transformers.BertForSequenceClassification`), the model expects
+   a tensor of dimension :obj:`(batch_size)` with each value of the batch corresponding to the expected label of the
+   entire sequence.
+ - For token classification models (e.g., :class:`~transformers.BertForTokenClassification`), the model expects a
+   tensor of dimension :obj:`(batch_size, seq_length)` with each value corresponding to the expected label of each
+   individual token.
+ - For masked language modeling (e.g., :class:`~transformers.BertForMaskedLM`), the model expects a tensor of
+   dimension :obj:`(batch_size, seq_length)` with each value corresponding to the expected label of each individual
+   token: the labels being the token ID for the masked tokens, and values to be ignored for the rest (usually -100,
+   the index ignored by the loss function).
+ - For sequence to sequence tasks (e.g., :class:`~transformers.BartForConditionalGeneration`,
+   :class:`~transformers.MBartForConditionalGeneration`), the model expects a tensor of dimension
+   :obj:`(batch_size, tgt_seq_length)` with each value corresponding to the target sequence associated with each
+   input sequence. During training, both BART and T5 will create the appropriate :obj:`decoder_input_ids` and decoder
+   attention masks internally, so they usually do not need to be supplied. This does not apply to models leveraging
+   the Encoder-Decoder framework.
+
+ See the documentation of each model for more information on each specific model's labels.
+
+ The base models (e.g., :class:`~transformers.BertModel`) do not accept labels, as these are the base transformer
+ models, simply outputting features.
+
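+ For a model with a sequence classification head, passing the labels together with the inputs is enough to get the
+ loss back, roughly as in the following minimal sketch (the checkpoint name, number of labels and example texts are
+ illustrative placeholders, and a recent version of the library is assumed so that the output exposes a :obj:`loss`
+ attribute):
+
+ .. code-block:: python
+
+     import torch
+     from transformers import BertForSequenceClassification, BertTokenizer
+
+     tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+     model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
+
+     # Two illustrative sequences with one expected label each -> labels of dimension (batch_size)
+     inputs = tokenizer(["I love this movie!", "A terrible film."], padding=True, return_tensors="pt")
+     labels = torch.tensor([1, 0])
+
+     # The model computes the loss between its predictions (logits) and the labels itself
+     outputs = model(**inputs, labels=labels)
+     loss = outputs.loss
+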
+ .. _decoder-input-ids:
+
+ Decoder input IDs
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ This input is specific to encoder-decoder models, and contains the input IDs that will be fed to the decoder.
+ These inputs should be used for sequence to sequence tasks, such as translation or summarization, and are usually
+ built in a way specific to each model.
+
+ Most encoder-decoder models (BART, T5) create their :obj:`decoder_input_ids` on their own from the :obj:`labels`.
+ In such models, passing the :obj:`labels` is the preferred way to handle training.
+
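+ For instance, with T5 the target token IDs can simply be passed as :obj:`labels`, and the model builds the
+ :obj:`decoder_input_ids` from them internally (shifting them one position to the right), as in this minimal sketch
+ (the checkpoint name and example sentences are illustrative, and a recent version of the library is assumed so that
+ the output exposes a :obj:`loss` attribute):
+
+ .. code-block:: python
+
+     from transformers import T5ForConditionalGeneration, T5Tokenizer
+
+     tokenizer = T5Tokenizer.from_pretrained("t5-small")
+     model = T5ForConditionalGeneration.from_pretrained("t5-small")
+
+     # Source and target sequences; the target token IDs are passed as labels
+     input_ids = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt").input_ids
+     labels = tokenizer("Das Haus ist wunderbar.", return_tensors="pt").input_ids
+
+     # No decoder_input_ids are supplied: the model creates them from the labels and returns the loss
+     outputs = model(input_ids=input_ids, labels=labels)
+     loss = outputs.loss
+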
+ Please check each model's docs to see how they handle these input IDs for sequence to sequence training.
+
.. _feed-forward-chunking:

Feed Forward Chunking