
Commit 5547b40

labels and decoder_input_ids to Glossary (#7906)
* labels and decoder_input_ids to Glossary
* Formatting fixes
* Update docs/source/glossary.rst
  Co-authored-by: Sam Shleifer <[email protected]>
* sam's comments
Co-authored-by: Sam Shleifer <[email protected]>
1 parent f331251 commit 5547b40

File tree

1 file changed: +46 -0 lines changed

docs/source/glossary.rst

@@ -218,6 +218,52 @@ positional embeddings.
Absolute positional embeddings are selected in the range ``[0, config.max_position_embeddings - 1]``. Some models
use other types of positional embeddings, such as sinusoidal position embeddings or relative position embeddings.

.. _labels:

Labels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The labels are an optional argument that can be passed so that the model computes the loss itself. They should be the
expected predictions of the model: the model uses its standard loss to compute the loss between its predictions and
the expected values (the labels).
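
For instance, a minimal sketch of this mechanism for sequence classification (the checkpoint name, example text and
label are only illustrative):

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    # Checkpoint name is only an example.
    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    model = BertForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)

    inputs = tokenizer("This movie was great!", return_tensors="pt")
    labels = torch.tensor([1])  # one label per sequence in the batch

    # Passing labels makes the forward pass compute and return the loss.
    outputs = model(**inputs, labels=labels)
    print(outputs.loss)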

These labels are different according to the model head, for example (see the sketch after this list):

- For sequence classification models (e.g., :class:`~transformers.BertForSequenceClassification`), the model expects
  a tensor of dimension :obj:`(batch_size)` with each value of the batch corresponding to the expected label of the
  entire sequence.
- For token classification models (e.g., :class:`~transformers.BertForTokenClassification`), the model expects
  a tensor of dimension :obj:`(batch_size, seq_length)` with each value corresponding to the expected label of each
  individual token.
- For masked language modeling (e.g., :class:`~transformers.BertForMaskedLM`), the model expects
  a tensor of dimension :obj:`(batch_size, seq_length)` with each value corresponding to the expected label of each
  individual token: the labels being the token ID for the masked token, and values to be ignored for the rest (usually
  -100).
- For sequence to sequence tasks (e.g., :class:`~transformers.BartForConditionalGeneration`,
  :class:`~transformers.MBartForConditionalGeneration`), the model expects a tensor of dimension
  :obj:`(batch_size, tgt_seq_length)` with each value corresponding to the target sequences associated with each
  input sequence. During training, both `BART` and `T5` will make the appropriate `decoder_input_ids` and decoder
  attention masks internally. They usually do not need to be supplied. This does not apply to models leveraging the
  Encoder-Decoder framework.
  See the documentation of each model for more information on each specific model's labels.
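
As a sketch of the token-level shapes above (the checkpoint name and sentence are only examples, and the masked
position is an assumption about this tokenization), masked language modeling labels can mirror the input IDs while
every position that should not contribute to the loss is set to -100:

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    # Checkpoint name is only an example.
    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    model = BertForMaskedLM.from_pretrained("bert-base-cased")

    inputs = tokenizer("The capital of France is Paris.", return_tensors="pt")
    labels = inputs["input_ids"].clone()  # shape (batch_size, seq_length)

    # Mask the token for "Paris" (assuming it is a single token in this vocabulary).
    masked_index = labels[0].tolist().index(tokenizer.convert_tokens_to_ids("Paris"))
    inputs["input_ids"][0, masked_index] = tokenizer.mask_token_id

    # Ignore every position except the masked one (-100 is the default ignore index).
    ignored = torch.full_like(labels, -100)
    ignored[0, masked_index] = labels[0, masked_index]

    outputs = model(**inputs, labels=ignored)
    print(outputs.loss)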

The base models (e.g., :class:`~transformers.BertModel`) do not accept labels, as these are the base transformer models,
simply outputting features.

.. _decoder-input-ids:

Decoder input IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This input is specific to encoder-decoder models, and contains the input IDs that will be fed to the decoder.
These inputs should be used for sequence to sequence tasks, such as translation or summarization, and are usually
built in a way specific to each model.

Most encoder-decoder models (BART, T5) create their :obj:`decoder_input_ids` on their own from the :obj:`labels`.
In such models, passing the :obj:`labels` is the preferred way to handle training.

Please check each model's docs to see how they handle these input IDs for sequence to sequence training.
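
For instance, a minimal sketch of that workflow with `BART` (the checkpoint name and texts are only examples): the
target token IDs are passed as :obj:`labels` and the model derives :obj:`decoder_input_ids` from them internally.

.. code-block:: python

    from transformers import BartTokenizer, BartForConditionalGeneration

    # Checkpoint name is only an example.
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

    inputs = tokenizer("A long article that should be summarized ...", return_tensors="pt")
    labels = tokenizer("A short summary", return_tensors="pt")["input_ids"]

    # No decoder_input_ids are passed: the model builds them from the labels.
    outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], labels=labels)
    print(outputs.loss)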

.. _feed-forward-chunking:

Feed Forward Chunking

0 commit comments
