Hello,

I am trying to understand these lines. Could you elaborate on the procedure for training the transformer here?

```python
# target includes all sequence elements (no need to handle first one
# differently because we are conditioning)
target = z_indices
```

Using the features and all of the indices, what exactly are we trying to predict? Isn't the target all the `z_indices` that we are already giving to the transformer? Or are we just predicting the last `z_index` given the features and the previous `z_indices`?

Replies: 1 comment

Yes, we are. This is how the transformer is trained, like any other network (e.g. an RNN) that autoregressively generates a sequence of tokens. During training (and inference), the model predicts the next token given the current one plus the previous ones. Schematically, the input is the target sequence shifted by one step, so the output at each position is trained to predict the following token.
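To make that shift-by-one concrete, here is a minimal sketch of such a conditioned next-token training step. It is an illustration under assumptions, not the repository's exact code: the names `model`, `c` (conditioning indices), and `seq` (the role played by `z_indices`) are hypothetical, and `model` stands in for any causal transformer that maps token indices to per-position logits.

```python
import torch
import torch.nn.functional as F

def training_step(model, c, seq):
    """One conditioned next-token prediction step (illustrative).

    c:   (batch, Lc) conditioning token indices
    seq: (batch, L)  sequence token indices, i.e. the z_indices
    """
    # Feed conditioning tokens followed by all sequence tokens except
    # the last one; each position must only predict what comes next.
    inputs = torch.cat([c, seq[:, :-1]], dim=1)

    # The target is the *full* sequence. Because the conditioning comes
    # first, even the first sequence token has context to be predicted
    # from, so position 0 needs no special handling.
    target = seq

    logits = model(inputs)               # (batch, Lc + L - 1, vocab)
    logits = logits[:, c.shape[1] - 1:]  # drop conditioning outputs;
                                         # row i now predicts seq[:, i]
                                         # from c and seq[:, :i]

    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           target.reshape(-1))
```

So the target is indeed all of the `z_indices`, but the model never sees token i while predicting it: the causal attention mask plus the one-step shift of the input ensure that the output at position i depends only on the conditioning features and on the indices before i.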