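As described in the title. PyTorch's `TransformerEncoderLayer` defaults to `batch_first=False`, so it expects input of shape (seq, batch, feature). When batch-first tensors of shape (batch, seq, feature) are passed without setting `batch_first=True`, attention runs across the batch dimension instead of the sequence dimension, resulting in high training loss and test error.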
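A minimal sketch of the effect, not taken from this repository: the tensor sizes, `nhead=4`, and all variable names below are hypothetical, chosen only for illustration. It perturbs one sample in a batch-first tensor and checks whether the output for a different sample changes. With the default layout the layer treats dimension 0 as the sequence, so samples should influence each other; with `batch_first=True` they should not.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, for illustration only.
batch, seq_len, d_model = 4, 10, 16
x = torch.randn(batch, seq_len, d_model)  # batch-first data: (batch, seq, feature)

# dropout=0.0 keeps the comparison deterministic.
layer_default = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, dropout=0.0)
layer_bf = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=4, dropout=0.0, batch_first=True
)
# batch_first does not change the parameters, so the weights can be shared.
layer_bf.load_state_dict(layer_default.state_dict())

# Perturb only sample 0 of the batch.
x_perturbed = x.clone()
x_perturbed[0] += 1.0

with torch.no_grad():
    out_default = layer_default(x)  # dim 0 is interpreted as the sequence
    out_default_p = layer_default(x_perturbed)
    out_bf = layer_bf(x)  # dim 0 is interpreted as the batch
    out_bf_p = layer_bf(x_perturbed)
    # Alternative fix: keep the default layout and transpose around the call.
    out_fixed = layer_default(x.transpose(0, 1)).transpose(0, 1)

# Does changing sample 0 alter the output for sample 1?
leaks_default = not torch.allclose(out_default[1], out_default_p[1], atol=1e-6)
leaks_bf = not torch.allclose(out_bf[1], out_bf_p[1], atol=1e-6)

print("default layout mixes samples:", leaks_default)
print("batch_first=True mixes samples:", leaks_bf)
print("transpose workaround matches batch_first=True:",
      torch.allclose(out_fixed, out_bf, atol=1e-5))
```

Both calls return a tensor with the same shape as the input, so the mismatch raises no error and only shows up in the metrics.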
Using default batch_first=False
Passing batch_first=True