Bidirectional Transformers for Language Understanding (BERT)

Bidirectional Encoder Representations from Transformers (BERT) is an encoder-only transformer-based model designed for natural language understanding. This directory contains implementations of the BERT model. It uses a stack of transformer blocks, each consisting of multi-head attention followed by a multi-layer perceptron (MLP) feed-forward network. We support removing the next-sentence-prediction (NSP) loss from BERT pre-training and training with only the masked-language-modeling (MLM) loss.
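The block structure described above (multi-head attention followed by an MLP feed-forward network, with residual connections) can be sketched roughly as follows. This is a minimal NumPy illustration, not the implementation in this directory: the weights are random stand-ins for learned parameters, layer normalization is omitted, and ReLU stands in for BERT's GELU activation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    # x: (seq_len, d_model). Random weights stand in for learned projections.
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    wq, wk, wv, wo = (rng.standard_normal((d_model, d_model)) * 0.02
                      for _ in range(4))
    q, k, v = x @ wq, x @ wk, x @ wv

    # Split into heads: (num_heads, seq_len, d_head)
    def split(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)

    # Scaled dot-product attention per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ v
    # Merge heads back to (seq_len, d_model) and apply output projection
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ wo

def encoder_block(x, num_heads=4, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    d_model = x.shape[-1]
    # Attention sub-layer with residual connection (layer norm omitted)
    x = x + multi_head_attention(x, num_heads, rng)
    # MLP feed-forward sub-layer: expand 4x, nonlinearity, project back
    w1 = rng.standard_normal((d_model, 4 * d_model)) * 0.02
    w2 = rng.standard_normal((4 * d_model, d_model)) * 0.02
    h = np.maximum(x @ w1, 0.0)  # ReLU stands in for BERT's GELU
    return x + h @ w2

x = np.random.default_rng(1).standard_normal((8, 32))  # 8 tokens, d_model=32
y = encoder_block(x)
print(y.shape)  # (8, 32)
```

A full BERT model stacks many such blocks on top of token, position, and segment embeddings; the MLM head then predicts the masked input tokens from the final hidden states.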

For more information on using our BERT implementation, visit its model page in our documentation.