The gpt-2 model is a member of the Generative Pre-trained Transformer (GPT) model family, pre-trained on a very large corpus of English data in a self-supervised fashion. The GPT architecture is a deep neural network, specifically a transformer, which uses attention in place of the earlier recurrence- and convolution-based architectures. Attention mechanisms allow the model to selectively focus on the segments of input text it predicts to be the most relevant. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text.
More details are provided in the paper, repository, and model card.
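To make the attention idea concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention with a causal mask, so each position can only attend to itself and earlier positions, which is what next-word prediction requires. It is an illustration only, not the GPT-2 implementation: the function name `causal_self_attention`, the toy sizes, and the random weights are assumptions made for this example, and the real model adds multiple heads, learned projections, layer normalization, and feed-forward blocks.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head attention over x of shape (L, D) with a causal mask."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    seq_len = x.shape[0]
    # True above the diagonal marks "future" positions that must be hidden.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy input: 5 positions with 8 features each (hypothetical sizes).
rng = np.random.default_rng(0)
L, D = 5, 8
x = rng.normal(size=(L, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` sums to 1 and is zero to the right of the diagonal, so the representation at position `i` depends only on tokens `0..i`.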
| Metric | Value |
|---|---|
| Type | Text Prediction |
| GFlops | 293.0489 |
| MParams | 175.6203 |
| Source framework | PyTorch* |
GFlops are calculated for the 1, 1024 input shape, which is suitable for long contexts.
Perplexity is obtained on the WikiText-2 raw character-level dataset for the converted model.
| Metric | Value |
|---|---|
| Perplexity | 29.00 |
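For reference, perplexity is commonly defined as the exponential of the average negative log-likelihood per token. The sketch below shows that definition only. The helper name `perplexity` is hypothetical, and the exact evaluation protocol behind the value in the table (tokenization, context window, stride) is not stated here, so this should not be read as the script that produced it.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Sanity check: a model that assigns probability 1/4 to every correct token
# is as uncertain as a uniform choice among 4 options.
log_probs = [math.log(0.25)] * 10
ppl = perplexity(log_probs)
```

Lower values are better: a perplexity of `k` means the model is, on average, as uncertain as if it were choosing uniformly among `k` tokens.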
Token ids, name: input, dynamic shape in the format B, L, where:
- B - batch size
- L - sequence length
Prediction scores of the language modeling head, name: output, dynamic shape B, L, 50257 in the format B, L, S, where:
- B - batch size
- L - sequence length
- S - vocab size
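As an illustration of how these shapes fit together, the sketch below picks the most likely next token for each sequence from an output tensor of shape B, L, S and appends it to the input ids (greedy decoding). The logits are random placeholders standing in for real model output, and `next_token_ids` is a hypothetical helper written for this example. No inference API is called here.

```python
import numpy as np

VOCAB_SIZE = 50257  # S, taken from the output description above


def next_token_ids(logits):
    """Greedy choice: argmax over the vocabulary at the last sequence position.

    logits: array of shape (B, L, S). Returns an array of shape (B,).
    """
    if logits.ndim != 3:
        raise ValueError("expected logits with shape (B, L, S)")
    return logits[:, -1, :].argmax(axis=-1)


rng = np.random.default_rng(0)
# Placeholder input: B=2 sequences of L=6 token ids.
input_ids = rng.integers(0, VOCAB_SIZE, size=(2, 6))
# Placeholder for the model's output tensor with shape (B, L, S).
logits = rng.normal(size=(2, 6, VOCAB_SIZE)).astype(np.float32)
# Force known winners at the last position so the result is predictable.
logits[0, -1, 123] = 100.0
logits[1, -1, 456] = 100.0

next_ids = next_token_ids(logits)
# Append the chosen tokens to build the input for the next step: (B, L + 1).
extended = np.concatenate([input_ids, next_ids[:, None]], axis=1)
```

Only the scores at the last position are used to choose the next token. The scores at earlier positions are the model's predictions for tokens that are already present in the input.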
You can download models and, if necessary, convert them into OpenVINO™ IR format using the Model Downloader and other automation tools, as shown in the examples below.
An example of using the Model Downloader:
omz_downloader --name <model_name>
An example of using the Model Converter:
omz_converter --name <model_name>
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:
The original model is distributed under the MIT License.