GPT-3 Language Models

GPT-3 is a decoder-only transformer language model architecture designed for large-scale autoregressive pretraining. It extends GPT-2 with significantly more parameters (ranging from 1.3B to 175B) and introduces architectural refinements such as locally banded sparse attention layers, which alternate with dense attention layers to reduce compute costs during training. This implementation, however, uses GPT-2-style dense attention in all layers.
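To make the dense-versus-sparse distinction concrete, the following minimal sketch builds the two kinds of causal attention masks in plain Python. It is an illustration only, not code from `model.py`: the function names and the `window` parameter are hypothetical, and the banded mask is a simplified stand-in for a sparse pattern rather than the exact one used by GPT-3.

```python
def dense_causal_mask(seq_len):
    """Dense causal mask: position i may attend to every position j <= i."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


def banded_causal_mask(seq_len, window):
    """Simplified banded causal mask (assumption, for illustration only):
    position i may attend only to the `window` most recent positions,
    including itself."""
    return [
        [1 if 0 <= i - j < window else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


dense = dense_causal_mask(4)
banded = banded_causal_mask(4, window=2)

# Number of attended (query, key) pairs in each mask.
dense_pairs = sum(map(sum, dense))
banded_pairs = sum(map(sum, banded))
```

With dense attention the number of attended pairs grows quadratically with sequence length, while the banded variant grows roughly linearly for a fixed window, which is where the compute savings would come from.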

The model is trained with a next-token prediction objective on large text corpora such as the Pile. Inputs are represented as token sequences that are padded and masked to a fixed maximum sequence length.
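As a rough sketch of what "padded and masked to a fixed maximum sequence length" can look like for next-token prediction, the example below turns a list of token IDs into inputs, labels shifted by one position, and a loss mask. The function name `make_example`, the `pad_id` value, and the returned field order are assumptions for illustration; the data pipeline in this repository may differ.

```python
def make_example(token_ids, max_seq_len, pad_id=0):
    """Build (input_ids, labels, loss_mask) for next-token prediction.

    Hypothetical helper: pad_id=0 is an assumed padding token ID.
    """
    # Keep one extra token so inputs and labels can be offset by one position.
    window = token_ids[: max_seq_len + 1]
    input_ids = window[:-1]   # tokens the model sees
    labels = window[1:]       # the token that follows each input position

    n_real = len(input_ids)
    n_pad = max_seq_len - n_real

    # 1 marks positions that contribute to the loss, 0 marks padding.
    loss_mask = [1] * n_real + [0] * n_pad
    input_ids = input_ids + [pad_id] * n_pad
    labels = labels + [pad_id] * n_pad
    return input_ids, labels, loss_mask


short_example = make_example([11, 12, 13, 14], max_seq_len=6)
long_example = make_example(list(range(10)), max_seq_len=4)
```

Multiplying the per-token loss by `loss_mask` before averaging would keep padded positions from affecting training.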

For more information on using our GPT-3 implementation, visit its model page in our documentation.
