

Falcon

The Falcon series consists of causal decoder-only transformer models with 7B, 40B, and 180B parameters, developed by the Technology Innovation Institute (TII). The models follow an optimized GPT-style architecture with key changes for efficient scaling and throughput:

- Parallel attention and MLP layers within transformer blocks.
- Rotary positional embeddings (RoPE) and multigroup attention (a generalization of multiquery attention) for faster inference and better tensor parallelism.
- GELU activations, no dropout, and z-loss regularization for stable training.
- Context length of 2,048 tokens and a 65K vocabulary.
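
Two items in the list above, multigroup attention and z-loss, are easier to follow with a small example. The following is a minimal pure-Python sketch, not this repository's implementation. The function names `kv_group_for_head` and `z_loss` are hypothetical. The contiguous head-to-group layout and the default coefficient of `1e-4` are assumptions made for illustration.

```python
import math


def kv_group_for_head(head: int, n_heads: int, n_kv_groups: int) -> int:
    """Return the key/value group that a query head reads from.

    Assumes contiguous grouping (an illustrative choice):
    n_kv_groups == 1        -> multiquery attention (all heads share one K/V)
    n_kv_groups == n_heads  -> standard multi-head attention
    anything in between     -> multigroup attention
    """
    if n_heads % n_kv_groups != 0:
        raise ValueError("n_heads must be divisible by n_kv_groups")
    heads_per_group = n_heads // n_kv_groups
    return head // heads_per_group


def z_loss(logits: list[float], coeff: float = 1e-4) -> float:
    """Auxiliary penalty coeff * log(Z)**2, where Z = sum(exp(logits)).

    The penalty pushes the softmax normalizer Z toward 1 (log Z toward 0).
    The coefficient is an assumed default, not taken from this repository.
    """
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return coeff * log_z ** 2


# 8 query heads sharing 2 key/value groups.
mapping = [kv_group_for_head(h, n_heads=8, n_kv_groups=2) for h in range(8)]
print(mapping)
```

With fewer key/value groups than query heads, the key/value cache shrinks by the same ratio. Choosing a number of groups that matches the tensor-parallel degree would let each device hold its own key/value heads without duplication.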

For more information on using our Falcon implementation, visit its model page in our documentation.
