Mistral Language Models

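Mistral is a family of decoder-only transformer models designed for efficient, high-throughput inference while preserving strong general performance. Architecturally, Mistral builds on the standard transformer decoder with several key changes. It adopts grouped-query attention (GQA), in which several query heads share each key/value head, to shrink the key/value cache and speed up inference. It uses sliding window attention (SWA, introduced with Mistral 7B), which limits each token to a fixed window of recent positions, to keep attention cost manageable on long sequences. It encodes positions with rotary positional embeddings (RoPE) rather than absolute positional encodings, and it uses SwiGLU activation functions in its feed-forward layers. These models are well suited to instruction following, reasoning, summarization, and coding tasks.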
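The attention and activation mechanisms named above can be sketched in plain Python. This is a minimal illustration, not our Mistral implementation. The function names (`sliding_window_causal_mask`, `kv_head_for_query_head`, `silu`, `swiglu_gate`) are hypothetical. The mask convention, in which a token sees itself plus the previous `window - 1` positions, is an assumption. The SwiGLU helper shows only the element-wise gating and omits the surrounding linear projections. The head counts in the usage (32 query heads sharing 8 key/value heads) match the published Mistral 7B configuration but are used here only for illustration.

```python
import math


def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal: j <= i. Sliding window (assumed convention): only the `window`
    most recent positions, including i itself, are visible, i.e. i - j < window.
    """
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: each block of consecutive query heads shares one KV head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be divisible by n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def silu(x: float) -> float:
    """SiLU (Swish) activation, x * sigmoid(x), written to avoid exp overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def swiglu_gate(gate: list[float], up: list[float]) -> list[float]:
    """Element-wise SwiGLU gating: silu(gate) * up.

    In a full feed-forward block, `gate` and `up` would be two linear
    projections of the same input and the result would be projected back
    down; those matrix multiplications are omitted here.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


if __name__ == "__main__":
    mask = sliding_window_causal_mask(seq_len=6, window=3)
    # The last token (position 5) sees only positions 3, 4 and 5.
    print([j for j, ok in enumerate(mask[5]) if ok])
    # With 32 query heads and 8 KV heads, query heads 0-3 share KV head 0.
    print([kv_head_for_query_head(h, 32, 8) for h in range(8)])
    print(swiglu_gate([0.0, 1.0], [5.0, 2.0]))
```

Because each token attends to at most `window` keys, per-token attention cost and the key/value cache stay bounded by the window size rather than growing with the full sequence length; information from earlier positions can still propagate indirectly through stacked layers.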

For more information on using our Mistral implementation, visit its model page in our documentation.
