# Quick Start Recipe for Llama 3.1 on vLLM

## Introduction

This quick start recipe provides step-by-step instructions for running the Llama 3.1 Instruct model using vLLM. It is intended for developers and practitioners seeking high-throughput or low-latency inference on the targeted accelerator stack.
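As a minimal sketch of what such a deployment looks like, the commands below install vLLM and launch its OpenAI-compatible server with a Llama 3.1 Instruct checkpoint. The specific model ID, port, and tuning flags are illustrative assumptions, not values prescribed by this recipe; access to the gated Llama weights on Hugging Face is also assumed.

```shell
# Install vLLM (assumes a supported Python environment and accelerator drivers).
pip install vllm

# Launch the OpenAI-compatible API server with a Llama 3.1 Instruct model.
# The model ID and flags below are example values; adjust for your hardware.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --port 8000 \
    --max-model-len 8192
```

Once the server is up, any OpenAI-compatible client can send chat completion requests to `http://localhost:8000/v1`.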

## TPU Deployment