# Quick Start Recipe for Llama 3.1 on vLLM

## Introduction

This quick start recipe provides step-by-step instructions for running the Llama 3.1 Instruct model using vLLM. It is intended for developers and practitioners seeking high-throughput or low-latency inference on the targeted accelerator stack.
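As a minimal sketch of what such a deployment looks like, the commands below install vLLM and launch its OpenAI-compatible server with a Llama 3.1 Instruct checkpoint. The specific model ID, port, and tuning flags are illustrative assumptions, not values prescribed by this recipe; access to the gated Llama weights on Hugging Face is also assumed.

```shell
# Install vLLM (assumes a supported Python environment and accelerator drivers).
pip install vllm

# Launch the OpenAI-compatible API server with a Llama 3.1 Instruct model.
# The model ID and flags below are example values; adjust for your hardware.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --port 8000 \
    --max-model-len 8192
```

Once the server is up, any OpenAI-compatible client can send chat completion requests to `http://localhost:8000/v1`.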

## TPU Deployment