MAX inference server

MAX is a high-performance inference server that provides an OpenAI-compatible endpoint for large language models (LLMs) locally or in the cloud.

To start your own endpoint with just a few commands, check out our quickstart guide.