Commit a20bb45

Update README.md
1 parent c64e9ef commit a20bb45

1 file changed

+5
-1
lines changed

README.md

@@ -6,7 +6,11 @@ Still WIP and in very early stage. A tutorial on LLM serving using MLX for syste
 is solely (almost!) based on MLX array/matrix APIs without any high-level neural network APIs, so that we
 can build the model serving infrastructure from scratch and dig into the optimizations.
 
-The goal is to learn the techniques behind efficiently serving an LLM model (i.e., Qwen2 models).
+The goal is to learn the techniques behind efficiently serving a large language model (i.e., Qwen2 models).
+
+Why MLX: nowadays it's easier to get a macOS-based local development environment than setting up an NVIDIA GPU.
+
+Why Qwen2: this was the first LLM I've interacted with -- it's the go-to example in the vllm documentation. I spent some time looking at the vllm source code and built some knowledge around it.
 
 ## Book
 