README.md
+5 −1
@@ -6,7 +6,11 @@ Still WIP and in very early stage. A tutorial on LLM serving using MLX for syste
 is solely (almost!) based on MLX array/matrix APIs without any high-level neural network APIs, so that we
 can build the model serving infrastructure from scratch and dig into the optimizations.
 
-The goal is to learn the techniques behind efficiently serving an LLM model (i.e., Qwen2 models).
+The goal is to learn the techniques behind efficiently serving a large language model (i.e., Qwen2 models).
+
+Why MLX: nowadays it's easier to get a macOS-based local development environment than setting up an NVIDIA GPU.
+
+Why Qwen2: this was the first LLM I've interacted with -- it's the go-to example in the vllm documentation. I spent some time looking at the vllm source code and built some knowledge around it.