
Commit 2f2196d

update roadmap
Signed-off-by: Alex Chi Z <[email protected]>
1 parent 235f8be commit 2f2196d

1 file changed: +11 −10 lines changed


README.md

@@ -30,17 +30,18 @@ You may join skyzh's Discord server and study with the tiny-llm community.
 | 1.6 | Load the Model || 🚧 | 🚧 |
 | 1.7 | Generate Responses (aka Decoding) ||| 🚧 |
 | 2.1 | KV Cache || 🚧 | 🚧 |
-| 2.2 | Quantized Matmul and Linear (CPU) | 🚧 | 🚧 | 🚧 |
-| 2.3 | Quantized Matmul and Linear (Metal) | 🚧 | 🚧 | 🚧 |
-| 2.4 | Attention and Softmax Kernels | 🚧 | 🚧 | 🚧 |
-| 2.5 | Flash Attention | 🚧 | 🚧 | 🚧 |
-| 2.6 | Paged Attention - Part 1 | 🚧 | 🚧 | 🚧 |
-| 2.7 | Paged Attention - Part 2 | 🚧 | 🚧 | 🚧 |
-| 3.1 | Streaming API Server | 🚧 | 🚧 | 🚧 |
-| 3.2 | Continuous Batching | 🚧 | 🚧 | 🚧 |
-| 3.3 | Speculative Decoding | 🚧 | 🚧 | 🚧 |
-| 3.4 | Prefill-Decode Separation | 🚧 | 🚧 | 🚧 |
+| 2.2 | Quantized Matmul and Linear - Part 1 | | 🚧 | 🚧 |
+| 2.3 | Quantized Matmul and Linear - Part 2 | 🚧 | 🚧 | 🚧 |
+| 2.4 | Flash Attention and Other Kernels | 🚧 | 🚧 | 🚧 |
+| 2.5 | Continuous Batching | 🚧 | 🚧 | 🚧 |
+| 2.6 | Speculative Decoding | 🚧 | 🚧 | 🚧 |
+| 2.7 | Prompt/Prefix Cache | 🚧 | 🚧 | 🚧 |
+| 3.1 | Paged Attention - Part 1 | 🚧 | 🚧 | 🚧 |
+| 3.2 | Paged Attention - Part 2 | 🚧 | 🚧 | 🚧 |
+| 3.3 | Prefill-Decode Separation | 🚧 | 🚧 | 🚧 |
+| 3.4 | Parallelism | 🚧 | 🚧 | 🚧 |
 | 3.5 | AI Agent | 🚧 | 🚧 | 🚧 |
+| 3.6 | Streaming API Server | 🚧 | 🚧 | 🚧 |
 
 <!--
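The reorganized roadmap keeps KV Cache (chapter 2.1) as the first serving-oriented topic. As an illustration only — a minimal, framework-free sketch with hypothetical names, not tiny-llm's actual implementation — the core idea is to store each decoding step's key/value projections so earlier tokens are never re-projected:

```python
# Minimal KV cache sketch: append each decoding step's key/value
# vectors and return the full history for attention.
# Hypothetical class and shapes; not tiny-llm's actual API.

class KVCache:
    def __init__(self):
        self.keys = []    # one key vector per generated token
        self.values = []  # one value vector per generated token

    def update(self, key, value):
        """Store this step's key/value and return the full history."""
        self.keys.append(key)
        self.values.append(value)
        return self.keys, self.values


cache = KVCache()
for step in range(3):
    # stand-in "projections" for token `step`
    k, v = [float(step)], [float(step) * 2]
    keys, values = cache.update(k, v)

# after 3 steps the cache holds all past keys/values
assert len(keys) == 3 and values[2] == [4.0]
```

Without the cache, each decoding step would recompute key/value projections for every prior token, making step cost grow with sequence length instead of staying roughly constant.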