Commit 966d83e

update roadmap
Signed-off-by: Alex Chi Z <[email protected]>
1 parent bfdb834 commit 966d83e

1 file changed: +9 -7 lines changed

README.md

@@ -28,17 +28,19 @@ You may join skyzh's Discord server and study with the tiny-llm community.
 | 1.4 | RMSNorm and MLP || 🚧 | 🚧 |
 | 1.5 | Transformer Block || 🚧 | 🚧 |
 | 1.6 | Load the Model || 🚧 | 🚧 |
-| 1.7 | Generate Responses ||| 🚧 |
+| 1.7 | Generate Responses (aka Decoding) ||| 🚧 |
 | 2.1 | KV Cache || 🚧 | 🚧 |
 | 2.2 | Quantized Matmul and Linear (CPU) | 🚧 | 🚧 | 🚧 |
 | 2.3 | Quantized Matmul and Linear (Metal) | 🚧 | 🚧 | 🚧 |
-| 2.4 | Attention Kernel | 🚧 | 🚧 | 🚧 |
-| 2.5 | Softmax Kernel | 🚧 | 🚧 | 🚧 |
-| 2.6 | Prompt Cache / Multiple Requests | 🚧 | 🚧 | 🚧 |
-| 2.7 | Benchmarking | 🚧 | 🚧 | 🚧 |
-| 3.1 | API Server | 🚧 | 🚧 | 🚧 |
+| 2.4 | Attention and Softmax Kernels | 🚧 | 🚧 | 🚧 |
+| 2.5 | Flash Attention | 🚧 | 🚧 | 🚧 |
+| 2.6 | Paged Attention - Part 1 | 🚧 | 🚧 | 🚧 |
+| 2.7 | Paged Attention - Part 2 | 🚧 | 🚧 | 🚧 |
+| 3.1 | Streaming API Server | 🚧 | 🚧 | 🚧 |
 | 3.2 | Continuous Batching | 🚧 | 🚧 | 🚧 |
-
+| 3.3 | Speculative Decoding | 🚧 | 🚧 | 🚧 |
+| 3.4 | Prefill-Decode Separation | 🚧 | 🚧 | 🚧 |
+| 3.5 | AI Agent | 🚧 | 🚧 | 🚧 |
 
 <!--
 