1 file changed: +9 -7 lines changed
@@ -28,17 +28,19 @@ You may join skyzh's Discord server and study with the tiny-llm community.
 | 1.4 | RMSNorm and MLP | ✅ | 🚧 | 🚧 |
 | 1.5 | Transformer Block | ✅ | 🚧 | 🚧 |
 | 1.6 | Load the Model | ✅ | 🚧 | 🚧 |
-| 1.7 | Generate Responses | ✅ | ✅ | 🚧 |
+| 1.7 | Generate Responses (aka Decoding) | ✅ | ✅ | 🚧 |
 | 2.1 | KV Cache | ✅ | 🚧 | 🚧 |
 | 2.2 | Quantized Matmul and Linear (CPU) | 🚧 | 🚧 | 🚧 |
 | 2.3 | Quantized Matmul and Linear (Metal) | 🚧 | 🚧 | 🚧 |
-| 2.4 | Attention Kernel | 🚧 | 🚧 | 🚧 |
-| 2.5 | Softmax Kernel | 🚧 | 🚧 | 🚧 |
-| 2.6 | Prompt Cache / Multiple Requests | 🚧 | 🚧 | 🚧 |
-| 2.7 | Benchmarking | 🚧 | 🚧 | 🚧 |
-| 3.1 | API Server | 🚧 | 🚧 | 🚧 |
+| 2.4 | Attention and Softmax Kernels | 🚧 | 🚧 | 🚧 |
+| 2.5 | Flash Attention | 🚧 | 🚧 | 🚧 |
+| 2.6 | Paged Attention - Part 1 | 🚧 | 🚧 | 🚧 |
+| 2.7 | Paged Attention - Part 2 | 🚧 | 🚧 | 🚧 |
+| 3.1 | Streaming API Server | 🚧 | 🚧 | 🚧 |
 | 3.2 | Continuous Batching | 🚧 | 🚧 | 🚧 |
-
+| 3.3 | Speculative Decoding | 🚧 | 🚧 | 🚧 |
+| 3.4 | Prefill-Decode Separation | 🚧 | 🚧 | 🚧 |
+| 3.5 | AI Agent | 🚧 | 🚧 | 🚧 |
 
 <!--
 