@@ -30,17 +30,18 @@ You may join skyzh's Discord server and study with the tiny-llm community.
| 1.6 | Load the Model | ✅ | 🚧 | 🚧 |
| 1.7 | Generate Responses (aka Decoding) | ✅ | ✅ | 🚧 |
| 2.1 | KV Cache | ✅ | 🚧 | 🚧 |
- | 2.2 | Quantized Matmul and Linear (CPU) | 🚧 | 🚧 | 🚧 |
- | 2.3 | Quantized Matmul and Linear (Metal) | 🚧 | 🚧 | 🚧 |
- | 2.4 | Attention and Softmax Kernels | 🚧 | 🚧 | 🚧 |
- | 2.5 | Flash Attention | 🚧 | 🚧 | 🚧 |
- | 2.6 | Paged Attention - Part 1 | 🚧 | 🚧 | 🚧 |
- | 2.7 | Paged Attention - Part 2 | 🚧 | 🚧 | 🚧 |
- | 3.1 | Streaming API Server | 🚧 | 🚧 | 🚧 |
- | 3.2 | Continuous Batching | 🚧 | 🚧 | 🚧 |
- | 3.3 | Speculative Decoding | 🚧 | 🚧 | 🚧 |
- | 3.4 | Prefill-Decode Separation | 🚧 | 🚧 | 🚧 |
+ | 2.2 | Quantized Matmul and Linear - Part 1 | ✅ | 🚧 | 🚧 |
+ | 2.3 | Quantized Matmul and Linear - Part 2 | 🚧 | 🚧 | 🚧 |
+ | 2.4 | Flash Attention and Other Kernels | 🚧 | 🚧 | 🚧 |
+ | 2.5 | Continuous Batching | 🚧 | 🚧 | 🚧 |
+ | 2.6 | Speculative Decoding | 🚧 | 🚧 | 🚧 |
+ | 2.7 | Prompt/Prefix Cache | 🚧 | 🚧 | 🚧 |
+ | 3.1 | Paged Attention - Part 1 | 🚧 | 🚧 | 🚧 |
+ | 3.2 | Paged Attention - Part 2 | 🚧 | 🚧 | 🚧 |
+ | 3.3 | Prefill-Decode Separation | 🚧 | 🚧 | 🚧 |
+ | 3.4 | Parallelism | 🚧 | 🚧 | 🚧 |
| 3.5 | AI Agent | 🚧 | 🚧 | 🚧 |
+ | 3.6 | Streaming API Server | 🚧 | 🚧 | 🚧 |
<!--