Commit 3600f06
feat(speculative): n-gram drafter stacked on MTP for Qwen3.6-35B-A3B
- New per-request n-gram speculative drafter with <think> and <tool_call>
state machines, adaptive draft length K based on n-gram match confidence,
hybrid verify (MTP draft appended after the n-gram tail), per-request
self-tuning, and global auto-disable when MTP is strong and n-gram is weak.
- Auto-enabled by the qwen3.6-35b preset and the new qwen3.6-35b-8bit
preset. +18% throughput on agentic reasoning + tool-use workloads vs.
MTP-only.
- New qwen3.6-35b-8bit alias routing to
samuelfaj/Qwen3.6-35B-A3B-8bit-MTPLX-Optimized-Speed with full preset
parity (MTP, n-gram, port 8010, tool/reasoning parsers, temps).
- Structured CoT grammar plumbing (structured_cot.gbnf, lcb_plan.gbnf).
- Scheduler, server, TUI, and metrics middleware updates to surface and
control n-gram drafting per request.
- Test coverage for drafter, structured CoT, and CLI preset parity.
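The drafter's core lookup and adaptive K can be sketched as a prompt-lookup
n-gram matcher. This is a minimal sketch, not the code in
vllm_mlx/speculative: `ngram_draft`, its parameters, and the
K = min(max_k, 2 * n) confidence rule are hypothetical stand-ins.

```python
from typing import List, Sequence


def ngram_draft(tokens: Sequence[int], max_n: int = 4, min_n: int = 2,
                max_k: int = 8) -> List[int]:
    """Prompt-lookup style n-gram drafting with an adaptive draft length K.

    Finds the longest suffix n-gram (max_n down to min_n) that also occurs
    earlier in the context and proposes the tokens that followed its most
    recent earlier occurrence. A longer match is treated as higher
    confidence, so K grows with n (hypothetical rule: K = min(max_k, 2 * n)).
    """
    length = len(tokens)
    for n in range(max_n, min_n - 1, -1):
        if length <= n:
            continue  # need at least one token before the suffix
        suffix = list(tokens[length - n:])
        # Scan right to left so the most recent earlier match wins.
        for start in range(length - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                k = min(max_k, 2 * n)
                return list(tokens[start + n:start + n + k])
    return []
```

Preferring the longest, most recent match favours exact repeats of recently
generated text, such as re-quoted tool arguments; an empty result means the
step falls back to the MTP draft alone.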
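One plausible shape for the <think>/<tool_call> state machines is a
streaming tag tracker that reports which region generation is in, for
example so drafting statistics can be kept per region. This is an
assumption: `Region` and `RegionTracker` are hypothetical names, and the
sketch matches tags on decoded text rather than token IDs.

```python
from enum import Enum


class Region(Enum):
    CONTENT = "content"
    THINK = "think"
    TOOL_CALL = "tool_call"


class RegionTracker:
    """Hypothetical tracker for <think> / <tool_call> regions in streamed text.

    Text is fed incrementally; a short tail is buffered so a tag split
    across two chunks is still detected.
    """

    _OPEN = {"<think>": Region.THINK, "<tool_call>": Region.TOOL_CALL}
    _CLOSE = {Region.THINK: "</think>", Region.TOOL_CALL: "</tool_call>"}
    _TAIL = len("</tool_call>")  # longest tag, so any partial tag fits

    def __init__(self) -> None:
        self.region = Region.CONTENT
        self._buf = ""

    def feed(self, text: str) -> Region:
        self._buf += text
        while True:
            if self.region is Region.CONTENT:
                # Enter whichever opening tag appears first.
                hits = [(self._buf.find(tag), tag) for tag in self._OPEN]
                hits = [(i, tag) for i, tag in hits if i != -1]
                if not hits:
                    break
                i, tag = min(hits)
                self.region = self._OPEN[tag]
                self._buf = self._buf[i + len(tag):]
            else:
                # Inside a block, only its own closing tag matters.
                close = self._CLOSE[self.region]
                i = self._buf.find(close)
                if i == -1:
                    break
                self.region = Region.CONTENT
                self._buf = self._buf[i + len(close):]
        self._buf = self._buf[-self._TAIL:]
        return self.region
```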
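Hybrid verify, read as "append the MTP draft after the n-gram tail", lets a
single target forward pass check both drafters' tokens. The greedy-acceptance
sketch below assumes that reading; `build_hybrid_draft` and `greedy_verify`
are hypothetical, sampling-based (rejection) verification is not shown, and
how the MTP tokens are conditioned on the n-gram tail is not covered.

```python
from typing import List, Sequence, Tuple


def build_hybrid_draft(ngram_tail: Sequence[int],
                       mtp_draft: Sequence[int]) -> List[int]:
    """Hypothetical hybrid draft: n-gram tokens first, MTP draft appended."""
    return list(ngram_tail) + list(mtp_draft)


def greedy_verify(draft: Sequence[int],
                  target_argmax: Sequence[int]) -> Tuple[List[int], int]:
    """Greedy speculative verification.

    target_argmax[i] is the target model's greedy token at draft position i
    from one forward pass over prefix + draft, plus one extra entry for the
    position after the full draft. Returns (tokens to emit, number of draft
    tokens accepted).
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have len(draft) + 1 entries")
    emitted: List[int] = []
    for d, t in zip(draft, target_argmax):
        if d != t:
            break  # first mismatch ends acceptance
        emitted.append(d)
    n_accepted = len(emitted)
    # The target's own token at the mismatch (or after a fully accepted
    # draft) is always emitted, so each step makes progress.
    emitted.append(target_argmax[n_accepted])
    return emitted, n_accepted
```

Because acceptance stops at the first mismatch, MTP tokens only count when
the whole n-gram tail before them is accepted.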
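Per-request self-tuning and the global auto-disable can both be driven by
running acceptance rates. The sketch assumes an exponential moving average
of accepted/proposed tokens; `AcceptanceEMA`, `tune_k`,
`ngram_globally_disabled`, and every threshold are hypothetical, not values
from this commit.

```python
from dataclasses import dataclass


@dataclass
class AcceptanceEMA:
    """Exponential moving average of the fraction of drafted tokens accepted."""

    alpha: float = 0.2  # weight of the newest observation
    value: float = 0.5  # neutral prior before any verify steps

    def update(self, accepted: int, proposed: int) -> float:
        # Steps that proposed nothing carry no signal, so they are skipped.
        if proposed > 0:
            rate = accepted / proposed
            self.value = (1.0 - self.alpha) * self.value + self.alpha * rate
        return self.value


def tune_k(k: int, ema: float, k_min: int = 1, k_max: int = 8,
           raise_at: float = 0.6, lower_at: float = 0.3) -> int:
    """Hypothetical per-request rule: grow K while drafts are mostly
    accepted, shrink it when they are mostly rejected."""
    if ema >= raise_at:
        return min(k + 1, k_max)
    if ema <= lower_at:
        return max(k - 1, k_min)
    return k


def ngram_globally_disabled(mtp_ema: float, ngram_ema: float, steps: int,
                            mtp_strong: float = 0.7, ngram_weak: float = 0.3,
                            warmup_steps: int = 64) -> bool:
    """Hypothetical global switch: turn n-gram drafting off once MTP
    acceptance is high and n-gram acceptance is low, after a warmup."""
    if steps < warmup_steps:
        return False
    return mtp_ema >= mtp_strong and ngram_ema <= ngram_weak
```

The warmup guard keeps a few early misses from disabling n-gram drafting
before either average has settled.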
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 70ac799
18 files changed
Lines changed: 2579 additions & 39 deletions
File tree
- tests
- vllm_mlx
  - api
  - grammars
  - engine
  - middleware
  - speculative