Commit a0b8102
[Bugfix] Pad Marlin FP8 MoE weight dims to tile alignment under TP > 1
The Marlin kernel requires size_n % 64 == 0 (tile_n_size) and
size_k % 16 == 0 (tile_k_size). When tensor-parallel sharding splits
MoE weights across GPUs, per-rank dimensions can violate these constraints
and crash at model load time on any GPU that falls back to the
Marlin FP8 MoE path (compute capability < 9.0, e.g. L40S, A100, A10G).
Example — Nemotron Nano 3 at TP=4 (intermediate_size=1856):
w13 gate+up: size_n = 464 per shard → 464 % 64 = 16 ✗
w2 down: size_k = 232 per shard → 232 % 16 = 8 ✗
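The two failing checks above can be reproduced with plain modulo arithmetic. This is a minimal sketch, not code from the commit: the function name marlin_misalignment is hypothetical, and the hidden size of 1024 is assumed from the K=1024 test shape quoted later in this message.

```python
# Tile sizes as stated in the commit message.
MARLIN_TILE_N = 64  # tile_n_size
MARLIN_TILE_K = 16  # tile_k_size


def marlin_misalignment(size_n: int, size_k: int) -> tuple[int, int]:
    """Return (size_n % tile_n, size_k % tile_k); (0, 0) means aligned.

    Hypothetical helper for illustration only.
    """
    return size_n % MARLIN_TILE_N, size_k % MARLIN_TILE_K


# w13 (gate+up): size_n = 464 per shard, size_k = hidden size (assumed 1024).
w13_check = marlin_misalignment(size_n=464, size_k=1024)

# w2 (down): size_n = hidden size (assumed 1024), size_k = 232 per shard.
w2_check = marlin_misalignment(size_n=1024, size_k=232)

print("w13 remainders (n, k):", w13_check)
print("w2 remainders (n, k):", w2_check)
```

A non-zero remainder in either position marks the dimension that would need padding.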
This error is never triggered on Hopper+ (CC >= 9.0) because vLLM selects
native FP8 MoE kernels (CUTLASS/Triton) on those GPUs and never enters the
Marlin path.
Fix:
- Define MARLIN_TILE_N=64, MARLIN_TILE_K=16 and _pad_to_marlin_tile()
  helper in marlin_utils_fp8.py
- repack_weight(): pad size_n/size_k to tile boundaries before
  calling gptq_marlin_repack
- permute_scales(): pad scales to match padded size_n
- fused_marlin_moe.py _fused_marlin_moe(): import tile constants,
  compute padded sizes, use them for GEMM calls, trim w13 padding
  before activation, pad intermediate output before w2 GEMM
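The padded sizes in the steps above come down to rounding each dimension up to the next tile multiple. The sketch below shows that arithmetic only. The real _pad_to_marlin_tile() signature is not shown in this commit message, so round_up_to_tile is a hypothetical stand-in and not the helper's actual interface.

```python
# Tile sizes as stated in the commit message.
MARLIN_TILE_N = 64
MARLIN_TILE_K = 16


def round_up_to_tile(size: int, tile: int) -> int:
    """Round size up to the nearest multiple of tile (hypothetical helper)."""
    return (size + tile - 1) // tile * tile


# Per-shard sizes from the Nemotron Nano 3 example.
padded_n = round_up_to_tile(464, MARLIN_TILE_N)
padded_k = round_up_to_tile(232, MARLIN_TILE_K)

# Padding amounts that a repack step would append as zeros.
pad_n = padded_n - 464
pad_k = padded_k - 232

# An already-aligned dimension is returned unchanged, so its padding is zero.
aligned_n = round_up_to_tile(1024, MARLIN_TILE_N)

print(padded_n, pad_n, padded_k, pad_k, aligned_n)
```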
Padding with zeros is mathematically a no-op: zero weights and zero
inputs contribute nothing to GEMM outputs. For already-aligned
dimensions all padding amounts are zero and no operations are performed.
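The no-op claim can be checked on a toy GEMM. This is an illustrative sketch in pure Python with small integer matrices, not the FP8 Marlin kernel. All helper names are hypothetical, and the tile size of 4 is chosen only to keep the example small.

```python
def matmul(a, b):
    """Naive matrix product of a (m x k) and b (k x n), as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def pad_cols(m, total):
    """Append zero columns until each row has `total` entries."""
    return [row + [0] * (total - len(row)) for row in m]


def pad_rows(m, total):
    """Append zero rows until the matrix has `total` rows."""
    width = len(m[0])
    return m + [[0] * width for _ in range(total - len(m))]


x = [[1, 2, 3], [4, 5, 6]]      # activations: 2 x 3 (k = 3)
w = [[1, 0], [2, 1], [0, 3]]    # weights:     3 x 2 (k = 3, n = 2)
reference = matmul(x, w)

# Pad k from 3 to 4 on both operands, and n from 2 to 4 on the weights.
x_pad = pad_cols(x, 4)
w_pad = pad_cols(pad_rows(w, 4), 4)
padded = matmul(x_pad, w_pad)   # 2 x 4; the extra columns are all zero

# Trimming the padded output columns recovers the unpadded result.
trimmed = [row[:2] for row in padded]
print(trimmed == reference)
```

Zero rows along k add zero terms to every dot product, and zero columns along n only produce extra output columns, which are trimmed away.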
Tested on B200 with VLLM_TEST_FORCE_FP8_MARLIN=1 using Nemotron Nano 3
weight shapes (E=2, K=1024, N=232, W13_N=464).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent a9e532a commit a0b8102
2 files changed: +66 -8 lines
- vllm/model_executor/layers
  - fused_moe
  - quantization/utils
Lines changed: 42 additions & 3 deletions