Commit 2355ea0
[None][chore] ltx2: address CR review, simplify PE cache plumbing, fix kernel UB
CR-flagged fixes:
- fusedDiTQKNormRopeKernel.cu: add a trailing __syncthreads() in reduce_partial,
preventing a race when warp_sums[] is reused for the Q->K reductions (CR PR NVIDIA#13985
thread 1).
- fusedDiTQKNormRopeKernel.cu + fusedDiTSplitQKNormRopeKernel.cu: use
__activemask() instead of 0xffffffff as the mask for the rotate-half __shfl_xor_sync,
which avoids UB for small num_heads*HEAD_DIM, where the surrounding chunk
loop has a partial-warp early exit (CR thread 2).
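A toy Python model of the two kernel hazards above may help. It is a sketch, not the kernel code: Python threads stand in for warps, `threading.Barrier` stands in for `__syncthreads()`, and `NUM_WARPS`, `block_reduce_q_then_k`, `mask_of`, and `shfl_xor_sync` are hypothetical names. The first part shows why the trailing barrier matters when one `warp_sums[]` buffer serves both the Q and the K reduction; the second models a shuffle whose mask names lanes that left the chunk loop early, versus a mask built from the lanes that actually arrived (what `__activemask()` reports).

```python
import threading

NUM_WARPS = 4  # hypothetical: warps per thread block


def block_reduce_q_then_k(q_partials, k_partials):
    """Simulate two back-to-back reduce_partial calls (Q, then K) that reuse one
    shared warp_sums[] buffer. Each Python thread stands in for one warp, and
    threading.Barrier stands in for __syncthreads()."""
    warp_sums = [0.0] * NUM_WARPS
    barrier = threading.Barrier(NUM_WARPS)
    totals = [[None, None] for _ in range(NUM_WARPS)]  # per-warp view of (Q, K) sums

    def reduce_partial(warp_id, partial):
        warp_sums[warp_id] = partial  # publish this warp's partial sum
        barrier.wait()                # leading barrier: every partial is written
        total = sum(warp_sums)        # every warp reads the whole buffer
        barrier.wait()                # trailing barrier (the fix): no warp may start
                                      # overwriting warp_sums[] for the K reduction
                                      # until all warps have finished reading Q sums
        return total

    def warp_main(warp_id):
        totals[warp_id][0] = reduce_partial(warp_id, q_partials[warp_id])
        totals[warp_id][1] = reduce_partial(warp_id, k_partials[warp_id])

    threads = [threading.Thread(target=warp_main, args=(w,)) for w in range(NUM_WARPS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


FULL_MASK = 0xFFFFFFFF


def mask_of(lanes):
    """Bitmask with one bit per lane, like the value __activemask() returns."""
    m = 0
    for lane in lanes:
        m |= 1 << lane
    return m


def shfl_xor_sync(mask, values, lane_mask):
    """Toy model of __shfl_xor_sync. `values` maps lane -> value for the lanes
    that actually reached this call; lanes that left the chunk loop are absent."""
    arrived = mask_of(values)
    if mask & ~arrived:
        raise RuntimeError("mask names lanes that never reach this shuffle: undefined behavior")
    if arrived & ~mask:
        raise RuntimeError("a calling lane is missing from the mask")
    out = {}
    for lane in values:
        partner = lane ^ lane_mask
        if partner not in values:
            raise RuntimeError(f"lane {lane} reads inactive lane {partner}: undefined value")
        out[lane] = values[partner]
    return out
```

The toy also checks that each lane's XOR partner is present, since a shuffle that reads an inactive lane yields an undefined value whatever the mask says.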
PE cache plumbing simplification (data flow):
- Drop the 4 *_pe_2d duplicate fields in TextCache; the single *_pe field now
holds the form the consumer expects (2D [T_local, H*D] contiguous when
fuse_qk_norm_rope=True, 4D [B, T_local, H, D] otherwise).
- Revert ltx2_core/transformer_args.py to upstream (drop the two _2d fields
+ two _2d kwargs that C8 had added to the upstream-mirrored file).
- LTX2Attention now explicitly sets fuse_qk_norm_rope=True (the base class
default for qk_norm_mode="full" was False, but the LTX-2 forward path
ignored the flag); forward() now actually gates on it.
- _shard_transformer_args drops the per-step _shard_pe; PE is now sharded
once, in prepare_text_cache, via _make_pe_local (renamed from
_make_pe_2d_local; it now produces 2D or 4D based on the fuse flag).
- BasicAVTransformerBlock's 6 'pe=*._2d or *._4d' fallback expressions
collapse to a single 'pe=*._pe' reference.
- _forward_unfused gains a pe.ndim assert so the naive eager path fails
loudly if anyone passes the fused 2D form.
- pipeline_ltx2 CUDA-graph clone/copy calls reduced from 10 to 6 per TextCache.
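The resulting PE data flow can be sketched in plain Python, with nested lists standing in for tensors. `make_pe_local`, `forward_fused`, `forward_unfused`, and `attention_forward` are hypothetical stand-ins for `_make_pe_local`, the two attention paths, and the flag-gated `forward()`; the sketch assumes PE is sharded along T and, for the 2D form, that PE is identical for every batch element.

```python
def ndim(x):
    """Number of nested-list levels, standing in for tensor.ndim."""
    n = 0
    while isinstance(x, list):
        n += 1
        x = x[0]
    return n


def make_pe_local(pe, rank, world_size, fuse_qk_norm_rope):
    """Shard full-sequence PE once (at text-cache preparation time) along T.

    pe: nested list shaped [B, T, H, D]. Returns 4D [B, T_local, H, D] for the
    unfused path, or 2D [T_local, H*D] for the fused kernel (assumption: PE is
    identical across B, so batch row 0 is broadcast over B)."""
    T = len(pe[0])
    assert T % world_size == 0, "sequence length must split evenly across ranks"
    t_local = T // world_size
    start = rank * t_local
    local = [seq[start:start + t_local] for seq in pe]  # [B, T_local, H, D]
    if not fuse_qk_norm_rope:
        return local
    # Flatten heads into one contiguous row per token: [T_local, H*D]
    return [[v for head in tok for v in head] for tok in local[0]]


def forward_unfused(pe):
    # Fail loudly if the fused 2D form leaks into the eager path.
    assert ndim(pe) == 4, f"unfused path expects 4D PE, got {ndim(pe)}D"
    return "unfused"


def forward_fused(pe):
    assert ndim(pe) == 2, f"fused kernel expects 2D PE, got {ndim(pe)}D"
    return "fused"


def attention_forward(pe, fuse_qk_norm_rope):
    # forward() gates on the flag instead of ignoring it.
    return forward_fused(pe) if fuse_qk_norm_rope else forward_unfused(pe)
```

Because the single `pe` field already holds the consumer's layout, each block can pass it straight through without the old `*_2d or *_4d` fallback.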
Test reorg:
- Move test_fused_dit_split_qk_norm_rope.py + test_fused_dit_split_norm.py
from parallel/ to parallel_hw_agnostic/. Extend the packed test file with
full-dim cells covering LTX-2 self-attn shapes (T=12288 H=32 D=128 +
T=504 H=32 D=64, including the broadcast-over-B path).
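For context on what kernel test cells are typically checked against, here is a hypothetical pure-Python reference for one token. It assumes `qk_norm_mode="full"` means a single RMSNorm over the whole H*D row (consistent with the block-wide `reduce_partial` reduction), followed by rotate-half RoPE applied per head, with cos/sin laid out like the 2D `[T_local, H*D]` PE. The function names are illustrative and are not taken from the test files.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over the full row x (length H*D)."""
    inv = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv * w for v, w in zip(x, weight)]


def rotate_half(x):
    """[x1, x2] -> [-x2, x1] on one head's D-vector."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def qk_norm_rope_ref(x, weight, cos, sin, num_heads, head_dim, eps=1e-6):
    """Full-row RMSNorm, then rotate-half RoPE per head.
    x, weight, cos, sin are all flat rows of length num_heads * head_dim."""
    y = rms_norm(x, weight, eps)
    rot = []
    for h in range(num_heads):
        rot += rotate_half(y[h * head_dim:(h + 1) * head_dim])
    return [a * c + b * s for a, b, c, s in zip(y, rot, cos, sin)]
```

Since RoPE is a per-pair rotation, the squared norm of each head survives it unchanged, which makes a cheap sanity check alongside elementwise tolerances.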
Verification:
- 159 unit tests pass (packed + split + norm-only across fp32/bf16 cos).
- 1-GPU 40-step LTX-2 e2e (gs=3.0): raw video sha256 bit-identical to the
pre-cleanup HEAD (99cc34517b19e3e12fb66ccc439b4c5f7b2575cf862e627fb504e1fdcc120755).
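A bit-identity check like the e2e one above can be done by streaming the raw video through SHA-256 and comparing hex digests. A minimal sketch using `hashlib`; the function names and the file path in the usage line are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a (possibly large) raw video dump through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_bit_identical(path, expected_hex):
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of_file(path) == expected_hex.lower()
```

Usage: `is_bit_identical("ltx2_out.raw", "99cc34517b19e3e12fb66ccc439b4c5f7b2575cf862e627fb504e1fdcc120755")`.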
Signed-off-by: Yiyun Lu <55233584+luyiyun1021@users.noreply.github.com>
1 parent 5feedce
11 files changed
Lines changed: 362 additions & 126 deletions
File tree
- cpp/tensorrt_llm
- kernels
- thop
- tensorrt_llm/_torch/visual_gen
- models/ltx2
- ltx2_core
- modules
- tests/unittest/_torch/thop/parallel_hw_agnostic
Lines changed: 1 addition & 16 deletions
Lines changed: 0 additions & 8 deletions
Lines changed: 15 additions & 14 deletions