Commit cb22460
erwei-xilinx and claude committed
Fix air-to-std crash for DMA ops with >4 offset/size dimensions
AIRDmaMemcpyNdToAIRRtConversion assumed DMA offsets and sizes have at most 4 elements (matching memref rank). However, the BD optimization pass and block layout lowering can produce DMA ops with 6+ dimensions in offsets/sizes (e.g., 6D block layout for matmul). This caused a SmallVector out-of-bounds access when converting to the 4D airrt format.

Apply the same drop_front logic (already used for strides) to offsets and sizes: when N > 4, take the last 4 elements. This matches the hardware's 4D BD dimension limit.

Fixes crashes at large problem sizes (>1k) for both reference matmul (run.py without --direct-codegen) and fused SwiGLU designs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
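The "take the last 4 elements" rule described in the commit message can be sketched outside MLIR as follows. This is a minimal illustration, not code from the AIR repository: `rightAlignLast4` is a hypothetical helper, and plain `int64_t` values stand in for the MLIR `Value`s that the real pass casts with `arith::IndexCastOp`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the AIR codebase): copy the last
// min(4, N) entries of `dims` into the tail of a 4-element array and
// leave the leading slots at `fill`.
std::array<int64_t, 4> rightAlignLast4(const std::vector<int64_t> &dims,
                                       int64_t fill) {
  std::array<int64_t, 4> out;
  out.fill(fill);
  // Number of leading entries to drop; zero when N <= 4.
  std::size_t drop = dims.size() > 4 ? dims.size() - 4 : 0;
  std::size_t kept = dims.size() - drop;
  std::size_t idx = 4 - kept; // first slot written, always in [0, 4]
  for (std::size_t i = drop; i < dims.size(); ++i) {
    assert(idx < out.size()); // cannot fire: at most 4 entries are kept
    out[idx++] = dims[i];
  }
  return out;
}
```

With a 6-element input such as `{0, 0, 0, 0, 32, 64}` the two leading entries are dropped and the remaining four fill the array; with a 2-element input the values land in the last two slots and the first two keep the fill value.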
1 parent 9cef756 commit cb22460

1 file changed

Lines changed: 14 additions & 6 deletions

mlir/lib/Conversion/AIRLoweringPass.cpp

@@ -567,14 +567,18 @@ class AIRDmaMemcpyNdToAIRRtConversion
     SmallVector<Value, 4> lengths(4, one);
     SmallVector<Value, 4> strides(4, zero);
 
-    int idx = 4 - src.getRank();
-    for (auto o : isFromTile ? op.getDstOffsets() : op.getSrcOffsets())
+    // Take last min(4, N) elements for offsets, sizes, and strides.
+    // When N > 4, drop leading elements to fit the 4D airrt format.
+    auto op_offsets = isFromTile ? op.getDstOffsets() : op.getSrcOffsets();
+    auto offsets_to_use = op_offsets;
+    if (offsets_to_use.size() > 4)
+      offsets_to_use = offsets_to_use.drop_front(offsets_to_use.size() - 4);
+    int idx = 4 - offsets_to_use.size();
+    for (auto o : offsets_to_use)
       offsets[idx++] = arith::IndexCastOp::create(rewriter, op->getLoc(),
                                                   IntegerType::get(ctx, 64), o);
     auto op_strides = isFromTile ? op.getDstStrides() : op.getSrcStrides();
     if (op_strides.size()) {
-      // Take last min(4, N) strides, drop leading strides if N > 4.
-      // The innermost stride (last element) is now preserved.
       auto strides_to_use = op_strides;
       if (strides_to_use.size() > 4)
         strides_to_use = strides_to_use.drop_front(strides_to_use.size() - 4);
@@ -583,8 +587,12 @@ class AIRDmaMemcpyNdToAIRRtConversion
         strides[idx++] = arith::IndexCastOp::create(
             rewriter, op->getLoc(), IntegerType::get(ctx, 64), o);
     }
-    idx = 4 - src.getRank();
-    for (auto o : isFromTile ? op.getDstSizes() : op.getSrcSizes())
+    auto op_sizes = isFromTile ? op.getDstSizes() : op.getSrcSizes();
+    auto sizes_to_use = op_sizes;
+    if (sizes_to_use.size() > 4)
+      sizes_to_use = sizes_to_use.drop_front(sizes_to_use.size() - 4);
+    idx = 4 - sizes_to_use.size();
+    for (auto o : sizes_to_use)
       lengths[idx++] = arith::IndexCastOp::create(rewriter, op->getLoc(),
                                                   IntegerType::get(ctx, 64), o);

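To see why the removed lines could write past the 4-element vectors, the index arithmetic can be modeled on its own. The sketch below is an assumption-laden model rather than the pass itself: `SlotRange`, `oldSlotRange`, and `newSlotRange` are hypothetical names, and the rank and offset counts in the note that follows are illustrative values, not taken from a real failing design.

```cpp
#include <cstdint>

// Half-open range [begin, end) of slots written in a 4-element vector.
// Hypothetical type, used only for this illustration.
struct SlotRange {
  int64_t begin;
  int64_t end;
};

// Pre-fix behavior: the start index is derived from the memref rank,
// but the loop runs once per offset, so the two can disagree.
SlotRange oldSlotRange(int64_t memrefRank, int64_t numOffsets) {
  int64_t idx = 4 - memrefRank;
  return {idx, idx + numOffsets};
}

// Post-fix behavior: at most the last 4 offsets are used, and the
// start index is derived from that same count.
SlotRange newSlotRange(int64_t numOffsets) {
  int64_t used = numOffsets > 4 ? 4 : numOffsets;
  int64_t idx = 4 - used;
  return {idx, idx + used};
}
```

For a rank-2 memref carrying 6 offsets, `oldSlotRange(2, 6)` gives `[2, 8)`, which runs past slot 3, while `newSlotRange(6)` gives `[0, 4)`. When the offset count equals the rank and is at most 4, both versions write the same slots.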