Fix air-to-std crash for DMA ops with >4 offset/size dimensions

erwei-xilinx · claude · erwei-xilinx · commit cb224604dfc7 · 2026-04-08T21:38:13.000-07:00
AIRDmaMemcpyNdToAIRRtConversion assumed DMA offsets and sizes have at
most 4 elements (matching memref rank). However, the BD optimization
pass and block layout lowering can produce DMA ops with 6+ dimensions
in offsets/sizes (e.g., 6D block layout for matmul). This caused a
SmallVector out-of-bounds access when converting to the 4D airrt
format.

Apply the same drop_front logic (already used for strides) to offsets
and sizes: when N > 4, take the last 4 elements. This matches the
hardware's 4D BD dimension limit.
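
The truncation described above can be sketched outside MLIR as follows.
This is a minimal illustration, not code from the patch: the helper name
fitToFour and the use of std::vector and std::array are assumptions made
for the example, whereas the real pass operates on operand ranges and
SmallVector<Value, 4>. The pad argument stands in for the defaults seen
in the diff (one for lengths, zero for strides).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the patch: right-align the last
// min(4, N) entries of `vals` into a fixed 4-slot array, filling any
// unused leading slots with `pad`.
std::array<int64_t, 4> fitToFour(const std::vector<int64_t> &vals,
                                 int64_t pad) {
  constexpr std::size_t kDims = 4;
  std::array<int64_t, kDims> out;
  out.fill(pad);
  // Skip leading entries when N > 4 (the drop_front step in the patch).
  std::size_t skip = vals.size() > kDims ? vals.size() - kDims : 0;
  std::size_t used = vals.size() - skip;
  // First slot to write; `used` is at most 4, so this cannot underflow.
  std::size_t idx = kDims - used;
  for (std::size_t i = skip; i < vals.size(); ++i)
    out[idx++] = vals[i];
  assert(idx == kDims); // writes always end exactly at the last slot
  return out;
}
```

With a 6-element input such as {0, 0, 2, 3, 5, 7}, the two leading
entries are skipped and the four slots receive {2, 3, 5, 7}. With a
2-element input, the two leading slots keep the pad value. In both
cases the start index is derived from the number of elements actually
written, so no write lands past the fourth slot.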

Fixes crashes at large problem sizes (>1k) for both reference matmul
(run.py without --direct-codegen) and fused SwiGLU designs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/mlir/lib/Conversion/AIRLoweringPass.cpp b/mlir/lib/Conversion/AIRLoweringPass.cpp
@@ -567,14 +567,18 @@ class AIRDmaMemcpyNdToAIRRtConversion
     SmallVector<Value, 4> lengths(4, one);
     SmallVector<Value, 4> strides(4, zero);
 
-    int idx = 4 - src.getRank();
-    for (auto o : isFromTile ? op.getDstOffsets() : op.getSrcOffsets())
+    // Take last min(4, N) elements for offsets, sizes, and strides.
+    // When N > 4, drop leading elements to fit the 4D airrt format.
+    auto op_offsets = isFromTile ? op.getDstOffsets() : op.getSrcOffsets();
+    auto offsets_to_use = op_offsets;
+    if (offsets_to_use.size() > 4)
+      offsets_to_use = offsets_to_use.drop_front(offsets_to_use.size() - 4);
+    int idx = 4 - offsets_to_use.size();
+    for (auto o : offsets_to_use)
       offsets[idx++] = arith::IndexCastOp::create(rewriter, op->getLoc(),
                                                   IntegerType::get(ctx, 64), o);
     auto op_strides = isFromTile ? op.getDstStrides() : op.getSrcStrides();
     if (op_strides.size()) {
-      // Take last min(4, N) strides, drop leading strides if N > 4.
-      // The innermost stride (last element) is now preserved.
       auto strides_to_use = op_strides;
       if (strides_to_use.size() > 4)
         strides_to_use = strides_to_use.drop_front(strides_to_use.size() - 4);
@@ -583,8 +587,12 @@ class AIRDmaMemcpyNdToAIRRtConversion
         strides[idx++] = arith::IndexCastOp::create(
             rewriter, op->getLoc(), IntegerType::get(ctx, 64), o);
     }
-    idx = 4 - src.getRank();
-    for (auto o : isFromTile ? op.getDstSizes() : op.getSrcSizes())
+    auto op_sizes = isFromTile ? op.getDstSizes() : op.getSrcSizes();
+    auto sizes_to_use = op_sizes;
+    if (sizes_to_use.size() > 4)
+      sizes_to_use = sizes_to_use.drop_front(sizes_to_use.size() - 4);
+    idx = 4 - sizes_to_use.size();
+    for (auto o : sizes_to_use)
       lengths[idx++] = arith::IndexCastOp::create(rewriter, op->getLoc(),
                                                   IntegerType::get(ctx, 64), o);