Fix air-to-std crash for DMA ops with >4 offset/size dimensions by erwei-xilinx · Pull Request #1517 · Xilinx/mlir-air

erwei-xilinx · 2026-04-08T23:45:15Z

Summary

- Fix AIRDmaMemcpyNdToAIRRtConversion SmallVector out-of-bounds crash when DMA ops have >4 offset/size dimensions
- Apply drop_front logic (already used for strides) to offsets and sizes to fit the 4D airrt format

Root cause

The conversion pattern assumed DMA offsets/sizes have at most 4 elements (matching memref rank). However, block layout lowering produces 6D patterns (e.g., [herd_m, herd_n, n_blks, m_blks, mmul_m, mmul_k]), and the BD optimization pass can create >4D offsets. When src.getRank() > 4, idx = 4 - rank becomes negative, causing offsets[idx++] to write past the end of a SmallVector<Value, 4>.
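
The index arithmetic described above can be illustrated with a small standalone model. This is a simplified sketch, not the code in AIRLoweringPass.cpp: `kSlots`, `firstWriteIndex`, and `writesStayInBounds` are hypothetical names, and plain `int` stands in for whatever types the pass actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: the 4D airrt format offers four slots, and the
// pre-fix logic right-aligns `rank` entries into them, starting at
// index (4 - rank).
constexpr int kSlots = 4;

// First index that would be written when right-aligning `rank` entries.
int firstWriteIndex(int rank) { return kSlots - rank; }

// True when all `rank` writes land inside the four available slots.
bool writesStayInBounds(int rank) {
  int idx = firstWriteIndex(rank);
  return idx >= 0 && idx + rank <= kSlots;
}
```

With a rank of 6, `firstWriteIndex` returns -2, so the first write already falls outside the four-element buffer; ranks of 4 or less stay in bounds.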

Impact

Fixes crashes at large problem sizes for:

- Reference matmul (run.py without --direct-codegen) at 2048×2048×2048 and larger
- Fused SwiGLU at 2048×2048×8192

Performance after fix

| Workload | Size | Result |
| --- | --- | --- |
| Reference matmul | 2048×2048×2048 | 5,048 GFLOPS (was crashing) |
| Reference matmul | 2048×2048×8192 | 4,599 GFLOPS (was crashing) |
| Fused SwiGLU | 512×512×512 | PASS (correctness) |
| Fused SwiGLU | 2048×2048×8192 | 1,813 GFLOPS (was crashing) |

Test plan

- All 365 check-air-mlir tests pass
- Reference matmul correctness and profiling at 2048×2048×2048, 2048×2048×8192
- Fused SwiGLU correctness at 512×512×512

🤖 Generated with Claude Code

AIRDmaMemcpyNdToAIRRtConversion assumed DMA offsets and sizes have at most 4 elements (matching memref rank). However, the BD optimization pass and block layout lowering can produce DMA ops with 6+ dimensions in offsets/sizes (e.g., 6D block layout for matmul). This caused a SmallVector out-of-bounds access when converting to the 4D airrt format.

Apply the same drop_front logic (already used for strides) to offsets and sizes: when N > 4, take the last 4 elements. This matches the hardware's 4D BD dimension limit.

Fixes crashes at large problem sizes (>1k) for both reference matmul (run.py without --direct-codegen) and fused SwiGLU designs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Fixes a crash in the -air-to-std lowering when converting air.dma_memcpy_nd ops whose offsets/sizes have more than 4 dimensions (e.g., 6D patterns produced by block layout lowering / BD optimization), by truncating offsets and sizes to the last 4 elements to match the 4D AIRRt DMA format.

Changes:

- Drop leading offset elements when offsets.size() > 4, mirroring existing stride behavior.
- Drop leading size elements when sizes.size() > 4, preventing negative indexing / SmallVector<Value, 4> OOB writes.
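
The truncation described in these changes can be sketched in standalone C++. This is a minimal illustration under stated assumptions: `keepTrailing` is a hypothetical helper, `std::vector<int64_t>` stands in for the lists of MLIR values the pass handles, and the copy of the trailing range plays the role that `drop_front` on an LLVM `ArrayRef` plays in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: keep at most the trailing `maxDims` entries.
// Inputs that already fit are returned unchanged.
std::vector<int64_t> keepTrailing(const std::vector<int64_t>& values,
                                  std::size_t maxDims = 4) {
  if (values.size() <= maxDims)
    return values;
  // Copy only the last `maxDims` entries, dropping the leading ones.
  return std::vector<int64_t>(values.end() - maxDims, values.end());
}
```

For a 6-element input such as `{1, 2, 3, 4, 5, 6}`, the helper returns `{3, 4, 5, 6}`; a 3-element input passes through untouched.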


- Replace duplicated if/drop_front truncation with a shared truncateToLast4 lambda using take_back(4)
- Add assertion verifying dropped leading offsets are zero
- Add comment documenting the safety invariant
- Add LIT test (air_dma_nd_6d_to_airrt.mlir) exercising the 6D→4D path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
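
One way to see why the zero-offset assertion matters: if an access is modeled as a sum of offset times stride per dimension, dropping a leading dimension leaves the address unchanged only when that dimension's offset is zero. The sketch below is a standalone model, not the PR's code: `linearOffset` and `droppedLeadingAreZero` are hypothetical names, the stride values in the usage note are made up for illustration, and integers stand in for MLIR values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Linear element offset under the assumed model: sum of offsets[i] * strides[i].
int64_t linearOffset(const std::vector<int64_t>& offsets,
                     const std::vector<int64_t>& strides) {
  assert(offsets.size() == strides.size());
  int64_t total = 0;
  for (std::size_t i = 0; i < offsets.size(); ++i)
    total += offsets[i] * strides[i];
  return total;
}

// True when every entry that truncation to the last `keep` entries
// would discard is zero, so discarding it cannot change linearOffset.
bool droppedLeadingAreZero(const std::vector<int64_t>& offsets,
                           std::size_t keep = 4) {
  for (std::size_t i = 0; i + keep < offsets.size(); ++i)
    if (offsets[i] != 0)
      return false;
  return true;
}
```

With offsets `{0, 0, 1, 2, 3, 4}` and illustrative strides `{4096, 2048, 512, 64, 8, 1}`, the 6D and truncated 4D forms give the same linear offset. Changing the first offset to 1 makes the check fail, and the two forms would then disagree by the dropped stride.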

Extract the hardcoded airrt 4-dimension limit into a named constant kAIRRtMaxNDims, used in both AIRDmaMemcpyNdToAIRRtConversion and AIRChannelInterfaceToAIRRtConversionImpl. This makes the relationship to the airrt.dma_memcpy_nd / airrt.memcpy_nd TableGen definitions explicit and easier to update if future architectures change the limit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

erwei-xilinx requested a review from fifield as a code owner April 8, 2026 23:45

Copilot AI review requested due to automatic review settings April 8, 2026 23:45

Copilot started reviewing on behalf of erwei-xilinx April 8, 2026 23:47

Copilot AI reviewed Apr 8, 2026

Comment thread mlir/lib/Conversion/AIRLoweringPass.cpp Outdated

erwei-xilinx and others added 2 commits April 8, 2026 17:13

erwei-xilinx force-pushed the erwei/fix-air-to-std-oob branch 2 times, most recently from cb22460 to 767dd14 Compare April 9, 2026 04:49

erwei-xilinx marked this pull request as draft April 9, 2026 23:39

erwei-xilinx wants to merge 3 commits into Xilinx:main from erwei-xilinx:erwei/fix-air-to-std-oob
