Fix air-to-std crash for DMA ops with >4 offset/size dimensions #1517
Draft
erwei-xilinx wants to merge 3 commits into
AIRDmaMemcpyNdToAIRRtConversion assumed DMA offsets and sizes have at most 4 elements (matching memref rank). However, the BD optimization pass and block layout lowering can produce DMA ops with 6+ dimensions in offsets/sizes (e.g., 6D block layout for matmul). This caused a SmallVector out-of-bounds access when converting to the 4D airrt format.

Apply the same drop_front logic (already used for strides) to offsets and sizes: when N > 4, take the last 4 elements. This matches the hardware's 4D BD dimension limit.

Fixes crashes at large problem sizes (>1k) for both reference matmul (run.py without --direct-codegen) and fused SwiGLU designs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Fixes a crash in the -air-to-std lowering when converting air.dma_memcpy_nd ops whose offsets/sizes have more than 4 dimensions (e.g., 6D patterns produced by block layout lowering / BD optimization), by truncating offsets and sizes to the last 4 elements to match the 4D AIRRt DMA format.
Changes:
- Drop leading offset elements when offsets.size() > 4, mirroring existing stride behavior.
- Drop leading size elements when sizes.size() > 4, preventing negative indexing / SmallVector<Value, 4> OOB writes.
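The failure mode and the fix listed above can be illustrated outside of MLIR. The following is a minimal standalone sketch, not the code in this PR: it uses plain integers and `std::array` in place of `Value` and `SmallVector<Value, 4>`, and the names `kWidth`, `startSlot`, and `rightAlignLast4` are hypothetical. It assumes the conversion right-aligns the incoming entries in a 4-wide buffer, which is what the `4 - rank` start index in the PR description suggests.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed fixed width of the target format (4, per the PR description).
constexpr std::size_t kWidth = 4;

// Start slot when right-aligning `n` entries in a kWidth-wide buffer.
// Signed on purpose: the result is negative once n exceeds kWidth,
// which is the out-of-bounds condition described in this PR.
inline int startSlot(std::size_t n) {
  return static_cast<int>(kWidth) - static_cast<int>(n);
}

// Right-align `in` into a kWidth-wide buffer, padding leading slots with
// `fill`. When `in` has more than kWidth entries, the leading ones are
// skipped so that only the last kWidth are written.
inline std::array<int64_t, kWidth> rightAlignLast4(
    const std::vector<int64_t> &in, int64_t fill) {
  std::array<int64_t, kWidth> out;
  out.fill(fill);
  const std::size_t skip = in.size() > kWidth ? in.size() - kWidth : 0;
  std::size_t idx = kWidth - (in.size() - skip);  // never underflows
  for (std::size_t i = skip; i < in.size(); ++i)
    out[idx++] = in[i];
  return out;
}
```

With six entries, `startSlot(6)` is negative, whereas `rightAlignLast4` only ever writes to slots 0 through 3.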
- Replace duplicated if/drop_front truncation with a shared truncateToLast4 lambda using take_back(4)
- Add assertion verifying dropped leading offsets are zero
- Add comment documenting the safety invariant
- Add LIT test (air_dma_nd_6d_to_airrt.mlir) exercising the 6D→4D path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract the hardcoded airrt 4-dimension limit into a named constant kAIRRtMaxNDims, used in both AIRDmaMemcpyNdToAIRRtConversion and AIRChannelInterfaceToAIRRtConversionImpl. This makes the relationship to the airrt.dma_memcpy_nd / airrt.memcpy_nd TableGen definitions explicit and easier to update if future architectures change the limit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
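The two follow-up commits describe a shared truncation helper, a named dimension limit, and a check that dropped leading offsets are zero. A rough standalone sketch of that combination is shown below. It is an illustration only: the actual change operates on MLIR values inside the conversion pattern, while this version uses `std::vector<int64_t>`, and `kMaxNDims`, `takeLast`, and `droppedAreZero` are hypothetical stand-ins for `kAIRRtMaxNDims`, `truncateToLast4`, and the assertion.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for kAIRRtMaxNDims.
constexpr std::size_t kMaxNDims = 4;

// Keep only the last kMaxNDims entries; shorter inputs pass through
// unchanged. This mirrors what take_back(4) does on an array view.
inline std::vector<int64_t> takeLast(const std::vector<int64_t> &v) {
  if (v.size() <= kMaxNDims)
    return v;
  return std::vector<int64_t>(
      v.end() - static_cast<std::ptrdiff_t>(kMaxNDims), v.end());
}

// True when every leading offset that takeLast would drop is zero.
// Dropping a nonzero offset would silently change which data is addressed.
inline bool droppedAreZero(const std::vector<int64_t> &offsets) {
  if (offsets.size() <= kMaxNDims)
    return true;
  const std::size_t dropped = offsets.size() - kMaxNDims;
  for (std::size_t i = 0; i < dropped; ++i)
    if (offsets[i] != 0)
      return false;
  return true;
}
```

A caller would check `droppedAreZero(offsets)` before using `takeLast(offsets)`. Keeping the limit in one constant means the helper and the check cannot disagree about how many entries are dropped.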
cb22460 to 767dd14
Summary
- Fix AIRDmaMemcpyNdToAIRRtConversion SmallVector out-of-bounds crash when DMA ops have >4 offset/size dimensions
- Apply the same drop_front logic (already used for strides) to offsets and sizes to fit the 4D airrt format

Root cause
The conversion pattern assumed DMA offsets/sizes have at most 4 elements (matching memref rank). However, block layout lowering produces 6D patterns (e.g., [herd_m, herd_n, n_blks, m_blks, mmul_m, mmul_k]), and the BD optimization pass can create >4D offsets. When src.getRank() > 4, idx = 4 - rank becomes negative, causing offsets[idx++] to write past the end of a SmallVector<Value, 4>.

Impact
Fixes crashes at large problem sizes for:
- Reference matmul (run.py without --direct-codegen) at 2048×2048×2048 and larger
- Fused SwiGLU designs

Performance after fix
Test plan
- check-air-mlir tests pass

🤖 Generated with Claude Code