Commit 6684293

Fix maskSpanAffineOffset bitmask in ldmatrix/stmatrix subslice check (#10066)
The subslice safety check in lowerLdStMatrix uses a bitmask to verify that affine offsets don't touch the contiguous part of the tile's offset dimension. It was using getOutDimSizeLog2 (which returns log2 of the size) instead of getOutDimSize (the actual size) to construct this mask.

For outDimSize=8: log2(8)-1 = 2 (0b010) only checks bit 1, whereas 8-1 = 7 (0b111) correctly checks all bits within the tile span.

The bug makes the check too permissive: it could allow subslices that overlap the contiguous tile region. Latent because the specific bit patterns in maskSpanAffineOffset rarely trigger the difference.

Fix both the NVIDIA (Utility.cpp) and AMD (MemoryOpToLLVM.cpp) backends.

# New contributor declaration

- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- [x] This PR does not need a test because existing tests pass, and the fix only makes the safety check stricter.
- [x] I have not added any `lit` tests.
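The mask arithmetic described above can be checked in isolation. The following is a minimal sketch, not code from the Triton tree: the constants `outDimSizeLog2` and `outDimSize` are hypothetical stand-ins for `tile.getOutDimSizeLog2(kOffset)` and `tile.getOutDimSize(kOffset)`, and the helper `touchesContiguousPart` is an illustrative name for the condition under which the lowering returns `failure()`. It assumes the size is a power of two, as in the outDimSize=8 example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for tile.getOutDimSizeLog2(kOffset) and
// tile.getOutDimSize(kOffset); assumes a power-of-two size of 8.
constexpr uint32_t outDimSizeLog2 = 3;
constexpr uint32_t outDimSize = 1u << outDimSizeLog2;

// Mask built before the fix: log2(8) - 1 == 2 == 0b010.
constexpr uint32_t buggyMask = outDimSizeLog2 - 1;
// Mask built after the fix: 8 - 1 == 7 == 0b111.
constexpr uint32_t fixedMask = outDimSize - 1;

// Illustrative helper: true when the affine offset bits overlap the mask,
// which is the case where the lowering bails out with failure().
constexpr bool touchesContiguousPart(uint32_t maskSpanAffineOffset,
                                     uint32_t mask) {
  return (maskSpanAffineOffset & mask) != 0;
}

static_assert(buggyMask == 0b010, "old mask only covers bit 1");
static_assert(fixedMask == 0b111, "new mask covers bits 0..2");

// An offset touching bit 0 or bit 2 slips past the old mask but is
// rejected by the new one.
static_assert(!touchesContiguousPart(0b001, buggyMask), "missed by old mask");
static_assert(touchesContiguousPart(0b001, fixedMask), "caught by new mask");
static_assert(!touchesContiguousPart(0b100, buggyMask), "missed by old mask");
static_assert(touchesContiguousPart(0b100, fixedMask), "caught by new mask");

// An offset touching only bit 1 is rejected by both masks, which is one
// reason the difference can stay hidden.
static_assert(touchesContiguousPart(0b010, buggyMask), "caught by old mask");
static_assert(touchesContiguousPart(0b010, fixedMask), "caught by new mask");

// An offset entirely above the tile span (bit 3) passes both checks.
static_assert(!touchesContiguousPart(0b1000, fixedMask), "outside tile span");
```

Under these assumptions, the two masks disagree only for offsets whose low bits are set at positions 0 or 2 without bit 1, which is consistent with the commit's note that the bug was latent.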
1 parent cdac714 commit 6684293

2 files changed

Lines changed: 2 additions & 2 deletions

third_party/amd/lib/TritonAMDGPUToLLVM/MemoryOpToLLVM.cpp

Lines changed: 1 addition & 1 deletion
@@ -187,7 +187,7 @@ class TransLocalLoadOpConversion

 // If we are lowering a subslice, the subslice offsets shall not touch the
 // contiguous part of the tile
-if (maskSpanAffineOffset & (tile.getOutDimSizeLog2(kOffset) - 1)) {
+if (maskSpanAffineOffset & (tile.getOutDimSize(kOffset) - 1)) {
 return failure();
 }

third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/Utility.cpp

Lines changed: 1 addition & 1 deletion
@@ -332,7 +332,7 @@ LogicalResult lowerLdStMatrix(

 // If we are lowering a subslice, the subslice offsets shall not touch the
 // contiguous part of the tile
-if (maskSpanAffineOffset & (tile.getOutDimSizeLog2(kOffset) - 1)) {
+if (maskSpanAffineOffset & (tile.getOutDimSize(kOffset) - 1)) {
 return failure();
 }
