[LoadStoreOpToLLVM] Support rank > 2 tensor load with block io #5759
base: main
Pull request overview
This PR enhances the LoadStoreOpToLLVM conversion to support tensor loads with rank > 2 using block I/O operations. The implementation refactors the existing block pointer handling to work with higher-dimensional tensors by properly extracting and using offsets, shapes, and strides from block pointers.
Key changes:
- Removed the specialized `rewriteTensorPointerLoad` method and integrated its functionality into the main conversion path
- Added helper methods (`getBases`, `getShapes`, `getStrides`, `getOffsets`) to extract block pointer components generically for any rank
- Switched from `TritonGEN::Matrix2DBlockLoadOp` to GenISA intrinsics for more flexible block I/O operations
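To make the rank-generic extraction concrete, here is a minimal standalone sketch of slicing a flattened block pointer into its components. It assumes a layout of `[offsets..., shapes..., strides..., base]` with `rank` entries per group; that layout, the `BlockPointerView` struct, and the `splitBlockPointer` function are hypothetical illustrations rather than the PR's actual helpers, and plain `long` values stand in for MLIR `Value`s.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of an unpacked block pointer (illustration only).
struct BlockPointerView {
  std::vector<long> offsets;
  std::vector<long> shapes;
  std::vector<long> strides;
  long base;
};

// Splits a flattened block pointer into its components for any rank.
// ASSUMED layout: [offset_0..offset_{r-1}, shape_0..shape_{r-1},
//                  stride_0..stride_{r-1}, base]
BlockPointerView splitBlockPointer(const std::vector<long> &unpacked,
                                   std::size_t rank) {
  assert(unpacked.size() == 3 * rank + 1 && "unexpected element count");
  // Start index of each group within the flattened vector.
  const std::size_t blockOffset = 0;
  const std::size_t blockShape = rank;
  const std::size_t blockStride = 2 * rank;
  const std::size_t blockBase = 3 * rank;
  auto first = unpacked.begin();
  return BlockPointerView{
      std::vector<long>(first + blockOffset, first + blockShape),
      std::vector<long>(first + blockShape, first + blockStride),
      std::vector<long>(first + blockStride, first + blockBase),
      unpacked[blockBase]};
}
```

Because every group boundary is derived from `rank`, the same slicing works for rank 2 and for rank > 2 without a special case.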
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| LoadStoreOpToLLVM.cpp | Refactored load conversion to support rank > 2 tensors with block I/O by extracting helper methods and using GenISA intrinsics |
| GenIntrinsicHelper.h | Added helper classes and templates for GenISA intrinsic declarations |
| GenIntrinsicHelper.cpp | Implemented GenISA intrinsic declaration and attribute handling |
| GenIntrinsics.h | Declared GenISA intrinsic interface functions |
| GenIntrinsicEnum.h | Defined GenISA intrinsic ID enumeration |
| CMakeLists.txt | Added GenISAIntrinsics library dependency |
| subgroup-2d-block-io.mlir | Updated test expectations for new load patterns |
| blockptr_load.mlir | Updated test expectations for block pointer loads |
| // Returns the offsets of the block from regular pointer or block pointer.. |
Copilot AI, Dec 26, 2025
Double period at the end of the comment. Should be a single period.
| // Returns the offsets of the block from regular pointer or block pointer.. | |
| // Returns the offsets of the block from regular pointer or block pointer. |
| return SmallVector<Value>(unpackedPtrs.begin() + blockOffset, | ||
| unpackedPtrs.begin() + blockShape); | ||
| } else { | ||
| // For the regular pointers, the offsets has already been added into |
Copilot AI, Dec 26, 2025
Grammatical error: 'offsets has' should be 'offsets have' (subject-verb agreement).
| // For the regular pointers, the offsets has already been added into | |
| // For the regular pointers, the offsets have already been added into |
| Value off = b.mul(offsetY, pitch); | ||
| addrElem = b.gep(ptr_ty(ctx, 1), i8_ty, addrElem, off); |
Copilot AI, Dec 26, 2025
Variable `off` on line 1650 shadows the loop variable `off` from line 1643. Rename the inner variable to avoid the shadowing, for example to `offsetBytes` or `rowOffset`.
| Value off = b.mul(offsetY, pitch); | |
| addrElem = b.gep(ptr_ty(ctx, 1), i8_ty, addrElem, off); | |
| Value offsetBytes = b.mul(offsetY, pitch); | |
| addrElem = b.gep(ptr_ty(ctx, 1), i8_ty, addrElem, offsetBytes); |
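For readers unfamiliar with the hazard, the following standalone sketch shows the renamed form in isolation. The function `rowAddresses`, its parameters, and the way the loop variable is combined with the row offset are hypothetical; plain integers stand in for the `Value`s and GEP in the real code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the loop in the PR: computes one byte address
// per entry of `offsets`, shifted by `offsetY` rows of `pitch` bytes each.
std::vector<std::uint64_t> rowAddresses(std::uint64_t base,
                                        const std::vector<std::uint64_t> &offsets,
                                        std::uint64_t offsetY,
                                        std::uint64_t pitch) {
  assert(pitch > 0 && "pitch must be non-zero");
  std::vector<std::uint64_t> addresses;
  for (std::uint64_t off : offsets) {
    // A distinct name keeps the loop variable `off` visible here; declaring
    // another `off` in this scope would silently hide it.
    std::uint64_t offsetBytes = offsetY * pitch;
    addresses.push_back(base + off + offsetBytes);
  }
  return addresses;
}
```

Compiling with `-Wshadow` (GCC and Clang) flags the original pattern, which is one way to catch it before review.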
| // enum LSC2DBlockField { BASE = 1, WIDTH = 2, HEIGHT = 3, PITCH = 4, | ||
| // BLOCKX = 5, BLOCKY = 6 }; |
Copilot AI, Dec 26, 2025
This enum definition is commented out but appears to document magic numbers used in the subsequent calls to `setBaseAddr` and `setWidthHeightAndOffsets`. Consider either uncommenting this as an actual enum or adding a reference to where this enum is defined in the codebase.
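One possible shape for the first option is sketched below, using the field names and values from the commented-out definition. The scoped-enum form and the `fieldIndex` helper are assumptions for illustration; whether an equivalent enum already exists elsewhere in the codebase is not established in this thread.

```cpp
#include <cstdint>

// Field selectors copied from the commented-out enum in the diff.
// Making it a real scoped enum is a suggestion, not the PR's code.
enum class LSC2DBlockField : std::uint32_t {
  BASE = 1,
  WIDTH = 2,
  HEIGHT = 3,
  PITCH = 4,
  BLOCKX = 5,
  BLOCKY = 6
};

// Hypothetical helper: converts a field selector to the raw integer
// that an intrinsic call would take in place of a magic number.
constexpr std::uint32_t fieldIndex(LSC2DBlockField field) {
  return static_cast<std::uint32_t>(field);
}

static_assert(fieldIndex(LSC2DBlockField::PITCH) == 4,
              "PITCH must keep the value documented in the comment");
```

A call site could then pass `fieldIndex(LSC2DBlockField::WIDTH)` instead of a bare `2`, so the meaning of the constant travels with the code.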
47e006f to 4c5a931
4c5a931 to fe39531
…lar pointer and block pointer. And clean up duplicate code. Signed-off-by: Lu,Chengjun <[email protected]>