[LoadStoreOpToLLVM] Support rank > 2 tensor load with block io #5759

chengjunlu · 2025-12-26T03:16:20Z

This PR enhances the LoadStoreOpToLLVM conversion to support tensor loads with rank > 2 using block I/O operations. The implementation refactors the existing block pointer handling to work with higher-dimensional tensors by properly extracting and using offsets, shapes, and strides from block pointers.

Copilot

Pull request overview

This PR enhances the LoadStoreOpToLLVM conversion to support tensor loads with rank > 2 using block I/O operations. The implementation refactors the existing block pointer handling to work with higher-dimensional tensors by properly extracting and using offsets, shapes, and strides from block pointers.

Key changes:

Removed the specialized rewriteTensorPointerLoad method and integrated its functionality into the main conversion path
Added helper methods (getBases, getShapes, getStrides, getOffsets) to extract block pointer components generically for any rank
Switched from TritonGEN::Matrix2DBlockLoadOp to GenISA intrinsics for more flexible block I/O operations
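
As a rough illustration of the helper-method change, the sketch below models an unpacked block pointer as a flat array and slices it by rank. The field layout (offsets, then shapes, then strides, then the base pointer) and the `BlockPtrView` type are assumptions made for this example, and plain integers stand in for MLIR `Value`s. It is not the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an unpacked block pointer. Assumed layout for a
// rank-R tensor: [offsets (R) | shapes (R) | strides (R) | base (1)].
struct BlockPtrView {
  std::vector<std::int64_t> fields;
  std::size_t rank;

  BlockPtrView(std::vector<std::int64_t> f, std::size_t r)
      : fields(std::move(f)), rank(r) {
    assert(fields.size() == 3 * rank + 1 && "unexpected field count");
  }

  // Copies `rank` consecutive fields starting at index `start`.
  std::vector<std::int64_t> slice(std::size_t start) const {
    auto first = fields.begin() + static_cast<std::ptrdiff_t>(start);
    auto last = first + static_cast<std::ptrdiff_t>(rank);
    return std::vector<std::int64_t>(first, last);
  }

  // The same accessors work for any rank because only `rank` changes.
  std::vector<std::int64_t> getOffsets() const { return slice(0); }
  std::vector<std::int64_t> getShapes() const { return slice(rank); }
  std::vector<std::int64_t> getStrides() const { return slice(2 * rank); }
  std::int64_t getBase() const { return fields[3 * rank]; }
};
```

Because every accessor is expressed in terms of `rank`, a rank-3 pointer needs no special case: `BlockPtrView({1, 2, 3, 4, 5, 6, 30, 6, 1, 1000}, 3).getStrides()` yields the three strides.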

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.


File	Description
LoadStoreOpToLLVM.cpp	Refactored load conversion to support rank > 2 tensors with block I/O by extracting helper methods and using GenISA intrinsics
GenIntrinsicHelper.h	Added helper classes and templates for GenISA intrinsic declarations
GenIntrinsicHelper.cpp	Implemented GenISA intrinsic declaration and attribute handling
GenIntrinsics.h	Declared GenISA intrinsic interface functions
GenIntrinsicEnum.h	Defined GenISA intrinsic ID enumeration
CMakeLists.txt	Added GenISAIntrinsics library dependency
subgroup-2d-block-io.mlir	Updated test expectations for new load patterns
blockptr_load.mlir	Updated test expectations for block pointer loads


Copilot · 2025-12-26T03:17:30Z

third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp

+    }
+  }
+
+  // Returns the offsets of the block from regular pointer or block pointer..


Double period at the end of the comment. Should be a single period.

Suggested change

// Returns the offsets of the block from regular pointer or block pointer..

// Returns the offsets of the block from regular pointer or block pointer.

Copilot · 2025-12-26T03:17:30Z

third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp

+      return SmallVector<Value>(unpackedPtrs.begin() + blockOffset,
+                                unpackedPtrs.begin() + blockShape);
+    } else {
+      // For the regular pointers, the offsets has already been added into


Grammatical error: 'offsets has' should be 'offsets have' (subject-verb agreement).

Suggested change

// For the regular pointers, the offsets has already been added into

// For the regular pointers, the offsets have already been added into

Copilot · 2025-12-26T03:17:30Z

third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp

+              Value off = b.mul(offsetY, pitch);
+              addrElem = b.gep(ptr_ty(ctx, 1), i8_ty, addrElem, off);


The variable `off` declared on line 1650 shadows the loop variable `off` from line 1643. Rename the inner variable, for example to `offsetBytes` or `rowOffset`, to avoid the shadowing.

Suggested change

Value off = b.mul(offsetY, pitch);

addrElem = b.gep(ptr_ty(ctx, 1), i8_ty, addrElem, off);

Value offsetBytes = b.mul(offsetY, pitch);

addrElem = b.gep(ptr_ty(ctx, 1), i8_ty, addrElem, offsetBytes);
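
To show why the rename helps, here is a small standalone sketch with plain integers in place of the builder calls. The `rowAddresses` function and its parameters are hypothetical and do not come from the PR. The inner product gets its own name, so the loop variable stays visible and the byte offset (row offset times pitch) is easy to spot.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: computes the byte address of each row from a base
// address, a row pitch in bytes, and a list of row (Y) offsets.
std::vector<std::uint64_t> rowAddresses(
    std::uint64_t base, std::uint64_t pitch,
    const std::vector<std::uint64_t>& offsetsY) {
  assert(pitch > 0 && "pitch must be non-zero");
  std::vector<std::uint64_t> addrs;
  addrs.reserve(offsetsY.size());
  for (std::uint64_t off : offsetsY) {
    // A distinct name here: declaring another `off` in this scope would
    // hide the loop variable for the rest of the loop body.
    std::uint64_t offsetBytes = off * pitch;
    addrs.push_back(base + offsetBytes);
  }
  return addrs;
}
```

Compiling with `-Wshadow` (GCC or Clang) is one way to catch this kind of redeclaration automatically.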

Copilot · 2025-12-26T03:17:30Z

third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp

+      // enum LSC2DBlockField { BASE = 1, WIDTH = 2, HEIGHT = 3, PITCH = 4,
+      // BLOCKX = 5, BLOCKY = 6 };


This enum definition is commented out but appears to document magic numbers used in the subsequent calls to setBaseAddr and setWidthHeightAndOffsets. Consider either uncommenting this as an actual enum or adding a reference to where this enum is defined in the codebase.
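
One way to act on this suggestion is sketched below. The field names and values are copied from the commented-out enum in the diff. The `fieldIndex` and `fieldName` helpers are hypothetical and only show how a named enumerator can replace a bare integer at a call site. Whether the real `setBaseAddr` and `setWidthHeightAndOffsets` helpers accept such an argument is not visible in this excerpt.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Values taken from the comment in the diff; a scoped enum keeps them from
// converting implicitly to integers.
enum class LSC2DBlockField : std::uint32_t {
  BASE = 1,
  WIDTH = 2,
  HEIGHT = 3,
  PITCH = 4,
  BLOCKX = 5,
  BLOCKY = 6
};

// Hypothetical helper: the explicit conversion a call site would use when an
// API expects the raw field index.
constexpr std::uint32_t fieldIndex(LSC2DBlockField f) {
  return static_cast<std::uint32_t>(f);
}

// Hypothetical helper: readable name for diagnostics.
constexpr std::string_view fieldName(LSC2DBlockField f) {
  switch (f) {
    case LSC2DBlockField::BASE:   return "BASE";
    case LSC2DBlockField::WIDTH:  return "WIDTH";
    case LSC2DBlockField::HEIGHT: return "HEIGHT";
    case LSC2DBlockField::PITCH:  return "PITCH";
    case LSC2DBlockField::BLOCKX: return "BLOCKX";
    case LSC2DBlockField::BLOCKY: return "BLOCKY";
  }
  return "unknown";
}

// Compile-time check that the enum still matches the documented numbering.
static_assert(fieldIndex(LSC2DBlockField::BASE) == 1, "BASE must stay 1");
static_assert(fieldIndex(LSC2DBlockField::BLOCKY) == 6, "BLOCKY must stay 6");
```

With a real enum, the numbering lives in one place and the compiler flags any call site that passes an unrelated integer.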


chengjunlu requested a review from Copilot December 26, 2025 03:16

chengjunlu marked this pull request as draft December 26, 2025 03:16

Copilot AI reviewed Dec 26, 2025


chengjunlu force-pushed the chengjun/support_3d_tensor_block_io branch from 47e006f to 4c5a931 Compare December 26, 2025 03:17

chengjunlu linked an issue Dec 26, 2025 that may be closed by this pull request

[Helion] Support rank > 2 tensor descriptor load with block IO. #5365

Open

chengjunlu force-pushed the chengjun/support_3d_tensor_block_io branch from 4c5a931 to fe39531 Compare December 26, 2025 05:02

chengjunlu added 2 commits December 26, 2025 11:04

[LoadStoreOpToLLVM] Unify the 2D block IO lowering code for both regular pointer and block pointer. And clean up duplicate code. Signed-off-by: Lu,Chengjun <[email protected]>

21ea59c

[LoadStoreOpToLLVM] Support use 2D block IO to load rank > 2 tensor.

fe39531


2 participants
