Skip to content

Conversation

@chengjunlu
Copy link
Contributor

@chengjunlu chengjunlu commented Dec 26, 2025

This PR enhances the LoadStoreOpToLLVM conversion to support tensor loads with rank > 2 using block I/O operations. The implementation refactors the existing block pointer handling to work with higher-dimensional tensors by properly extracting and using offsets, shapes, and strides from block pointers.

@chengjunlu chengjunlu requested a review from Copilot December 26, 2025 03:16
@chengjunlu chengjunlu marked this pull request as draft December 26, 2025 03:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances the LoadStoreOpToLLVM conversion to support tensor loads with rank > 2 using block I/O operations. The implementation refactors the existing block pointer handling to work with higher-dimensional tensors by properly extracting and using offsets, shapes, and strides from block pointers.

Key changes:

  • Removed the specialized rewriteTensorPointerLoad method and integrated its functionality into the main conversion path
  • Added helper methods (getBases, getShapes, getStrides, getOffsets) to extract block pointer components generically for any rank
  • Switched from TritonGEN::Matrix2DBlockLoadOp to GenISA intrinsics for more flexible block I/O operations

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
LoadStoreOpToLLVM.cpp Refactored load conversion to support rank > 2 tensors with block I/O by extracting helper methods and using GenISA intrinsics
GenIntrinsicHelper.h Added helper classes and templates for GenISA intrinsic declarations
GenIntrinsicHelper.cpp Implemented GenISA intrinsic declaration and attribute handling
GenIntrinsics.h Declared GenISA intrinsic interface functions
GenIntrinsicEnum.h Defined GenISA intrinsic ID enumeration
CMakeLists.txt Added GenISAIntrinsics library dependency
subgroup-2d-block-io.mlir Updated test expectations for new load patterns
blockptr_load.mlir Updated test expectations for block pointer loads

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}
}

// Returns the offsets of the block from regular pointer or block pointer..
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double period at the end of the comment. Should be a single period.

Suggested change
// Returns the offsets of the block from regular pointer or block pointer..
// Returns the offsets of the block from regular pointer or block pointer.

Copilot uses AI. Check for mistakes.
return SmallVector<Value>(unpackedPtrs.begin() + blockOffset,
unpackedPtrs.begin() + blockShape);
} else {
// For the regular pointers, the offsets has already been added into
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammatical error: 'offsets has' should be 'offsets have' (subject-verb agreement).

Suggested change
// For the regular pointers, the offsets has already been added into
// For the regular pointers, the offsets have already been added into

Copilot uses AI. Check for mistakes.
Comment on lines +1650 to +1627
Value off = b.mul(offsetY, pitch);
addrElem = b.gep(ptr_ty(ctx, 1), i8_ty, addrElem, off);
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable off on line 1650 shadows the loop variable off from line 1643. The inner variable should be renamed to avoid shadowing, such as offsetBytes or rowOffset.

Suggested change
Value off = b.mul(offsetY, pitch);
addrElem = b.gep(ptr_ty(ctx, 1), i8_ty, addrElem, off);
Value offsetBytes = b.mul(offsetY, pitch);
addrElem = b.gep(ptr_ty(ctx, 1), i8_ty, addrElem, offsetBytes);

Copilot uses AI. Check for mistakes.
Comment on lines 1708 to 1709
// enum LSC2DBlockField { BASE = 1, WIDTH = 2, HEIGHT = 3, PITCH = 4,
// BLOCKX = 5, BLOCKY = 6 };
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This enum definition is commented out but appears to document magic numbers used in the subsequent calls to setBaseAddr and setWidthHeightAndOffsets. Consider either uncommenting this as an actual enum or adding a reference to where this enum is defined in the codebase.

Copilot uses AI. Check for mistakes.
@chengjunlu chengjunlu force-pushed the chengjun/support_3d_tensor_block_io branch from 47e006f to 4c5a931 Compare December 26, 2025 03:17
@chengjunlu chengjunlu linked an issue Dec 26, 2025 that may be closed by this pull request
@chengjunlu chengjunlu force-pushed the chengjun/support_3d_tensor_block_io branch from 4c5a931 to fe39531 Compare December 26, 2025 05:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Helion] Support rank > 2 tensor descriptor load with block IO.

2 participants