[HW] HWVectorization Part 3: Structural Patterns #9749

Open
mafeguimaraes wants to merge 3 commits into llvm:main from mafeguimaraes:hw-vec-part3


@mafeguimaraes
Contributor

Context: This PR is the third part of a series of incremental patches for the HWVectorization pass. Building upon the bit-tracking infrastructure from Parts 1 and 2 (#9704 and #9739), this patch introduces Structural Vectorization: the ability to collapse N isomorphic scalar logic cones into a single wide operation.

Key Enhancements:

  • Structural Isomorphism Check: The pass analyzes the scalar subgraph feeding each output bit and verifies that all N bit-slices are structurally equivalent up to a uniform bit-index offset. This enables collapsing patterns like N independent AND/OR/XOR/MUX gates into a single N-bit operation.
// Before: 4 independent XOR gates
%xor0 = comb.xor %a0, %b0 : i1
%xor1 = comb.xor %a1, %b1 : i1
%xor2 = comb.xor %a2, %b2 : i1
%xor3 = comb.xor %a3, %b3 : i1
%out  = comb.concat %xor3, %xor2, %xor1, %xor0 : i1, i1, i1, i1

// After: single 4-bit XOR
%out = comb.xor %a, %b : i4
  • Shared Control Signal Handling: Scalar signals shared across all bit lanes (e.g., a MUX select or an AND enable) are identified by areSubgraphsEquivalent as common leaves and passed directly to the wide operation, or broadcast via comb.replicate when used as data operands.
// Before: 4 muxes controlled by the same scalar %sel
%m0 = comb.mux %sel, %a0, %b0 : i1
%m1 = comb.mux %sel, %a1, %b1 : i1
%m2 = comb.mux %sel, %a2, %b2 : i1
%m3 = comb.mux %sel, %a3, %b3 : i1
%out = comb.concat %m3, %m2, %m1, %m0 : i1, i1, i1, i1

// After: single wide mux with shared scalar selector
%out = comb.mux %sel, %a, %b : i4
  • Recursive Subgraph Reconstruction: vectorizeSubgraph recursively rebuilds the scalar logic tree into its vectorized equivalent, supporting arbitrary depth (e.g., (a[i] & b[i]) ^ c[i] across all bits becomes comb.and followed by comb.xor).
// Before: 2-level cone, (a[i] & b[i]) ^ c[i] for each bit
%and0 = comb.and %a0, %b0 : i1
%and1 = comb.and %a1, %b1 : i1
%xor0 = comb.xor %and0, %c0 : i1
%xor1 = comb.xor %and1, %c1 : i1
%out  = comb.concat %xor1, %xor0 : i1, i1

// After: two wide ops preserving the original structure
%and = comb.and %a, %b : i2
%out = comb.xor %and, %c : i2
  • Multiple Output Support: Each output port is analyzed and transformed independently, so modules with multiple vectorizable outputs (e.g., out_xor and out_and) are fully vectorized in a single pass.
// Before: two independent scalar cones feeding two outputs
%out_xor = comb.concat %xor3, %xor2, %xor1, %xor0 : i1, i1, i1, i1
%out_and = comb.concat %and3, %and2, %and1, %and0 : i1, i1, i1, i1

// After: each output vectorized independently
%out_xor = comb.xor %a, %b : i4
%out_and = comb.and %a, %c : i4
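
To make the structural check and the recursive rebuild described above more concrete, here is a minimal standalone C++ sketch. It does not use the MLIR or CIRCT APIs: `Node`, `leaf`, `makeOp`, `lanesAreIsomorphic`, and `widen` are hypothetical names chosen for illustration and are not the pass's `areSubgraphsEquivalent` or `vectorizeSubgraph`. The sketch assumes operands are compared positionally (no commutativity handling) and models a shared scalar as the same signal bit appearing in every lane.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an IR value (hypothetical, not an MLIR type): a leaf is
// one bit of a named signal, anything else is an operation over other nodes.
struct Node {
  std::string op;     // "leaf", or an opcode such as "and", "xor", "mux"
  std::string signal; // leaf only: name of the source signal
  int bit = 0;        // leaf only: bit index within that signal
  std::vector<std::shared_ptr<Node>> operands;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr leaf(const std::string &signal, int bit) {
  auto n = std::make_shared<Node>();
  n->op = "leaf";
  n->signal = signal;
  n->bit = bit;
  return n;
}

NodePtr makeOp(const std::string &op, std::vector<NodePtr> operands) {
  auto n = std::make_shared<Node>();
  n->op = op;
  n->operands = std::move(operands);
  return n;
}

// lanes[i] is the scalar cone feeding output bit i. The lanes are isomorphic
// when every position holds either the same opcode in all lanes, a leaf whose
// bit index grows by one per lane, or a leaf shared unchanged by all lanes.
bool lanesAreIsomorphic(const std::vector<NodePtr> &lanes) {
  assert(!lanes.empty() && "need at least one lane");
  const Node &base = *lanes[0];
  if (base.op == "leaf") {
    bool shared = true, indexed = true;
    for (std::size_t i = 0; i < lanes.size(); ++i) {
      const Node &lane = *lanes[i];
      if (lane.op != "leaf" || lane.signal != base.signal)
        return false;
      shared = shared && lane.bit == base.bit;
      indexed = indexed && lane.bit == base.bit + static_cast<int>(i);
    }
    return shared || indexed;
  }
  for (const NodePtr &lane : lanes)
    if (lane->op != base.op || lane->operands.size() != base.operands.size())
      return false;
  // Recurse column by column: operand k of every lane must match up.
  for (std::size_t k = 0; k < base.operands.size(); ++k) {
    std::vector<NodePtr> column;
    for (const NodePtr &lane : lanes)
      column.push_back(lane->operands[k]);
    if (!lanesAreIsomorphic(column))
      return false;
  }
  return true;
}

// Rebuilds a textual wide expression for lanes already known to be
// isomorphic. Shared leaves keep their single bit; indexed leaves become a
// bit range covering all lanes.
std::string widen(const std::vector<NodePtr> &lanes) {
  const Node &base = *lanes[0];
  if (base.op == "leaf") {
    bool shared = lanes.size() > 1 && lanes[1]->bit == base.bit;
    int high = shared ? base.bit
                      : base.bit + static_cast<int>(lanes.size()) - 1;
    std::string range = shared ? std::to_string(base.bit)
                               : std::to_string(high) + ":" +
                                     std::to_string(base.bit);
    return base.signal + "[" + range + "]";
  }
  std::string text = base.op + "(";
  for (std::size_t k = 0; k < base.operands.size(); ++k) {
    std::vector<NodePtr> column;
    for (const NodePtr &lane : lanes)
      column.push_back(lane->operands[k]);
    text += (k ? ", " : "") + widen(column);
  }
  return text + ")";
}
```

In this toy model, four lanes of `xor(a[i], b[i])` widen to `xor(a[3:0], b[3:0])`, and four muxes sharing `sel[0]` widen to `mux(sel[0], a[3:0], b[3:0])`. The real pass works on MLIR values and may treat shared operands differently (for example via `comb.replicate`), which this sketch does not attempt to model.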

@mafeguimaraes marked this pull request as draft on March 3, 2026, 12:50
@mafeguimaraes marked this pull request as ready for review on March 3, 2026, 19:36
@mafeguimaraes
Contributor Author

Hi @uenoku, I’ve just finished the implementation for Part 3 (Structural Patterns). When you have a moment, could you please take a look?

Thanks for the help throughout this process!

if (!visited.insert(val).second)
  return true;

if (auto *op = val.getDefiningOp()) {
Member


This is always non-null because isa<BlockArgument>(val)...

Suggested change
- if (auto *op = val.getDefiningOp()) {
+ auto *op = val.getDefiningOp();

/// Determines if a shared value is safe for vectorization. Safe values
/// include constants and block arguments, which act as shared control
/// signals.
bool isSafeSharedValue(mlir::Value val,
Member


Could you explain when this returns false? As far as I can see, it always returns true.

Contributor Author


You're right! The function always returned true because every value eventually traces back to a BlockArgument or a ConstantOp. The recursive traversal was unnecessary: only constants and block arguments are safe to share between bit lanes, so I simplified it. Fixed in the latest commit.
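
As a rough illustration of the simplified rule, here is a standalone toy model. It deliberately avoids the MLIR API: `ValueKind`, `ToyValue`, and the single-argument signature are assumptions made for this sketch, and the actual `isSafeSharedValue` in the patch takes different parameters.

```cpp
#include <cassert>

// Toy classification of where a value comes from (not the MLIR type system).
enum class ValueKind { BlockArgument, Constant, OperationResult };

// Hypothetical stand-in for mlir::Value, reduced to its origin.
struct ToyValue {
  ValueKind kind;
};

// Non-recursive check: only block arguments and constants may be shared
// across bit lanes. Results of other operations are rejected outright
// instead of being traced back to their leaves.
bool isSafeSharedValue(const ToyValue &value) {
  return value.kind == ValueKind::BlockArgument ||
         value.kind == ValueKind::Constant;
}

// Small usage check covering each kind of value.
void usageExample() {
  assert(isSafeSharedValue(ToyValue{ValueKind::BlockArgument}));
  assert(isSafeSharedValue(ToyValue{ValueKind::Constant}));
  assert(!isSafeSharedValue(ToyValue{ValueKind::OperationResult}));
}
```

Unlike a traversal that follows operands down to their leaves, this version can return false, because an operation result is rejected without looking at what feeds it.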

Comment on lines +284 to +289
if (!slice0Val || !slice0Val.getDefiningOp())
  return false;

Value slice1Val = findBitSource(output, 1);
if (!slice1Val || !slice1Val.getDefiningOp())
  return false;
Member


Suggested change
- if (!slice0Val || !slice0Val.getDefiningOp())
-   return false;
- Value slice1Val = findBitSource(output, 1);
- if (!slice1Val || !slice1Val.getDefiningOp())
-   return false;
+ if (!slice0Val)
+   return false;
+ Value slice1Val = findBitSource(output, 1);
+ if (!slice1Val)
+   return false;
