[VectorDistribute] Add VectorTileSizeAnalysis by sommerlukas · Pull Request #23668 · iree-org/iree

sommerlukas · 2026-03-05T15:39:19Z

Motivation for this change is to better support masking to make up for differences between actual tensor shape and tile sizes chosen to match the HW. For example, lowering config selection would be able to choose a tile size of 64, even if the input has size 127. To ensure correct computation, proper masking needs to be introduced in this case during GenericVectorization. To that end, GenericVectorization needs reliable vector tile size information.

The goal of the VectorTileSizeAnalysis is to provide such information. The analysis is seeded from to_layout operations, which in turn have been inserted based on lowering_config. The VectorTileSizeAnalysis propagates the information through the graph. It uses the upstream MLIR dataflow analysis framework and combines a sparse forward and sparse backward analysis in a single solver with a shared lattice.

Conflicts are resolved by an overdefined/"top" state. Consumers such as GenericVectorization can then choose a candidate, e.g. based on heuristics.

To make the vector tile size readily available in GenericVectorization, the MaterializeVectorTileSizePass materializes the analysis results as discardable attribute on compute ops.

This is part of #23415.

Assisted-by: Claude Code

hanhanW

Thanks for the patch! I can see the value of this pass, which is an improved version of useConfiguredSizes in generic vectorization. I'm thinking to extend the pass (in a follow-up) which moves the useConfiguredSizes logic into the anslysis, which can work well with the getVectorSize lowering config interface method.

The main concern on my side is about multiple candidates. Isn't it a bug if it happens?

sommerlukas · 2026-03-17T10:25:32Z

Thanks for the feedback!

The main concern on my side is about multiple candidates. Isn't it a bug if it happens?

I don't think it's necessarily a bug in the IR, consider the scenario below:

%empty = tensor.empty ...
%a = linalg.generic ins(...) outs(%empty) ...
%al = to_layout %a, layout with tile size 64
%b = linalg.generic ins(...) outs(%empty)
%c = linalg.generic ins(%b) ...
%cl = to_layout %c, layout with tile size 32
...

Here, we would propagate tile size 64 from %al backward to %a and then %empty. Then we propagate that tile size from %empty forward through to %b, and %b also has tile size 64. If we then propagate the tile size 32 backward from %cl through %c to %b, we have two tile sizes for %b and need to resolve that somehow. If, instead of collecting a set of possible tile sizes for each candidate (and dimension), we would move to an overdefined top element in the lattice, we would lose tile size information for %b.

CC @Groverkss who might have other scenarios in mind.

Groverkss

LGTM modulo hanhan's comments. I think it's okay to just bail out on duplicate sizes for now and we can improve if needed.

sommerlukas · 2026-03-18T15:22:24Z

I've updated the analysis to only track one size per dimension. In case of conflict, we now go to a top/overdefined state. Propagation from duplicatable operations is now completely disabled, which avoids conflict in the scenario I outlined above.

The tracking of one size per dimension also facilitates adding the scalable flag to the analysis for CPU later on.

hanhanW

Thanks for the feedback!

The main concern on my side is about multiple candidates. Isn't it a bug if it happens?

I don't think it's necessarily a bug in the IR, consider the scenario below:
%empty = tensor.empty ...
%a = linalg.generic ins(...) outs(%empty) ...
%al = to_layout %a, layout with tile size 64
%b = linalg.generic ins(...) outs(%empty)
%c = linalg.generic ins(%b) ...
%cl = to_layout %c, layout with tile size 32
...
Here, we would propagate tile size 64 from %al backward to %a and then %empty. Then we propagate that tile size from %empty forward through to %b, and %b also has tile size 64. If we then propagate the tile size 32 backward from %cl through %c to %b, we have two tile sizes for %b and need to resolve that somehow. If, instead of collecting a set of possible tile sizes for each candidate (and dimension), we would move to an overdefined top element in the lattice, we would lose tile size information for %b.

CC @Groverkss who might have other scenarios in mind.

I still think that it is a bug if it happens. It means that the config generator has a bug. The example you showed is valid, and you have a good fix for it. We should stop propagation if it is tensor.empty() op. It is describing the shape of the result, but it is not a connection between two linalg.generic ops.

Anyway, the current revision mostly looks good. I left few comments, please take a look. Thanks!

hanhanW · 2026-03-18T22:22:24Z

+      if (!perDimSizes) {
+        return;
+      }


I think we should either emit a message or signal a failure when it happens.

I think there's legitimate cases where operations don't get a tile size assigned (e.g., the anchor operation is inside a tiled loop), so emitting a warning/error here or even failing the pass isn't appropriate IMHO.

I think we should emit errors or at least warnings if there are overdefined tile sizes. It means that there is a bug in the analysis or input program.

It is okay if there are uninitialized tile sizes, because some dimension may not be tiled due to odd reasons.

The point is over-defined is a bug to me.

sommerlukas

Thanks for the feedback @hanhanW! I've addressed the comments and put some more replies inline.

sommerlukas · 2026-03-19T09:06:00Z

+  // A linalg op that doesn't read any tensor data (e.g., linalg.fill or a
+  // fill-like linalg.generic broadcasting a scalar) is a generator and
+  // duplicatable.
+  if (auto linalgOp = dyn_cast<linalg::LinalgOp>(defOp)) {
+    if (llvm::none_of(linalgOp->getOpOperands(), [&](OpOperand &operand) {
+          return isa<ShapedType>(operand.get().getType()) &&
+                 linalgOp.payloadUsesValueFromOperand(&operand);
+        })) {
+      return true;
+    }
+  }


Unfortunately, verifyFillInterface emits diagnostics, even if we discarded the failure itself, so I don't think it's the right tool for this job.

sommerlukas · 2026-03-19T09:29:52Z

+  /// Map from operand space to iteration space via an indexing map.
+  TileSizes mapToIterationSpace(AffineMap indexingMap,
+                                unsigned numLoops) const {
+    TileSizes result(numLoops);
+    for (unsigned i = 0; i < indexingMap.getNumResults(); ++i) {
+      auto dimExpr = dyn_cast<AffineDimExpr>(indexingMap.getResult(i));
+      if (!dimExpr) {
+        continue;
+      }
+      unsigned iterDim = dimExpr.getPosition();
+      result.dims[iterDim] = mergeDim(result.dims[iterDim], dims[i]);
+    }
+    return result;
+  }
+
+  /// Map from iteration space to operand space via an indexing map.
+  /// Returns empty TileSizes if any operand dim can't be determined.
+  TileSizes mapFromIterationSpace(AffineMap indexingMap) const {
+    unsigned numResults = indexingMap.getNumResults();
+    TileSizes result(numResults);
+    for (unsigned i = 0; i < numResults; ++i) {
+      auto dimExpr = dyn_cast<AffineDimExpr>(indexingMap.getResult(i));
+      if (!dimExpr) {
+        return {};
+      }
+      unsigned iterDim = dimExpr.getPosition();
+      if (iterDim >= rank() || dims[iterDim] == kUninitialized) {
+        return {};
+      }
+      result.dims[i] = dims[iterDim];
+    }
+    return result;
+  }


Do we assert if the given indexing map is a projected permutation?

I don't think we need to assert that here. If a result is not a dim expression, that dimension is skipped already and TileSizes that are not fully defined are never materialized.

Looking at the users of mapToIterationSpace, do you really need to pass in numLoops? It can be indexingMap.getNumDims(), if IIUC. We also need to check if the ranks match or not, IIUC.

I've removed numLoops and added an assert in all places using it to assert the correct rank.

I'm thinking if we rename them to scatterToIterationSpace and gatherFromIterationSpace. The original names are okay as well. I just feel scatter and gather sounds better, and it may be just my preference. What do you think?

For me personally, gather/scatter is always linked to data movement (gathering/scattering data elements into/from a bigger vector/matrix/...). What we are doing here is using an AffineMap to map information from operand to iteration space and vice versa. Therefore, IMHO, the current name is a good description.

sommerlukas · 2026-03-19T10:02:35Z

+      if (!perDimSizes) {
+        return;
+      }


I think there's legitimate cases where operations don't get a tile size assigned (e.g., the anchor operation is inside a tiled loop), so emitting a warning/error here or even failing the pass isn't appropriate IMHO.

hanhanW

I'll take a look at other code, but I think you forgot to delete old file (or make git mv).

hanhanW

Two high-level comments:

I think most of empty checks in tileSizes are redundant, see https://github.com/iree-org/iree/pull/23668/changes#r2966973979 for the reason
We should be consistent in terms of naming. I'm +1 on replacing all the ts with tileSizes.

hanhanW · 2026-03-20T17:46:22Z

@@ -0,0 +1,281 @@
+// RUN: iree-opt --pass-pipeline='builtin.module(any(iree-codegen-materialize-vector-tile-sizes))' --split-input-file %s | FileCheck %s


This is my first time seeing any, and I learned it. I wonder if we should just use func.func because we usually see func.func in codegen IRs.

Allowing any may introduce inconsistency when people add tests. E.g., they may use util.func.

IIRC, IREE only has one any use and they are all new code (introduced by @Groverkss ). I feel that Claude uses it once and the rest just copy-over this.

I used this intentionally. It's mainly following the idea that any operation implementing the FunctionOpInterface is a valid input.

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

sommerlukas

Thanks for the review, I've addressed the comments. Two answers inline.

sommerlukas · 2026-03-23T08:21:24Z

+    if (auto linalgOp = dyn_cast<linalg::LinalgOp>(op)) {
+      unsigned numLoops = linalgOp.getNumLoops();
+      TileSizes iterTileSizes(numLoops);
+      // Gather result tile sizes into iteration space via DPS init maps.


This is the backward propagation. In this case, we (potentially) have tile size information for the results of the LinalgOp and want to propagate that to its operands. This loop is not iterating over (DPS) init operands, but over the results, we're just using the indexing map for the corresponding init operand.

In this case, we're also not just propagating to the init operands (which are mostly duplicatable), but also to the other operands, if we can fully define their tile sizes.

I've removed the "DPS" from the comment.

sommerlukas · 2026-03-23T08:24:59Z

+#define GEN_PASS_DEF_MATERIALIZEVECTORTILESIZESPASS
+#include "iree/compiler/Codegen/Common/Passes.h.inc"


If it's OK for you, I'd like to keep it here. The part above is mostly the analysis, and we have a comment separating the pass bit of this file from the rest just above.

hanhanW

LGTM, just minor comments.

hanhanW · 2026-03-23T16:35:00Z

+#define GEN_PASS_DEF_MATERIALIZEVECTORTILESIZESPASS
+#include "iree/compiler/Codegen/Common/Passes.h.inc"


I still like it better if we move it to the top, because it is more like a declaration to me. I'll leave the decision to you.

hanhanW · 2026-03-23T16:37:40Z

+
+  unsigned rank() const { return dims.size(); }
+  bool empty() const { return dims.empty(); }
+  const ArrayRef<int64_t> getDims() const { return dims; }


ArrayRef already has const semantics, I think we should drop const like other implementations. E.g., https://github.com/llvm/llvm-project/blob/b670265b588f51eac9f7cc08edb3787361022af8/mlir/include/mlir/IR/BuiltinTypes.h#L65-L66

/// Returns the shape of this tensor type. ArrayRef<int64_t> getShape() const;

Suggested change

const ArrayRef<int64_t> getDims() const { return dims; }

ArrayRef<int64_t> getDims() const { return dims; }

hanhanW · 2026-03-23T16:45:04Z

 func.func @scf_if_propagation(%arg0: tensor<512xf32>, %cond: i1) -> tensor<512xf32> {
  %empty = tensor.empty() : tensor<512xf32>
  %cst = arith.constant 0.0 : f32
  %fill = linalg.fill ins(%cst : f32) outs(%empty : tensor<512xf32>) -> tensor<512xf32>


Does the linalg.fill op have a hint? Can you add the check or check-not?

hanhanW · 2026-03-23T16:54:53Z

+    if (auto linalgOp = dyn_cast<linalg::LinalgOp>(op)) {
+      unsigned numLoops = linalgOp.getNumLoops();
+      TileSizes iterTileSizes(numLoops);
+      // Gather result tile sizes into iteration space via DPS init maps.


Okay, I misread it. It is getting the values from result not init operands. Thanks for the details!

hanhanW · 2026-03-23T16:56:04Z

+        if (tileSizes.empty()) {
+          continue;
+        }


This is a redundant check to me, per the previous high-level comments.

I removed the other checks after your previous comments but kept the ones that continue the loop early. While it's true that merge would handle this, we perform non-trivial operations before we eventually call merge, so an early continuation of the loop might make sense.

It's also an application of this rule in the LLVM Coding Standards.

What do you think, should we keep it?

Just for future reference, let me post my thought here, and thanks for addressing all the comments!

My feeling is that the rest of computation is very cheap, so I lean to remove the early-exit. The LLVM rule fits multiple conditions better to me.

It is a simple early-exit, and I can think about few downside (or I can explain what I was thinking when I saw the change).

If it shows up several times, I'd need to look at other places that don't have the logic. If we require it, then it spams.

If I understand the idea correctly and based on my linalg knowledge, it is handling "scalar operand", which indicates the indexing map is trivial and it is not a special case.

When you have very, very small loops, this sort of structure is fine.

I don't expect the loop becoming large, and it makes code cleaner.

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

sommerlukas self-assigned this Mar 5, 2026

sommerlukas requested review from Groverkss, MaheshRavishankar, Max191, hanhanW, krzysz00, kuhar, nirvedhmeshram and qedawkins as code owners March 5, 2026 15:39

sommerlukas marked this pull request as draft March 5, 2026 17:03

sommerlukas marked this pull request as ready for review March 6, 2026 16:31

sommerlukas mentioned this pull request Mar 10, 2026

[Codegen] Introduce VectorizableOpInterface and migrate LinalgExt::GatherOp #23653

Merged

sommerlukas force-pushed the tile-size-analysis branch from 6bae260 to 23f31ac Compare March 12, 2026 17:31

Groverkss reviewed Mar 16, 2026

View reviewed changes

sommerlukas force-pushed the tile-size-analysis branch from 37358a6 to 310c668 Compare March 16, 2026 15:33

sommerlukas requested a review from Groverkss March 16, 2026 15:46

hanhanW requested changes Mar 16, 2026

View reviewed changes

Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated

Groverkss reviewed Mar 18, 2026

View reviewed changes

Groverkss approved these changes Mar 18, 2026

View reviewed changes

sommerlukas requested a review from hanhanW March 18, 2026 15:22

hanhanW reviewed Mar 18, 2026

View reviewed changes

sommerlukas commented Mar 19, 2026

View reviewed changes

sommerlukas force-pushed the tile-size-analysis branch from e59654b to 89700ca Compare March 19, 2026 10:20

sommerlukas requested a review from hanhanW March 19, 2026 12:12

hanhanW reviewed Mar 19, 2026

View reviewed changes

hanhanW requested changes Mar 20, 2026

View reviewed changes

sommerlukas added 2 commits March 23, 2026 08:33

[VectorDistribute] Add VectorTileSizeAnalysis

1376f48

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

Use dataflow framework for vector tile size analysis

5aa4889

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

sommerlukas added 5 commits March 23, 2026 08:33

Address PR feedback

68726e6

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

Track only one tile size per dim

ae181ad

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

Address PR feedback

939c840

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

Delete old implementation file

8e9a9aa

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

Address more PR feedback

93e18b5

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

sommerlukas force-pushed the tile-size-analysis branch from 2e6f8ad to 93e18b5 Compare March 23, 2026 09:06

sommerlukas commented Mar 23, 2026

View reviewed changes

sommerlukas requested a review from hanhanW March 23, 2026 13:37

hanhanW approved these changes Mar 23, 2026

View reviewed changes

Address PR comments

5f34dd9

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

sommerlukas merged commit 409bcca into iree-org:main Mar 24, 2026
55 of 57 checks passed

sommerlukas deleted the tile-size-analysis branch March 24, 2026 13:25

sommerlukas mentioned this pull request Apr 13, 2026

[VectorDistribute] Improve handling of hardware-unfriendly shapes #23415

Closed

		@@ -0,0 +1,281 @@
		// RUN: iree-opt --pass-pipeline='builtin.module(any(iree-codegen-materialize-vector-tile-sizes))' --split-input-file %s \| FileCheck %s

		#define GEN_PASS_DEF_MATERIALIZEVECTORTILESIZESPASS
		#include "iree/compiler/Codegen/Common/Passes.h.inc"

	const ArrayRef<int64_t> getDims() const { return dims; }
	ArrayRef<int64_t> getDims() const { return dims; }

Conversation

sommerlukas commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hanhanW left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sommerlukas commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Groverkss left a comment

Choose a reason for hiding this comment

Uh oh!

sommerlukas commented Mar 18, 2026

Uh oh!

hanhanW left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sommerlukas left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hanhanW left a comment

Choose a reason for hiding this comment

Uh oh!

hanhanW left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sommerlukas left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sommerlukas commented Mar 5, 2026 •

edited

Loading

sommerlukas commented Mar 17, 2026 •

edited

Loading