Skip to content

[VectorDistribute] Add VectorTileSizeAnalysis#23668

Merged
sommerlukas merged 8 commits intoiree-org:mainfrom
sommerlukas:tile-size-analysis
Mar 24, 2026
Merged

[VectorDistribute] Add VectorTileSizeAnalysis#23668
sommerlukas merged 8 commits intoiree-org:mainfrom
sommerlukas:tile-size-analysis

Conversation

@sommerlukas
Copy link
Copy Markdown
Contributor

@sommerlukas sommerlukas commented Mar 5, 2026

Motivation for this change is to better support masking to make up for differences between actual tensor shape and tile sizes chosen to match the HW. For example, lowering config selection would be able to choose a tile size of 64, even if the input has size 127. To ensure correct computation, proper masking needs to be introduced in this case during GenericVectorization. To that end, GenericVectorization needs reliable vector tile size information.

The goal of the VectorTileSizeAnalysis is to provide such information. The analysis is seeded from to_layout operations, which in turn have been inserted based on lowering_config. The VectorTileSizeAnalysis propagates the information through the graph. It uses the upstream MLIR dataflow analysis framework and combines a sparse forward and sparse backward analysis in a single solver with a shared lattice.

Conflicts are resolved by an overdefined/"top" state. Consumers such as GenericVectorization can then choose a candidate, e.g. based on heuristics.

To make the vector tile size readily available in GenericVectorization, the MaterializeVectorTileSizePass materializes the analysis results as discardable attribute on compute ops.

This is part of #23415.

Assisted-by: Claude Code

Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Copy link
Copy Markdown
Contributor

@hanhanW hanhanW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch! I can see the value of this pass, which is an improved version of useConfiguredSizes in generic vectorization. I'm thinking to extend the pass (in a follow-up) which moves the useConfiguredSizes logic into the anslysis, which can work well with the getVectorSize lowering config interface method.

The main concern on my side is about multiple candidates. Isn't it a bug if it happens?

Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
@sommerlukas
Copy link
Copy Markdown
Contributor Author

sommerlukas commented Mar 17, 2026

Thanks for the feedback!

The main concern on my side is about multiple candidates. Isn't it a bug if it happens?

I don't think it's necessarily a bug in the IR, consider the scenario below:

%empty = tensor.empty ...
%a = linalg.generic ins(...) outs(%empty) ...
%al = to_layout %a, layout with tile size 64
%b = linalg.generic ins(...) outs(%empty)
%c = linalg.generic ins(%b) ...
%cl = to_layout %c, layout with tile size 32
...

Here, we would propagate tile size 64 from %al backward to %a and then %empty. Then we propagate that tile size from %empty forward through to %b, and %b also has tile size 64. If we then propagate the tile size 32 backward from %cl through %c to %b, we have two tile sizes for %b and need to resolve that somehow. If, instead of collecting a set of possible tile sizes for each candidate (and dimension), we would move to an overdefined top element in the lattice, we would lose tile size information for %b.

CC @Groverkss who might have other scenarios in mind.

Copy link
Copy Markdown
Contributor

@Groverkss Groverkss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo hanhan's comments. I think it's okay to just bail out on duplicate sizes for now and we can improve if needed.

@sommerlukas
Copy link
Copy Markdown
Contributor Author

I've updated the analysis to only track one size per dimension. In case of conflict, we now go to a top/overdefined state. Propagation from duplicatable operations is now completely disabled, which avoids conflict in the scenario I outlined above.

The tracking of one size per dimension also facilitates adding the scalable flag to the analysis for CPU later on.

@sommerlukas sommerlukas requested a review from hanhanW March 18, 2026 15:22
Copy link
Copy Markdown
Contributor

@hanhanW hanhanW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback!

The main concern on my side is about multiple candidates. Isn't it a bug if it happens?

I don't think it's necessarily a bug in the IR, consider the scenario below:

%empty = tensor.empty ...
%a = linalg.generic ins(...) outs(%empty) ...
%al = to_layout %a, layout with tile size 64
%b = linalg.generic ins(...) outs(%empty)
%c = linalg.generic ins(%b) ...
%cl = to_layout %c, layout with tile size 32
...

Here, we would propagate tile size 64 from %al backward to %a and then %empty. Then we propagate that tile size from %empty forward through to %b, and %b also has tile size 64. If we then propagate the tile size 32 backward from %cl through %c to %b, we have two tile sizes for %b and need to resolve that somehow. If, instead of collecting a set of possible tile sizes for each candidate (and dimension), we would move to an overdefined top element in the lattice, we would lose tile size information for %b.

CC @Groverkss who might have other scenarios in mind.

I still think that it is a bug if it happens. It means that the config generator has a bug. The example you showed is valid, and you have a good fix for it. We should stop propagation if it is tensor.empty() op. It is describing the shape of the result, but it is not a connection between two linalg.generic ops.

Anyway, the current revision mostly looks good. I left few comments, please take a look. Thanks!

Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment on lines +492 to +494
if (!perDimSizes) {
return;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should either emit a message or signal a failure when it happens.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's legitimate cases where operations don't get a tile size assigned (e.g., the anchor operation is inside a tiled loop), so emitting a warning/error here or even failing the pass isn't appropriate IMHO.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should emit errors or at least warnings if there are overdefined tile sizes. It means that there is a bug in the analysis or input program.

It is okay if there are uninitialized tile sizes, because some dimension may not be tiled due to odd reasons.

The point is over-defined is a bug to me.

Copy link
Copy Markdown
Contributor Author

@sommerlukas sommerlukas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @hanhanW! I've addressed the comments and put some more replies inline.

Comment on lines +220 to +230
// A linalg op that doesn't read any tensor data (e.g., linalg.fill or a
// fill-like linalg.generic broadcasting a scalar) is a generator and
// duplicatable.
if (auto linalgOp = dyn_cast<linalg::LinalgOp>(defOp)) {
if (llvm::none_of(linalgOp->getOpOperands(), [&](OpOperand &operand) {
return isa<ShapedType>(operand.get().getType()) &&
linalgOp.payloadUsesValueFromOperand(&operand);
})) {
return true;
}
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, verifyFillInterface emits diagnostics, even if we discarded the failure itself, so I don't think it's the right tool for this job.

Comment on lines +141 to +173
/// Map from operand space to iteration space via an indexing map.
TileSizes mapToIterationSpace(AffineMap indexingMap,
unsigned numLoops) const {
TileSizes result(numLoops);
for (unsigned i = 0; i < indexingMap.getNumResults(); ++i) {
auto dimExpr = dyn_cast<AffineDimExpr>(indexingMap.getResult(i));
if (!dimExpr) {
continue;
}
unsigned iterDim = dimExpr.getPosition();
result.dims[iterDim] = mergeDim(result.dims[iterDim], dims[i]);
}
return result;
}

/// Map from iteration space to operand space via an indexing map.
/// Returns empty TileSizes if any operand dim can't be determined.
TileSizes mapFromIterationSpace(AffineMap indexingMap) const {
unsigned numResults = indexingMap.getNumResults();
TileSizes result(numResults);
for (unsigned i = 0; i < numResults; ++i) {
auto dimExpr = dyn_cast<AffineDimExpr>(indexingMap.getResult(i));
if (!dimExpr) {
return {};
}
unsigned iterDim = dimExpr.getPosition();
if (iterDim >= rank() || dims[iterDim] == kUninitialized) {
return {};
}
result.dims[i] = dims[iterDim];
}
return result;
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we assert if the given indexing map is a projected permutation?

I don't think we need to assert that here. If a result is not a dim expression, that dimension is skipped already and TileSizes that are not fully defined are never materialized.

Looking at the users of mapToIterationSpace, do you really need to pass in numLoops? It can be indexingMap.getNumDims(), if IIUC. We also need to check if the ranks match or not, IIUC.

I've removed numLoops and added an assert in all places using it to assert the correct rank.

I'm thinking if we rename them to scatterToIterationSpace and gatherFromIterationSpace. The original names are okay as well. I just feel scatter and gather sounds better, and it may be just my preference. What do you think?

For me personally, gather/scatter is always linked to data movement (gathering/scattering data elements into/from a bigger vector/matrix/...). What we are doing here is using an AffineMap to map information from operand to iteration space and vice versa. Therefore, IMHO, the current name is a good description.

Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp Outdated
Comment on lines +492 to +494
if (!perDimSizes) {
return;
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's legitimate cases where operations don't get a tile size assigned (e.g., the anchor operation is inside a tiled loop), so emitting a warning/error here or even failing the pass isn't appropriate IMHO.

@sommerlukas sommerlukas requested a review from hanhanW March 19, 2026 12:12
Copy link
Copy Markdown
Contributor

@hanhanW hanhanW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look at other code, but I think you forgot to delete old file (or make git mv).

Copy link
Copy Markdown
Contributor

@hanhanW hanhanW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two high-level comments:

Comment thread compiler/src/iree/compiler/Codegen/Common/MaterializeVectorTileSizes.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/MaterializeVectorTileSizes.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/MaterializeVectorTileSizes.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/MaterializeVectorTileSizes.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/MaterializeVectorTileSizes.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/MaterializeVectorTileSizes.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/MaterializeVectorTileSizes.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/MaterializeVectorTileSizes.cpp Outdated
@@ -0,0 +1,281 @@
// RUN: iree-opt --pass-pipeline='builtin.module(any(iree-codegen-materialize-vector-tile-sizes))' --split-input-file %s | FileCheck %s
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my first time seeing any, and I learned it. I wonder if we should just use func.func because we usually see func.func in codegen IRs.

Allowing any may introduce inconsistency when people add tests. E.g., they may use util.func.

IIRC, IREE only has one any use and they are all new code (introduced by @Groverkss ). I feel that Claude uses it once and the rest just copy-over this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this intentionally. It's mainly following the idea that any operation implementing the FunctionOpInterface is a valid input.

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
Copy link
Copy Markdown
Contributor Author

@sommerlukas sommerlukas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, I've addressed the comments. Two answers inline.

if (auto linalgOp = dyn_cast<linalg::LinalgOp>(op)) {
unsigned numLoops = linalgOp.getNumLoops();
TileSizes iterTileSizes(numLoops);
// Gather result tile sizes into iteration space via DPS init maps.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the backward propagation. In this case, we (potentially) have tile size information for the results of the LinalgOp and want to propagate that to its operands. This loop is not iterating over (DPS) init operands, but over the results, we're just using the indexing map for the corresponding init operand.

In this case, we're also not just propagating to the init operands (which are mostly duplicatable), but also to the other operands, if we can fully define their tile sizes.

I've removed the "DPS" from the comment.

Comment on lines +430 to +431
#define GEN_PASS_DEF_MATERIALIZEVECTORTILESIZESPASS
#include "iree/compiler/Codegen/Common/Passes.h.inc"
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's OK for you, I'd like to keep it here. The part above is mostly the analysis, and we have a comment separating the pass bit of this file from the rest just above.

@sommerlukas sommerlukas requested a review from hanhanW March 23, 2026 13:37
Copy link
Copy Markdown
Contributor

@hanhanW hanhanW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just minor comments.

Comment on lines +430 to +431
#define GEN_PASS_DEF_MATERIALIZEVECTORTILESIZESPASS
#include "iree/compiler/Codegen/Common/Passes.h.inc"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still like it better if we move it to the top, because it is more like a declaration to me. I'll leave the decision to you.


unsigned rank() const { return dims.size(); }
bool empty() const { return dims.empty(); }
const ArrayRef<int64_t> getDims() const { return dims; }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayRef already has const semantics, I think we should drop const like other implementations. E.g., https://github.com/llvm/llvm-project/blob/b670265b588f51eac9f7cc08edb3787361022af8/mlir/include/mlir/IR/BuiltinTypes.h#L65-L66

  /// Returns the shape of this tensor type.
  ArrayRef<int64_t> getShape() const;
Suggested change
const ArrayRef<int64_t> getDims() const { return dims; }
ArrayRef<int64_t> getDims() const { return dims; }

func.func @scf_if_propagation(%arg0: tensor<512xf32>, %cond: i1) -> tensor<512xf32> {
%empty = tensor.empty() : tensor<512xf32>
%cst = arith.constant 0.0 : f32
%fill = linalg.fill ins(%cst : f32) outs(%empty : tensor<512xf32>) -> tensor<512xf32>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the linalg.fill op have a hint? Can you add the check or check-not?

if (auto linalgOp = dyn_cast<linalg::LinalgOp>(op)) {
unsigned numLoops = linalgOp.getNumLoops();
TileSizes iterTileSizes(numLoops);
// Gather result tile sizes into iteration space via DPS init maps.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I misread it. It is getting the values from result not init operands. Thanks for the details!

Comment on lines +285 to +287
if (tileSizes.empty()) {
continue;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a redundant check to me, per the previous high-level comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the other checks after your previous comments but kept the ones that continue the loop early. While it's true that merge would handle this, we perform non-trivial operations before we eventually call merge, so an early continuation of the loop might make sense.

It's also an application of this rule in the LLVM Coding Standards.

What do you think, should we keep it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for future reference, let me post my thought here, and thanks for addressing all the comments!

My feeling is that the rest of computation is very cheap, so I lean to remove the early-exit. The LLVM rule fits multiple conditions better to me.

It is a simple early-exit, and I can think about few downside (or I can explain what I was thinking when I saw the change).

  • If it shows up several times, I'd need to look at other places that don't have the logic. If we require it, then it spams.
  • If I understand the idea correctly and based on my linalg knowledge, it is handling "scalar operand", which indicates the indexing map is trivial and it is not a special case.

When you have very, very small loops, this sort of structure is fine.

I don't expect the loop becoming large, and it makes code cleaner.

Comment thread compiler/src/iree/compiler/Codegen/Common/MaterializeVectorTileSizes.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/MaterializeVectorTileSizes.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Common/MaterializeVectorTileSizes.cpp Outdated
Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
@sommerlukas sommerlukas merged commit 409bcca into iree-org:main Mar 24, 2026
55 of 57 checks passed
@sommerlukas sommerlukas deleted the tile-size-analysis branch March 24, 2026 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants