Open
Conversation
Reorder the apply interchange pass to run after bufferization but before allocation, enabling interchange decisions to be visible to the allocator.
Replace hardcoded arith.constant loop bounds and block factor values with d2m.get_block_factor ops, making the pass reference the parent generic op's block factors symbolically rather than materializing them as integer constants.
Replace scf.for loop generation with affine.for loops using symbol-based upper bounds from GetBlockFactorOp. This enables downstream affine analysis and transformation passes to operate on the generated loop nest.
…Load/Store ops RemoteLoadOp now implements AffineReadOpInterface and AffineMapAccessInterface, and RemoteStoreOp implements AffineWriteOpInterface and AffineMapAccessInterface. This enables affine analysis passes (e.g. affine scalar replacement) to reason about these ops' memory access patterns. getAffineMap() looks up the indexing map from the parent GenericOp for the associated operand.
…ants Replace affine.for loops marked with d2m.outer_loop (generated by D2MGenerateOuterLoops) with scf.for loops that use arith.constant bounds derived from the GenericOp's block_factors attribute. Also replaces all d2m.get_block_factor ops with their corresponding constant values.
Rename the "d2m.outer_loop" attribute to "d2m.blocking_loop" across all D2M passes and tests. Additionally, change the attribute from a unit attribute to an integer attribute that carries the associated block factor index (0 = outermost loop, incrementing inward).
Replace OpRewritePattern/applyPatternsGreedily with a simple module pass using while(changed) iteration to avoid greedy rewriter erasure assertions in GenericOp's SSACFG regions. Replace canFuseLoops/fuseLoops with manual per-level body cloning since RemoteLoad/Store's AffineReadOp/WriteOp interfaces crash affine dependence analysis. Keep shared intermediate as fused input to satisfy GenericOp's single-output verifier constraint.
…stRegisterAccess Use the d2m.blocking_loop attribute on scf.for loops to directly obtain induction variables instead of creating IterIndexOp. This is more correct since IterIndexOp::fold was incorrectly folding iter_index(dim) to the constant dim value. When a blocking loop is absent (unit loop optimized away), constant 0 is returned.
… ops Skip GenericAffineLoopFusion for generics containing block_mask, packer_mask_reset, tile_tilize_block, tile_untilize_block, write_row_mask_tile, and write_col_mask_tile ops.
Ensure affine loop fusion only considers GenericOps in unified form (single region with ThreadType::Unified). This prevents the pass from attempting to fuse ops that have already been split or are in an unexpected form.
…bset When producer and consumer have equal loop depth, the producer becomes the subset and its ops were incorrectly placed after the consumer's ops in the fused loop body. Fix by inserting subset ops at the start of each loop level when the subset is the producer, preserving data-flow order.
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Drop generic affine compatibility-form conversion from scalar replacement, delete dead GenericAffineUtils code and build wiring, and keep the scalar-replacement test expectations aligned with the new flow. Co-authored-by: Cursor <cursoragent@cursor.com>
Keep d2m.block_offset explicit by removing constant-like folding assumptions, and add a canonicalize regression test to prevent future erasure. Include related D2M pipeline/test expectation updates needed for current affine fusion configuration behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
Use deterministic prime placeholders for temporary block_offset rewriting in affine utilities, restore d2m.block_offset after transforms, re-enable fusion lit coverage, and add scalar-replacement roundtrip checks.
6886071 to
c65faf6
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #7045 +/- ##
==========================================
+ Coverage 69.27% 69.40% +0.13%
==========================================
Files 384 388 +4
Lines 67207 67852 +645
==========================================
+ Hits 46555 47092 +537
- Misses 20652 20760 +108 ☔ View full report in Codecov by Sentry. |
bgrady-tt
commented
Feb 13, 2026
| // only works on top-level modules (doesn't run module has a parent op). | ||
| ttmetal::TTIRToTTMetalPipelineOptions ttmetalOptions; | ||
| ttmetalOptions.ttnnMode = true; | ||
| ttmetalOptions.enableAffineLoopFusionAndScalarReplacement = false; |
Contributor
Author
There was a problem hiding this comment.
Force affine fusion disabled until feature is more mature.
bgrady-tt
commented
Feb 13, 2026
| // Get the CB operand and find which block argument it corresponds to. | ||
| Value cb = remoteLoad.getCb(); | ||
| if (auto blockArg = mlir::dyn_cast<BlockArgument>(cb)) { | ||
| if (auto blockArg = mlir::dyn_cast_or_null<BlockArgument>(cb)) { |
Contributor
Author
There was a problem hiding this comment.
Just minor cleanup to unsafe dyn_cast here
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Note: This PR does not enable affine fusion or scalrep passes by default. Currently only a few lit tests exercise these passes. This is due to lack of support in downstream passes to handle intermediate scratchpad allocations properly.
Big New Passes:
d2m.scratch_allocatefor allocating all intermediates (cannot be lowered as-is).This PRs impact to default pipeline is:
get_block_factor()).LowerToExplicitFormconverts this affine form with unresolved blocking factors to the fully explicit generic form that has hardened block factors and scf.for loops. This conversion happens immediately post-allocator