Commit b2dcdef
Update rms_norm transform script based on actual IR structure
Rewrite rms_norm transform to match the Linalg IR that triton-shared-opt
actually produces (elementwise square + reduce + scalar chain + normalize).
Uses fuse_elementwise_linalg for post-reduce ops and tile_using_forall
for multi-core parallelism.
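For reference, the four-stage structure named above (elementwise square,
reduce, scalar chain, normalize) can be sketched in plain Python. This is
only an illustration of the math, not the transform script or the IR
itself. The function name rms_norm_staged, the eps term, and the weight
multiply are assumptions made for the sketch; the commit only confirms
the divf and rsqrt steps of the scalar chain.

```python
import math

def rms_norm_staged(row, weight, eps=1e-6):
    """Hypothetical reference for one row, split into the four stages."""
    # Stage 1: elementwise square.
    squares = [x * x for x in row]
    # Stage 2: reduce the squares to a single scalar.
    total = sum(squares)
    # Stage 3: scalar chain (divide by N, add eps, reciprocal sqrt).
    # eps is an assumption; the commit only names divf and rsqrt.
    mean_sq = total / len(row)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Stage 4: normalize each element (weight multiply is an assumption).
    return [x * inv_rms * w for x, w in zip(row, weight)]
```

Stage 3 operates on a single scalar per row, which is the part the
WIP note below reports as problematic after bufferization.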
Still WIP: the pipeline fails at air-wrap-func-with-parallel because the
scalar reduction intermediates (the tensor.extract -> divf -> rsqrt
chain) create a memref.alloc in memory_space 0 inside air.segment. The
rms_norm script needs the promote_tensor approach used for softmax to
handle the scalar reduction results properly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 6ab2e78 commit b2dcdef
2 files changed
Lines changed: 349 additions & 358 deletions