Commit b2dcdef

erwei-xilinx and claude committed
Update rms_norm transform script based on actual IR structure
Rewrite rms_norm transform to match the Linalg IR that triton-shared-opt actually produces (elementwise square + reduce + scalar chain + normalize). Uses fuse_elementwise_linalg for post-reduce ops and tile_using_forall for multi-core parallelism.

Still WIP: fails at air-wrap-func-with-parallel because scalar reduction intermediates (tensor.extract -> divf -> rsqrt chain) create memref.alloc in memory_space 0 inside air.segment. Needs promote_tensor approach from softmax to properly handle scalar reduction results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
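For readers unfamiliar with the four-stage shape named in the commit message (elementwise square, reduce, scalar chain, normalize), here is a minimal reference sketch in plain Python. It is not the transform script or the generated IR. The function name `rms_norm_staged` and the `eps` term are assumptions for illustration; the message itself names only the divf and rsqrt steps of the scalar chain.

```python
import math


def rms_norm_staged(row, eps=1e-5):
    """Reference RMS norm over one row, split into four stages.

    Hypothetical helper for illustration only; `eps` is an assumed
    stabilizer that the commit message does not mention.
    """
    # Stage 1: elementwise square.
    squared = [x * x for x in row]

    # Stage 2: reduce the squares to a single scalar.
    total = sum(squared)

    # Stage 3: scalar chain on the reduction result
    # (divide by the row length, then reciprocal square root).
    mean_sq = total / len(row)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)

    # Stage 4: normalize every element by the scalar.
    return [x * inv_rms for x in row]
```

Stage 3 is the part that operates on a single scalar rather than a tensor, which corresponds to the scalar reduction intermediates the commit message identifies as the source of the remaining failure.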
1 parent 6ab2e78 commit b2dcdef

2 files changed

Lines changed: 349 additions & 358 deletions
