
Add modular transform library with reusable named sequences #25

Merged: erwei-xilinx merged 1 commit into main from modular-transform-library on Mar 25, 2026

@erwei-xilinx (Collaborator)

Summary

  • Add transform_library.mlir with 17 reusable transform.named_sequence definitions covering canonicalization, elementwise fusion, tiling, pad+promote, bufferization, post-bufferize cleanup, AIR herd mapping, and vector type casts
  • Refactor all elementwise transform scripts (relu, sigmoid, silu, gelu, leaky_relu, swiglu, axpy, vec-add) to use transform.include — reducing each from 144-195 lines to 42-71 lines (~70% reduction)
  • Partially refactor complex scripts (softmax, layernorm, matmul, rms_norm, weighted_rms_norm, average_pool, matvec) where boilerplate canonicalize/bufferize blocks match library sequences
  • driver.py auto-expands transform.include calls at load time by inlining library sequence bodies with SSA renaming, since mlir-air's run_transform does not natively support transform.include
  • Net: -1321 lines across 28 files, with no behavioral changes
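The include expansion described in the driver.py bullet can be sketched roughly as follows. This is a minimal illustration, not the code in driver.py: the function name `expand_includes`, the `(parameter names, body text)` library format, the `_inl<N>` renaming suffix, and the sample sequences `@canonicalize_with_cse` and `@generics_in_func` are all hypothetical. It assumes a line-oriented view of the MLIR text (one op per line, straight-line bodies without block arguments). Each sequence parameter is mapped to the caller's operand, every value defined in the body gets a fresh per-call name, yielded values are bound to the include's result names, and the `transform.yield` terminator is dropped.

```python
import itertools
import re
import textwrap

# Simplified, line-oriented patterns (assumption: one op per line, no nested
# blocks with their own arguments, no "%x:2 =" result-group syntax).
SSA = re.compile(r"%[\w$.]+")                                 # an SSA value name
DEFS = re.compile(r"^\s*(%[\w$.]+(?:\s*,\s*%[\w$.]+)*)\s*=")  # "%a, %b = ..."
YIELD = re.compile(r"^\s*transform\.yield\b([^:]*)")          # operands before ':'
INCLUDE = re.compile(
    r"^(?P<indent>\s*)"
    r"(?:(?P<results>%[\w$.]+(?:\s*,\s*%[\w$.]+)*)\s*=\s*)?"
    r"transform\.include\s+@(?P<name>[\w$.]+)\s+failures\(\w+\)\s*"
    r"\((?P<args>[^)]*)\)"
)

Library = dict[str, tuple[list[str], str]]  # hypothetical: name -> (params, body)


def expand_includes(script: str, library: Library) -> str:
    """Replace every transform.include line with the callee body, SSA-renamed."""
    fresh = itertools.count()
    out: list[str] = []
    for line in script.splitlines():
        m = INCLUDE.match(line)
        if m is None:
            out.append(line)
            continue
        params, body = library[m["name"]]
        args = SSA.findall(m["args"])
        results = SSA.findall(m["results"] or "")
        if len(args) != len(params):
            raise ValueError(f"@{m['name']}: expected {len(params)} operands, got {len(args)}")
        body_lines = textwrap.dedent(body).strip("\n").splitlines()

        # Parameters map to the caller's operands; values defined in the body
        # get a per-call suffix so inlining the same sequence twice is safe.
        tag = next(fresh)
        rename = dict(zip(params, args))
        yielded: list[str] = []
        for b in body_lines:
            if d := DEFS.match(b):
                for value in SSA.findall(d.group(1)):
                    rename[value] = f"{value}_inl{tag}"
            elif y := YIELD.match(b):
                yielded = SSA.findall(y.group(1))

        # Yielded values take over the names bound by the include's results.
        if len(yielded) != len(results):
            raise ValueError(f"@{m['name']}: yields {len(yielded)}, call binds {len(results)}")
        for value, result in zip(yielded, results):
            if value in params:
                raise NotImplementedError("pass-through yields are out of scope here")
            rename[value] = result

        for b in body_lines:
            if YIELD.match(b):
                continue  # the terminator is dropped once the body is inlined
            renamed = SSA.sub(lambda t: rename.get(t.group(0), t.group(0)), b)
            out.append(m["indent"] + renamed)
    return "\n".join(out)


LIB: Library = {
    "canonicalize_with_cse": (["%op"], """
        transform.apply_patterns to %op {
          transform.apply_patterns.canonicalization
        } : !transform.any_op
        transform.apply_cse to %op : !transform.any_op
        transform.yield
    """),
    "generics_in_func": (["%op"], """
        %f = transform.structured.match ops{["func.func"]} in %op : (!transform.any_op) -> !transform.any_op
        %g = transform.structured.match ops{["linalg.generic"]} in %f : (!transform.any_op) -> !transform.any_op
        transform.yield %g : !transform.any_op
    """),
}

SCRIPT = """\
transform.named_sequence @__transform_main(%root: !transform.any_op {transform.readonly}) {
  %g0 = transform.include @generics_in_func failures(propagate) (%root) : (!transform.any_op) -> !transform.any_op
  transform.include @canonicalize_with_cse failures(propagate) (%root) : (!transform.any_op) -> ()
  %g1 = transform.include @generics_in_func failures(propagate) (%root) : (!transform.any_op) -> !transform.any_op
  transform.yield
}"""

print(expand_includes(SCRIPT, LIB))
```

Renaming is a single regex pass per body line, so a caller operand that happens to share a name with a body-local value is not renamed twice. Working on parsed IR instead of regexes would lift the line-oriented assumptions, at the cost of a heavier dependency.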

Test plan

  • Clear the Triton cache and run all 8 elementwise examples on NPU2 (AIE2P/Strix): all pass
  • Run softmax, layernorm, and matmul on NPU2: all pass
  • CI build validation
  • Run on NPU1 (AIE2/Phoenix) if available

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 25, 2026 01:31
@erwei-xilinx erwei-xilinx force-pushed the modular-transform-library branch from c9a36d3 to c464b33

Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a shared MLIR Transform Dialect “library” and refactors many per-example transform scripts to reuse it via transform.include, with driver-side inlining to work around missing/unstable transform.include support in the underlying runtime.

Changes:

  • Add amd_triton_npu/backend/transform_library.mlir with reusable transform.named_sequence blocks for canonicalization, tiling, pad+promote, bufferization, AIR herd mapping, and common vector type casts.
  • Update amd_triton_npu/backend/driver.py to expand transform.include by inlining sequence bodies from transform_library.mlir.
  • Refactor multiple example transform scripts to use transform.include (reducing duplicated boilerplate).
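The library-loading side of this change could look roughly like the sketch below, which turns a library file into a name-to-(parameters, body) mapping that an inliner can consume. Everything here is an assumption for illustration: `load_library`, the regexes, and the two sample sequences are hypothetical and not taken from transform_library.mlir. It assumes each `transform.named_sequence` signature fits on one line and that braces in the body are balanced, and it locates the body by counting braces from the first `{` after the signature.

```python
import re

# Assumption: sequence signatures fit on one line and braces inside bodies are
# balanced (no "{" or "}" hidden inside string attributes).
HEADER = re.compile(r"transform\.named_sequence\s+@(?P<name>[\w$.]+)\s*\((?P<params>[^)]*)\)")
PARAM = re.compile(r"(%[\w$.]+)\s*:")


def load_library(text: str) -> dict[str, tuple[list[str], str]]:
    """Map each named sequence to (parameter names, body text between braces)."""
    sequences: dict[str, tuple[list[str], str]] = {}
    for m in HEADER.finditer(text):
        open_brace = text.index("{", m.end())  # first "{" after the signature
        depth = 0
        for i in range(open_brace, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    break
        else:
            raise ValueError(f"unterminated body for @{m['name']}")
        sequences[m["name"]] = (PARAM.findall(m["params"]), text[open_brace + 1 : i])
    return sequences


LIBRARY_TEXT = """
module attributes {transform.with_named_sequence} {
  transform.named_sequence @canonicalize_with_cse(%op: !transform.any_op {transform.readonly}) {
    transform.apply_patterns to %op {
      transform.apply_patterns.canonicalization
    } : !transform.any_op
    transform.apply_cse to %op : !transform.any_op
    transform.yield
  }
  transform.named_sequence @match_generics(%op: !transform.any_op {transform.readonly}) -> !transform.any_op {
    %g = transform.structured.match ops{["linalg.generic"]} in %op : (!transform.any_op) -> !transform.any_op
    transform.yield %g : !transform.any_op
  }
}
"""

for name, (params, body) in load_library(LIBRARY_TEXT).items():
    print(name, params, len(body.strip().splitlines()), "body lines")
```

Counting braces rather than matching the body with a single regex is what keeps nested regions, such as the `transform.apply_patterns` block or the braces in `ops{[...]}`, inside the captured body.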

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.

Summary per file:

  • examples/weighted_rms_norm/transform_aie2p.mlir: Replace inline canonicalize/CSE + bufferize blocks with library transform.include calls.
  • examples/weighted_rms_norm/transform_aie2.mlir: Add AIE2 transform script; uses library bufferize include in one phase.
  • examples/vec-add/transform_aie2p.mlir: Rewrite vec-add transform to use library sequences and a small custom vector tiling step.
  • examples/vec-add/transform_aie2.mlir: Same refactor for AIE2 (different vector tile size).
  • examples/test_softmax/transform_aie2p.mlir: Replace repeated canonicalize/bufferize blocks with library includes.
  • examples/test_softmax/transform_aie2.mlir: Same include-based refactor for AIE2.
  • examples/test_layernorm/transform_aie2p.mlir: Replace repeated canonicalize/bufferize blocks with library includes; keep custom cleanup where needed.
  • examples/test_layernorm/transform_aie2.mlir: Same include-based refactor for AIE2.
  • examples/swiglu/transform_aie2p.mlir: Large reduction of boilerplate via library sequences + shared cast helper.
  • examples/swiglu/transform_aie2.mlir: Same refactor for AIE2, using extern-linking herd mapping helper.
  • examples/silu/transform_aie2p.mlir: Refactor to shared elementwise pipeline + sigmoid-family cast helper.
  • examples/silu/transform_aie2.mlir: Same refactor for AIE2, using extern-linking herd mapping helper.
  • examples/sigmoid/transform_aie2p.mlir: Refactor to shared elementwise pipeline + sigmoid-family cast helper.
  • examples/sigmoid/transform_aie2.mlir: Same refactor for AIE2, using extern-linking herd mapping helper.
  • examples/rms_norm/transform_aie2p.mlir: Replace one-shot bufferize block with @one_shot_bufferize include.
  • examples/relu/transform_aie2p.mlir: Refactor to shared elementwise pipeline + maxnumf cast helper.
  • examples/relu/transform_aie2.mlir: Same refactor for AIE2.
  • examples/matvec/transform_aie2p.mlir: Add GEMV transform script; uses includes for canonicalize+CSE and bufferize.
  • examples/matvec/transform_aie2.mlir: Add GEMV transform script (AIE2 variant); uses includes for canonicalize+CSE and bufferize.
  • examples/matmul/transform_aie2p.mlir: Replace repeated canonicalize/CSE + bufferize blocks with library includes.
  • examples/matmul/transform_aie2.mlir: Same include-based refactor for AIE2.
  • examples/leaky_relu/transform_aie2p.mlir: Refactor to shared elementwise pipeline + leaky-relu cast helper.
  • examples/leaky_relu/transform_aie2.mlir: Same refactor for AIE2.
  • examples/gelu/transform_aie2p.mlir: Refactor to shared elementwise pipeline + sigmoid-family cast helper.
  • examples/axpy/transform_aie2p.mlir: Refactor to shared binary pipeline + addf/mulf cast helpers.
  • examples/axpy/transform_aie2.mlir: Same refactor for AIE2.
  • examples/average_pool/transform_aie2p.mlir: Replace one-shot bufferize block with @one_shot_bufferize include.
  • examples/average_pool/transform_aie2.mlir: Same include replacement for AIE2.
  • amd_triton_npu/backend/transform_library.mlir: New shared sequence library for canonicalize/tiling/promotion/bufferize/AIR mapping/casts.
  • amd_triton_npu/backend/driver.py: Add include-expansion logic to inline library sequences into user transform scripts.


Comment thread on amd_triton_npu/backend/driver.py
Comment thread on amd_triton_npu/backend/transform_library.mlir (outdated)
@erwei-xilinx erwei-xilinx force-pushed the modular-transform-library branch 5 times, most recently from 985ccc2 to 5c0f634 on March 25, 2026 04:53
Create a shared transform_library.mlir with 17 reusable named sequences
(canonicalize, fuse, tile, pad+promote, bufferize, cleanup, herd mapping,
type casts) and refactor all elementwise transform scripts to use them
via transform.include. driver.py auto-expands includes at load time since
mlir-air's run_transform does not support transform.include natively.

Elementwise scripts (relu, sigmoid, silu, gelu, leaky_relu, swiglu, axpy,
vec-add) reduced from 144-195 lines to 42-71 lines each (~70% reduction).
Complex scripts (softmax, layernorm, matmul, rms_norm, weighted_rms_norm,
average_pool, matvec) partially refactored where boilerplate matches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@erwei-xilinx erwei-xilinx merged commit c9ffb4f into main Mar 25, 2026
8 of 9 checks passed
@erwei-xilinx erwei-xilinx deleted the modular-transform-library branch March 25, 2026 16:09

2 participants