Add modular transform library with reusable named sequences #25
Merged
Pull request overview
This PR introduces a shared MLIR Transform Dialect “library” and refactors many per-example transform scripts to reuse it via transform.include, with driver-side inlining to work around missing/unstable transform.include support in the underlying runtime.
Changes:
- Add `amd_triton_npu/backend/transform_library.mlir` with reusable `transform.named_sequence` blocks for canonicalization, tiling, pad+promote, bufferization, AIR herd mapping, and common vector type casts.
- Update `amd_triton_npu/backend/driver.py` to expand `transform.include` by inlining sequence bodies from `transform_library.mlir`.
- Refactor multiple example transform scripts to use `transform.include` (reducing duplicated boilerplate).
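The include expansion in the second item can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `driver.py`: the helper names `parse_library` and `expand_includes`, the `_incN` suffix scheme, and the sample sequence `@canonicalize_with_cse` are hypothetical. It only handles includes that return no results, and it relies on regexes rather than a real MLIR parser.

```python
import re

# Hypothetical library text; the sequence name and body are illustrative only.
LIBRARY = """
transform.named_sequence @canonicalize_with_cse(%module: !transform.any_op {transform.readonly}) {
  %func = transform.structured.match ops{["func.func"]} in %module : (!transform.any_op) -> !transform.any_op
  transform.apply_cse to %func : !transform.any_op
  transform.yield
}
"""

# Hypothetical user script that includes the same sequence twice.
SCRIPT = """
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
    transform.include @canonicalize_with_cse failures(propagate) (%arg0) : (!transform.any_op) -> ()
    transform.include @canonicalize_with_cse failures(propagate) (%arg0) : (!transform.any_op) -> ()
    transform.yield
  }
}
"""

HEADER = re.compile(r"transform\.named_sequence\s+@(\w+)\s*\(([^)]*)\)\s*\{")
INCLUDE = re.compile(
    r"^([ \t]*)transform\.include\s+@(\w+)\s+failures\(\w+\)\s*\(([^)]*)\)[^\n]*$",
    re.M,
)
SSA = re.compile(r"%[\w$.]+")


def parse_library(text):
    """Map each sequence name to (formal parameter names, body lines)."""
    sequences = {}
    for m in HEADER.finditer(text):
        params = SSA.findall(m.group(2))
        # Walk forward to the brace that closes the sequence body.
        depth, pos = 1, m.end()
        while depth:
            ch = text[pos]
            depth += (ch == "{") - (ch == "}")
            pos += 1
        lines = [ln.strip() for ln in text[m.end():pos - 1].splitlines() if ln.strip()]
        # The terminator is dropped because the body is spliced into the caller.
        lines = [ln for ln in lines if not ln.startswith("transform.yield")]
        sequences[m.group(1)] = (params, lines)
    return sequences


def expand_includes(script, sequences):
    """Replace each result-less transform.include with the renamed callee body."""
    counter = 0

    def inline(m):
        nonlocal counter
        indent, name = m.group(1), m.group(2)
        actuals = SSA.findall(m.group(3))
        params, body = sequences[name]
        mapping = dict(zip(params, actuals))  # formal parameter -> actual operand
        suffix = f"_inc{counter}"  # unique per call site to avoid SSA name clashes
        counter += 1

        def rename(v):
            tok = v.group(0)
            return mapping.get(tok, tok + suffix)

        return "\n".join(indent + SSA.sub(rename, ln) for ln in body)

    return INCLUDE.sub(inline, script)


expanded = expand_includes(SCRIPT, parse_library(LIBRARY))
print(expanded)
```

Renaming every local value with a per-call-site suffix is what lets the same sequence be included more than once in a single script without redefining an SSA name. Includes that produce results would additionally need the `transform.yield` operands mapped to the include's result names, which this sketch leaves out.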
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| examples/weighted_rms_norm/transform_aie2p.mlir | Replace inline canonicalize/CSE + bufferize blocks with library transform.include calls. |
| examples/weighted_rms_norm/transform_aie2.mlir | Add AIE2 transform script; uses library bufferize include in one phase. |
| examples/vec-add/transform_aie2p.mlir | Rewrite vec-add transform to use library sequences and a small custom vector tiling step. |
| examples/vec-add/transform_aie2.mlir | Same refactor for AIE2 (different vector tile size). |
| examples/test_softmax/transform_aie2p.mlir | Replace repeated canonicalize/bufferize blocks with library includes. |
| examples/test_softmax/transform_aie2.mlir | Same include-based refactor for AIE2. |
| examples/test_layernorm/transform_aie2p.mlir | Replace repeated canonicalize/bufferize blocks with library includes; keep custom cleanup where needed. |
| examples/test_layernorm/transform_aie2.mlir | Same include-based refactor for AIE2. |
| examples/swiglu/transform_aie2p.mlir | Large reduction of boilerplate via library sequences + shared cast helper. |
| examples/swiglu/transform_aie2.mlir | Same refactor for AIE2, using extern-linking herd mapping helper. |
| examples/silu/transform_aie2p.mlir | Refactor to shared elementwise pipeline + sigmoid-family cast helper. |
| examples/silu/transform_aie2.mlir | Same refactor for AIE2, using extern-linking herd mapping helper. |
| examples/sigmoid/transform_aie2p.mlir | Refactor to shared elementwise pipeline + sigmoid-family cast helper. |
| examples/sigmoid/transform_aie2.mlir | Same refactor for AIE2, using extern-linking herd mapping helper. |
| examples/rms_norm/transform_aie2p.mlir | Replace one-shot bufferize block with @one_shot_bufferize include. |
| examples/relu/transform_aie2p.mlir | Refactor to shared elementwise pipeline + maxnumf cast helper. |
| examples/relu/transform_aie2.mlir | Same refactor for AIE2. |
| examples/matvec/transform_aie2p.mlir | Add GEMV transform script; uses includes for canonicalize+CSE and bufferize. |
| examples/matvec/transform_aie2.mlir | Add GEMV transform script (AIE2 variant); uses includes for canonicalize+CSE and bufferize. |
| examples/matmul/transform_aie2p.mlir | Replace repeated canonicalize/CSE + bufferize blocks with library includes. |
| examples/matmul/transform_aie2.mlir | Same include-based refactor for AIE2. |
| examples/leaky_relu/transform_aie2p.mlir | Refactor to shared elementwise pipeline + leaky-relu cast helper. |
| examples/leaky_relu/transform_aie2.mlir | Same refactor for AIE2. |
| examples/gelu/transform_aie2p.mlir | Refactor to shared elementwise pipeline + sigmoid-family cast helper. |
| examples/axpy/transform_aie2p.mlir | Refactor to shared binary pipeline + addf/mulf cast helpers. |
| examples/axpy/transform_aie2.mlir | Same refactor for AIE2. |
| examples/average_pool/transform_aie2p.mlir | Replace one-shot bufferize block with @one_shot_bufferize include. |
| examples/average_pool/transform_aie2.mlir | Same include replacement for AIE2. |
| amd_triton_npu/backend/transform_library.mlir | New shared sequence library for canonicalize/tiling/promotion/bufferize/AIR mapping/casts. |
| amd_triton_npu/backend/driver.py | Add include-expansion logic to inline library sequences into user transform scripts. |
Create a shared transform_library.mlir with 17 reusable named sequences (canonicalize, fuse, tile, pad+promote, bufferize, cleanup, herd mapping, type casts) and refactor all elementwise transform scripts to use them via transform.include. driver.py auto-expands includes at load time since mlir-air's run_transform does not support transform.include natively.

Elementwise scripts (relu, sigmoid, silu, gelu, leaky_relu, swiglu, axpy, vec-add) reduced from 144-195 lines to 42-71 lines each (~70% reduction). Complex scripts (softmax, layernorm, matmul, rms_norm, weighted_rms_norm, average_pool, matvec) partially refactored where boilerplate matches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- `transform_library.mlir` with 17 reusable `transform.named_sequence` definitions covering canonicalization, elementwise fusion, tiling, pad+promote, bufferization, post-bufferize cleanup, AIR herd mapping, and vector type casts
- Elementwise transform scripts refactored to use `transform.include`, reducing each from 144-195 lines to 42-71 lines (~70% reduction)
- `driver.py` auto-expands `transform.include` calls at load time by inlining library sequence bodies with SSA renaming, since mlir-air's `run_transform` does not natively support `transform.include`

Test plan
🤖 Generated with Claude Code