Commit f9e8a6e
Three follow-up fixes for channel_type="mmio" multi-tile / non-i32 use (#1573)
* Skip mmio channels in air-dma-to-channel shim-pressure auto-upgrade
The shim-pressure heuristic in AIRDmaToChannel auto-upgrades L3-bound
channels to dma_packet when their per-column count exceeds the shim
S2MM/MM2S limit. mmio channels are runtime-sequence MMIO writes, not
shim DMA, so they neither contribute to pressure nor are eligible for
the upgrade. Counting them in the pressure check force-flipped their
type tag to dma_packet, which destroyed the mmio lowering before it
ran.
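Below is a minimal C++ sketch of the fixed pressure check. `L3Channel`, `upgradeForShimPressure`, and the limit passed in are hypothetical stand-ins, not the AIRDmaToChannel data structures. The sketch only shows that mmio channels are excluded both from the per-column count and from the retagging loop.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an L3-bound air.channel (not the real AIR model).
struct L3Channel {
  std::string name;
  std::string type;  // e.g. "dma_stream", "dma_packet", "mmio"
  int shimColumn;
};

// Sketch of the fixed heuristic: mmio channels are runtime-sequence MMIO
// writes, so they neither add to shim DMA pressure nor get retagged.
void upgradeForShimPressure(std::vector<L3Channel> &channels, int shimLimit) {
  std::map<int, int> pressure;  // shim column -> number of shim-DMA channels
  for (const auto &ch : channels)
    if (ch.type != "mmio")
      ++pressure[ch.shimColumn];

  for (auto &ch : channels) {
    if (ch.type == "mmio")
      continue;  // keep the tag so the mmio lowering still sees "mmio"
    if (pressure[ch.shimColumn] > shimLimit)
      ch.type = "dma_packet";
  }
}
```

Consider one column with two DMA channels, two mmio channels, and an illustrative limit of 2. Nothing is upgraded. Counting all four channels, as the old check did, would have exceeded the limit and retagged the mmio channels as well.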
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Skip mmio channels in specializeChannelBundle
specializeChannelBundle splits a `[N]`-sized channel into N
single-position channels and rewrites all matching put/get ops
inside the device. For mmio channels with a multi-tile herd, this
left the host-side puts orphaned on the original bundle symbol —
they sit outside the device, where this pattern's rewrites don't
reach — while the per-tile gets had been moved to new specialized
channel names. The mmio lowering then saw N hostPuts on the
bundled channel with 0 matching device-side gets and emitted
"channel_type=\"mmio\" put has no matching device-side
air.channel.get".
Skip mmio channels here and let lowerAIRMMIOChannelOps match
host-side puts to per-core gets directly across the full bundle by
constant index — the same path it already takes for single-tile
mmio.
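The matching that lowerAIRMMIOChannelOps relies on can be sketched as a lookup keyed on (channel symbol, constant index). The `ChannelOp` struct, `matchPutsToGets`, and specialized names such as `chan_0` are all hypothetical. The sketch shows why renaming the device-side gets breaks the pairing with host-side puts that still name the original bundle.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a put or get on `channel` at constant index `index`.
struct ChannelOp {
  std::string channel;
  int index;
  std::string owner;  // "host" or a core name such as "core_0"
};

// Pair each host-side put with the device-side get that has the same channel
// symbol and constant index. Returns std::nullopt on the first unmatched put,
// analogous to the "no matching device-side air.channel.get" error.
std::optional<std::map<int, std::string>>
matchPutsToGets(const std::vector<ChannelOp> &hostPuts,
                const std::vector<ChannelOp> &deviceGets) {
  std::map<std::pair<std::string, int>, std::string> getOwner;
  for (const auto &g : deviceGets)
    getOwner[{g.channel, g.index}] = g.owner;

  std::map<int, std::string> result;  // put index -> receiving core
  for (const auto &p : hostPuts) {
    auto it = getOwner.find({p.channel, p.index});
    if (it == getOwner.end())
      return std::nullopt;
    result[p.index] = it->second;
  }
  return result;
}
```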
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Repack non-i32 mmio source globals as memref<Nxi32> for blockwrite
aiex.npu.blockwrite's translator only handles 32-bit element types
("Only 32-bit data type is supported for now"); a bf16 source emitted
that warning and then segfaulted in AIETranslateNpuToBinary. The
destination buffer type is irrelevant on the wire (`buffer = @sym` is
just a symbol ref), so the fix is local to the data side.
When the original memref.global isn't already i32-typed, mirror it
into the device as a 1-D memref<Nxi32> with the same raw bytes
(suffixed `_mmio_i32`) and reference that from the blockwrite. The
original global is kept undisturbed for any other uses. Splat
attributes are expanded to a full byte buffer before reinterpretation.
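Here is a standalone sketch of the byte-level transform described above. It assumes the initializer's element storage is available as raw bytes and that the target is little-endian. `repackAsI32Words` is a hypothetical name. It is not the pass's `repackAsI32Bytes` helper, and it works on plain vectors rather than `DenseElementsAttr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Sketch only: `rawBytes` is the element storage of the source initializer.
// For a splat it holds a single element that is repeated `numElts` times.
// Returns std::nullopt when the payload cannot fill whole 32-bit words.
std::optional<std::vector<int32_t>>
repackAsI32Words(const std::vector<uint8_t> &rawBytes, bool isSplat,
                 std::size_t numElts, unsigned eltBitWidth) {
  if (eltBitWidth == 0 || eltBitWidth % 8 != 0)
    return std::nullopt;  // sub-byte element types are rejected
  const std::size_t bytesPerElt = eltBitWidth / 8;
  const std::size_t totalBytes = bytesPerElt * numElts;
  if (totalBytes % 4 != 0)
    return std::nullopt;  // e.g. 3 x bf16 = 6 bytes

  std::vector<uint8_t> bytes;
  if (isSplat) {
    if (rawBytes.size() < bytesPerElt)
      return std::nullopt;
    // Expand the splat element into a full byte buffer.
    for (std::size_t i = 0; i < numElts; ++i)
      bytes.insert(bytes.end(), rawBytes.begin(),
                   rawBytes.begin() + static_cast<std::ptrdiff_t>(bytesPerElt));
  } else {
    if (rawBytes.size() != totalBytes)
      return std::nullopt;
    bytes = rawBytes;  // already a full buffer: copy it wholesale
  }

  // Little-endian: byte 0 of each 4-byte group is the low byte of the word.
  std::vector<int32_t> words;
  for (std::size_t i = 0; i < totalBytes; i += 4) {
    uint32_t w = uint32_t(bytes[i]) | uint32_t(bytes[i + 1]) << 8 |
                 uint32_t(bytes[i + 2]) << 16 | uint32_t(bytes[i + 3]) << 24;
    words.push_back(static_cast<int32_t>(w));
  }
  return words;
}
```

A splat bf16 1.5 (bits 0x3FC0, stored as bytes 0xC0 0x3F) over two elements yields the single word 0x3FC03FC0.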
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Factor mmio repack lowering, drop duplication, expand tests
The non-i32 mmio repack path in lowerAIRMMIOChannelOps had grown a
~120-line inline block with three issues that review surfaced:
- The repack-to-i32 byte transform was rebuilt as a stack lambda
inside the per-put loop and called twice with identical inputs
(once for the module-scope mirror, once for the in-device clone).
- The two memref.global creation paths used inconsistent builders;
the in-device path went through the rewriter then called
cloned->remove() to undo the rewriter's insertion.
- A `(void)modI32;` swallowed the return of the module-scope
create because the op was found later by symbol lookup, not by
the local binding.
Lift the byte transform to a file-scope `repackAsI32Bytes` static
helper, compute the repacked DenseElementsAttr once, and use a fresh
OpBuilder for both mirror creations so the rewriter-detach hack is
gone. Reject collisions where the suffixed `_mmio_i32` symbol already
exists at module scope and isn't itself an `air.mmio_global`.
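The collision rule can be read as a three-way decision. In this sketch the module symbol table is a hypothetical map from symbol name to a flag that says whether the symbol is already an `air.mmio_global` mirror. Treating an existing mirror as reusable is an assumption about what "not a collision" means. The commit does not state it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical module-scope symbols: name -> "is an air.mmio_global mirror".
// This is not the MLIR SymbolTable API.
using ModuleSymbols = std::map<std::string, bool>;

enum class MirrorDecision { Create, Reuse, RejectCollision };

// Decide what to do with the `_mmio_i32` mirror for source global `srcName`.
MirrorDecision decideMirror(const ModuleSymbols &syms,
                            const std::string &srcName) {
  const std::string mirrorName = srcName + "_mmio_i32";
  auto it = syms.find(mirrorName);
  if (it == syms.end())
    return MirrorDecision::Create;  // name is free
  if (it->second)
    return MirrorDecision::Reuse;   // an earlier mmio mirror owns the name
  return MirrorDecision::RejectCollision;  // an unrelated symbol owns the name
}
```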
Test coverage:
- Tighten the splat bf16 test to assert the exact repacked value
(`dense<1069563840>` = two bf16 1.5 packed into one i32) so the
splat-expansion bytes are checked, not just the type.
- Add a non-splat bf16 case (`@bf16_nonsplat`) that exercises the
raw-buffer copy branch and asserts both packed i32 values.
- Add an invalid case for a byte-aligned element type whose total
payload size isn't a multiple of 4 bytes (memref<3xbf16> = 6
bytes), asserting the new diagnostic.
No change to the emitted IR for valid inputs; the only new behavior is the collision rejection and the payload-size diagnostic above.
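The expected constant in the tightened splat test can be derived by hand. This sketch assumes a 32-bit IEEE-754 `float` and little-endian lane order. `bf16BitsByTruncation` and `packBf16Pair` are illustrative helpers and are not part of the test suite.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// bf16 is the upper half of an IEEE-754 binary32 bit pattern. Truncation is
// exact for 1.5 (its low 16 bits are zero); a general float-to-bf16
// conversion would round to nearest even instead.
uint16_t bf16BitsByTruncation(float f) {
  uint32_t bits = 0;
  std::memcpy(&bits, &f, sizeof bits);
  return static_cast<uint16_t>(bits >> 16);
}

// Two bf16 lanes packed little-endian: lane 0 occupies the low half-word.
int32_t packBf16Pair(uint16_t lane0, uint16_t lane1) {
  return static_cast<int32_t>(uint32_t(lane0) | (uint32_t(lane1) << 16));
}
```

Since 1.5f is 0x3FC00000, each lane is 0x3FC0. The packed word is 0x3FC03FC0, which is 1069563840 in decimal.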
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Preserve source alignment on the mmio i32 mirror globals
Both `memref::GlobalOp::create` calls in the repack path passed
`IntegerAttr{}` for the alignment argument, dropping any explicit
`alignment = N : i64` attribute carried by the source `memref.global`.
The non-repack path preserves alignment for free via `clone()`.
Forward `moduleGlobal.getAlignmentAttr()` to both the module-scope
and in-device mirrors so the two paths agree, and extend the splat
bf16 lit case to assert the round-trip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Use SymbolTable lookup instead of device.walk for in-device global
AIE::DeviceOp carries the SymbolTable trait, so the recursive
device.walk that found the in-device mirror by name, visiting every
nested op (including core bodies), is unnecessary. Switch to
`SymbolTable::lookupSymbolIn(device, cloneName)`, which only scans the
device body's top-level ops and returns at the first match. This matches
the lookup style used for the module-side global a few lines above
(and elsewhere in this file, e.g. line 4554).
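This sketch illustrates the cost difference using a hypothetical op tree. `findByWalk` visits nested ops the way a recursive walk does. `findTopLevel` only inspects the direct children of the symbol-table op, which is how we understand `SymbolTable::lookupSymbolIn` to behave.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical op tree: an op may define a symbol and may own nested ops.
struct Op {
  std::string symName;      // empty if the op defines no symbol
  std::vector<Op> nested;
};

// Recursive walk: visits ops at every depth until it finds the name.
const Op *findByWalk(const Op &op, const std::string &name, int &visits) {
  ++visits;
  if (op.symName == name)
    return &op;
  for (const Op &child : op.nested)
    if (const Op *hit = findByWalk(child, name, visits))
      return hit;
  return nullptr;
}

// Top-level scan: only direct children of the symbol-table op are candidates.
const Op *findTopLevel(const Op &table, const std::string &name, int &visits) {
  for (const Op &child : table.nested) {
    ++visits;
    if (child.symName == name)
      return &child;
  }
  return nullptr;
}
```

With a core that holds 100 nested ops placed before the mirror global, the walk visits 103 ops and the top-level scan visits 2.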
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Mark mmio repack path as removable once upstream blockwrite gains non-i32
Add a TODO at the top of the repack block pointing at the two
mlir-aie sites that enforce the 32-bit-only payload restriction
(`NpuBlockWriteOp::getDataWords` in AIEXDialect.cpp and the
analogous warning in AIETxnToControlPacket.cpp). When those learn
to handle non-i32 element types, the entire repack path and its
`_mmio_i32` mirror globals can be deleted as dead code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add i8 and i16 mmio repack lit cases
The repack helper handles any positive-multiple-of-8 element
bitwidth, but coverage was bf16-only. Add two more positive cases
that exercise different strides through repackAsI32Bytes:
- i8 splat (4xi8 = dense<66>): bytesPerElt=1, splat-expansion
loop copies a single byte per iteration; result packs to one
i32 = 0x42424242 = 1111638594.
- i16 non-splat (2xi16 = {0x1234, 0xABCD}): pure-int storage
path through the wholesale-copy branch; LE byte stream
{0x34, 0x12, 0xCD, 0xAB} packs to one signed i32
0xABCD1234 = -1412623820.
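The expected constants for the two new cases can be checked with a small little-endian packer. `packLE` and `i16ToLE` are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack a little-endian byte stream into signed 32-bit words.
std::vector<int32_t> packLE(const std::vector<uint8_t> &bytes) {
  std::vector<int32_t> words;
  for (std::size_t i = 0; i + 3 < bytes.size(); i += 4) {
    uint32_t w = uint32_t(bytes[i]) | uint32_t(bytes[i + 1]) << 8 |
                 uint32_t(bytes[i + 2]) << 16 | uint32_t(bytes[i + 3]) << 24;
    words.push_back(static_cast<int32_t>(w));
  }
  return words;
}

// Serialize 16-bit elements as little-endian bytes (low byte first).
std::vector<uint8_t> i16ToLE(const std::vector<uint16_t> &elts) {
  std::vector<uint8_t> out;
  for (uint16_t e : elts) {
    out.push_back(static_cast<uint8_t>(e & 0xFF));
    out.push_back(static_cast<uint8_t>(e >> 8));
  }
  return out;
}
```

For example, `packLE(i16ToLE({0x1234, 0xABCD}))` produces the single word 0xABCD1234, which reads as -1412623820 when interpreted as a signed i32.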
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Reject mmio repack on uninitialized memref.global with a clean diagnostic
The repack path called `*moduleGlobal.getInitialValue()` and cast
the result to DenseElementsAttr without checking either step. A
pure declaration (no `= dense<...>` initializer) returns
`std::nullopt` and tripped the optional-dereference assertion; an
uninitialized definition (UnitAttr initializer) would have tripped
the assertion in the cast.
Guard both up front: bind `getInitialValue()` to an optional, then
`dyn_cast<DenseElementsAttr>` and reject with a clean op error if
either fails. Add a lit case for the uninitialized-global path.
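The guard order can be sketched with standard-library stand-ins. `std::optional` models the possibly absent initializer, and `std::variant` models the attribute kind. `UnitInit`, `DenseInit`, and the error strings are hypothetical. The real code uses `getInitialValue()`, `dyn_cast<DenseElementsAttr>`, and an op error.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the forms a memref.global initializer can take.
struct UnitInit {};                             // `uninitialized` definition
struct DenseInit { std::vector<int> values; };  // `= dense<...>`
using InitialValue = std::variant<UnitInit, DenseInit>;

// Check presence first, then the attribute kind, and report an error
// instead of dereferencing or casting unconditionally.
std::variant<DenseInit, std::string>
getDenseInitializer(const std::optional<InitialValue> &init) {
  if (!init)
    return std::string("source global has no initializer");
  if (const auto *dense = std::get_if<DenseInit>(&*init))
    return *dense;
  return std::string("source global initializer is not dense elements");
}
```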
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Name the symbol-dce site that cleans up mmio module-scope globals
The two AIRToAIEPass.cpp comments and the matching test-file note
referenced "symbol-dce" generically, leaving a future debugger to
guess where the cleanup actually fires. The concrete site is the
`symbol-dce` pass invoked twice in the NPU pipeline at
tools/aircc/aircc.cpp:1117 and 1123 — once before and once after
`airrt-to-npu` — which already carries an inline note about
dropping mmio-orphaned globals.
Update the three comments to name that file/pass so the load-bearing
dependency is greppable from either side.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Trim verbose mmio comments
Comments across the mmio lowering and its lit cases had grown into
multi-paragraph doc blocks. Compress each to one or two lines —
the load-bearing "why" stays, but the prose explanations move to
the PR description.
- AIRToAIEPass.cpp: shrink the launch-hoist note, the IsolatedFromAbove
mirror rationale, the V1 limitation reject, the i32-only repack
preamble + TODO, and the post-blockwrite cleanup note. Helper
docstring on `repackAsI32Bytes` also tightened.
- air_channel_mmio.mlir: drop the bullet-list file header and the
multi-line preambles on each split (simple/mixed/bcast/indexed/
bf16/bf16ns/i8/i16). Each case now reads as a single sentence
above its CHECK lines.
- air_channel_mmio_invalid.mlir: same treatment for the negative cases.
No functional change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Stamp mmio source onto destination buffer's initial_value, drop blockwrite
The runtime-sequence blockwrite path for channel_type="mmio" had a
host↔core race that the V1 design didn't address: aiex.configure
enables the cores during CDO load, before the runtime sequence's
blockwrites run. A core that reads its mmio destination buffer before
any lock-gated DMA acquire races the host writes. The existing
mmio_simple programming example only dodges this by luck: the core's
first action is a lock acquire on a shim DMA that the runtime sequence
fires after the blockwrite, by which time the blockwrite has long
completed. Without that ordering (for example, in any non-trivial
kernel that reads the mmio buffer before its first DMA), the value the
core reads is undefined.
Stamp the source memref.global's initializer onto the destination L1
aie.buffer's initial_value attribute instead. AIERTControl::initBuffers
already loads buffer initial_values into the tile via
XAie_DataMemBlockWrite at device-init time — before any core starts —
which makes the data delivery race-free relative to core execution.
Side benefits:
* Obsoletes the i32 repack hack (and its sub-byte / non-multiple-of-4
guards): the initBuffers path handles APInt and APFloat element
types natively before handing the bytes to XAie_DataMemBlockWrite,
so the destination buffer can carry its native bf16 / i8 / i16 / etc.
initializer.
* Drops the module-level mirror, get_global hoisting, and the
symbol-dce dance in aircc that protected against orphan-global
collisions in LLVM lowering — none of those exist anymore.
* Simpler V1 invariants: source/dest must agree on element type and
count (the buffer's natural shape constrains the initializer); a
given destination buffer can have at most one mmio source.
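Below is a sketch of the two V1 invariants as a post-matching check. `MmioBinding` and `checkV1Invariants` are hypothetical; the real pass works on IR types and emits op diagnostics instead of strings. Flagging any second binding for the same buffer is a simplifying assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical summary of one matched mmio source -> destination pairing.
struct MmioBinding {
  std::string srcGlobal, dstBuffer;
  std::string srcEltType, dstEltType;  // e.g. "bf16"
  long srcNumElts, dstNumElts;
};

// Element type and count must agree, and each destination buffer may be
// bound at most once (a second binding for the same buffer is rejected).
std::vector<std::string> checkV1Invariants(const std::vector<MmioBinding> &bs) {
  std::vector<std::string> errors;
  std::map<std::string, std::string> sourceOf;  // dst buffer -> src global
  for (const auto &b : bs) {
    if (b.srcEltType != b.dstEltType || b.srcNumElts != b.dstNumElts)
      errors.push_back(b.dstBuffer + ": source/destination shape mismatch");
    auto [it, inserted] = sourceOf.emplace(b.dstBuffer, b.srcGlobal);
    if (!inserted)
      errors.push_back(b.dstBuffer + ": more than one mmio source");
  }
  return errors;
}
```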
Net -184 lines. All 379 existing check-air-mlir tests still pass.
On NPU2 hardware, the existing mmio_simple programming example still
PASSes, and an [NKV=8, NS=1] decode-attention prototype with Q
delivered via mmio (mocking cascade-Q from #1565) PASSes with
correlation 0.999655 against the NumPy reference — without any
defer-the-read workaround.
The "variable-data MMIO via bo_instr patching" V2 plan (host-loaded
data, separate from compile-time constants) is unaffected: when it
arrives it can re-introduce blockwrite with a proper sync mechanism
(tile lock that the host releases after blockwrite completion).
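The proposed V2 handshake, in which the host releases a tile lock only after its blockwrite finishes, can be modeled with two threads. `TileLock` and `runHandshake` are hypothetical host-side stand-ins built on `std::mutex` and `std::condition_variable`. They are not AIE or XRT APIs. The lock starts at 0, so the core blocks until the host releases it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Counting lock initialized to 0: acquire() blocks until release() is called.
class TileLock {
 public:
  void acquire() {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return value_ > 0; });
    --value_;
  }
  void release() {
    {
      std::lock_guard<std::mutex> lk(m_);
      ++value_;
    }
    cv_.notify_one();
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  int value_ = 0;
};

// The host writes the payload (stand-in for the blockwrite), then releases
// the lock. The core acquires before reading, so it never sees a partial buffer.
std::vector<int> runHandshake(const std::vector<int> &payload) {
  std::vector<int> tileBuffer(payload.size(), 0);
  std::vector<int> coreSaw;
  TileLock lock;
  std::thread core([&] {
    lock.acquire();
    coreSaw = tileBuffer;  // read only after the host's release
  });
  for (std::size_t i = 0; i < payload.size(); ++i)
    tileBuffer[i] = payload[i];
  lock.release();
  core.join();
  return coreSaw;
}
```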
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent afd381f
5 files changed
Lines changed: 293 additions & 179 deletions
File tree
- mlir
- lib
- Conversion
- Transform
- test/Conversion/AIRToAIE
- tools/aircc