Summary
Tested all examples from PR #48 (multi-dtype support) on NPU1 (Phoenix / AIE2) using transform_aie2.mlir. The PR was developed and validated on NPU2 (Strix/AIE2P). This issue documents NPU1 compatibility.
Environment:
- Device: NPU Phoenix (HP EliteBook 845 14 inch G10, Ryzen 7 PRO 7840U)
- XRT: 2.23.0
- NPU Firmware: 1.5.5.391
- mlir-air: 0.0.1.2026032022+f954272
- mlir-aie: 0.0.1.2026040605+e368f3e
- llvm-aie (Peano): 19.0.0.2025071101+b3cd09d3
Results: 15/20 PASS
Passing tests (15)
| Example | bf16 | f32 | i16 |
| --- | --- | --- | --- |
| vec-add | PASS | PASS | PASS |
| axpy | PASS | PASS | PASS |
| relu | PASS | PASS | PASS |
| leaky_relu | PASS | — | N/A |
| sigmoid | PASS | — | N/A |
| silu | PASS | PASS | N/A |
| swiglu | PASS | PASS | N/A |
Note: gelu has no transform_aie2.mlir (NPU2-only), so it was not tested.
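The 15/20 figure can be reproduced from the passing table together with the five failures listed in the next section. The following is a minimal sketch of that tally, assuming each example was run only for the dtypes named in this report; the dictionary layout and the helper names `tally` and `failures` are illustrative and not part of the test harness.

```python
# Outcome matrix transcribed from this report. "FAIL" marks the five
# failures described under "Failing tests"; untested combinations are omitted.
RESULTS = {
    "vec-add":    {"bf16": "PASS", "f32": "PASS", "i16": "PASS", "i8": "FAIL"},
    "axpy":       {"bf16": "PASS", "f32": "PASS", "i16": "PASS", "i8": "FAIL"},
    "relu":       {"bf16": "PASS", "f32": "PASS", "i16": "PASS", "i8": "FAIL"},
    "leaky_relu": {"bf16": "PASS", "f32": "FAIL"},
    "sigmoid":    {"bf16": "PASS", "f32": "FAIL"},
    "silu":       {"bf16": "PASS", "f32": "PASS"},
    "swiglu":     {"bf16": "PASS", "f32": "PASS"},
}


def tally(results):
    """Return (number of passing runs, total number of runs)."""
    outcomes = [o for per_dtype in results.values() for o in per_dtype.values()]
    passed = sum(o == "PASS" for o in outcomes)
    return passed, len(outcomes)


def failures(results):
    """Return the failing runs as sorted 'example/dtype' strings."""
    return sorted(
        f"{example}/{dtype}"
        for example, per_dtype in results.items()
        for dtype, outcome in per_dtype.items()
        if outcome != "PASS"
    )


if __name__ == "__main__":
    passed, total = tally(RESULTS)
    print(f"{passed}/{total} PASS")
    for name in failures(RESULTS):
        print("FAIL", name)
```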
Failing tests (5)
1. i8 failures — Peano LLC error (vec-add/i8, axpy/i8, relu/i8)
All three i8 tests compile through MLIR transforms successfully but fail at the Peano LLC backend:
Error running Peano llc
Compilation failed
The installed Peano (llvm-aie 19.0.0.2025071101) cannot lower i8 vector operations to AIE2 machine code. This is consistent with the PR's own notes about i8 limitations on AIE2P (arith.muli/maxsi not supported for i8 vectors).
2. leaky_relu/f32 — Peano LLC error
The leaky_relu kernel uses tl.where(x >= 0, x, 0.01 * x) which generates arith.cmpf + arith.select. With f32 input + bf16-emulation, Peano LLC cannot lower this combination on AIE2.
3. sigmoid/f32 — Transform padding type mismatch
'transform.structured.pad' op expects a padding value of type 'f32', got 0.000000e+00 : bf16
The sigmoid/transform_aie2.mlir script hardcodes a bf16 padding value. With f32 input (and bf16-emulation), transform.structured.pad expects an f32 padding value, so the hardcoded bf16 constant is rejected. PR #48 updated transform_aie2p.mlir for sigmoid but did not make the same change in transform_aie2.mlir. This is a fixable bug: the AIE2 transform script needs @DTYPE@ placeholder support for the padding value, or a dtype-aware pad constant.
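One way the placeholder approach could work is sketched below, under stated assumptions: the template line is a hypothetical fragment rather than the actual contents of sigmoid/transform_aie2.mlir, and `render_transform` is an illustrative helper, not a function from the repository. It only shows the idea of substituting the element type into the pad constant so that the padding value's type follows the input dtype.

```python
# Hypothetical fragment of a transform script with the pad type left as a
# placeholder. The real script's pad op and attributes may look different.
TEMPLATE = "padding_values = [0.0 : @DTYPE@, 0.0 : @DTYPE@]"

# Float element types for which a literal of the form "0.0 : <type>" is valid.
# Integer dtypes would need an integer literal and are rejected in this sketch.
FLOAT_DTYPES = {"bf16", "f32"}


def render_transform(template: str, dtype: str) -> str:
    """Substitute the @DTYPE@ placeholder with the input element type."""
    if dtype not in FLOAT_DTYPES:
        raise ValueError(f"unsupported pad dtype for this sketch: {dtype}")
    return template.replace("@DTYPE@", dtype)


if __name__ == "__main__":
    # With f32 input, the pad constant is now typed f32 instead of bf16.
    print(render_transform(TEMPLATE, "f32"))
    print(render_transform(TEMPLATE, "bf16"))
```

Rendering the script per dtype keeps a single template for both cases, at the cost of a substitution step before the transform is applied.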
Recommendations
Ref: #48