Skip to content

[AIE2] Add instruction selection for v8i64 G_AIE_BROADCAST_VECTOR#964

Merged
konstantinschwarz merged 3 commits into
Xilinx:aie-publicfrom
erwei-xilinx:erwei/fix-aie2-acc512-broadcast-vector
Apr 30, 2026
Merged

[AIE2] Add instruction selection for v8i64 G_AIE_BROADCAST_VECTOR#964
konstantinschwarz merged 3 commits into
Xilinx:aie-publicfrom
erwei-xilinx:erwei/fix-aie2-acc512-broadcast-vector

Conversation

@erwei-xilinx
Copy link
Copy Markdown
Contributor

Summary

The AIE2 backend was missing instruction selection patterns for G_AIE_BROADCAST_VECTOR with v8i64 (acc512) destinations, causing a fatal "cannot select" error when LLVM opt -O1+ folded store zeroinitializer + load into a direct <8 x i64> zeroinitializer constant used as an accumulator argument to bf.mac16.conf.

The AIE2P backend already handles this (line 860 of AIE2PInstrPatterns.td) via a TableGen pattern. On AIE2, VBCST_64 outputs to mXm (VEC512), not ACC512, so the same pattern produces a TableGen error.

Changes

  • AIE2InstrPatterns.td: Relax SDTypeProfile from [SDTCisVT<1, i32>] to [] (matching AIE2P). Add vector-source (v8i8/v4i16/v2i32) patterns for VBCST_64.
  • AIE2InstructionSelector.cpp: Add custom selectG_AIE_BROADCAST_VECTOR that emits VBCST_64 (VEC512) + VMOV_mv_x (copy to ACC512) for v8i64 destinations. Handles both zero and non-zero broadcasts.
  • New test: inst-select-broadcast-vector-acc512.mir — verifies v8i64 and v16i32 broadcast vector selection.

Test plan

  • AIE2 CodeGen tests: 782 pass / 9 fail (same 9 pre-existing failures, verified by running unpatched)
  • E2E: cascade GEMV bf16 on NPU1 at -O3 (no workaround needed): PASS

🤖 Generated with Claude Code

The AIE2 backend was missing instruction selection patterns for
G_AIE_BROADCAST_VECTOR with v8i64 (acc512) destinations. This caused
an "cannot select" fatal error when LLVM opt folded a store/load of
zeroinitializer into a direct <8 x i64> zeroinitializer constant used
as an accumulator argument to bf.mac16.conf.

The AIE2P backend already handles this case (AIE2PInstrPatterns.td:860)
via a TableGen pattern mapping to VBCST_64. However, on AIE2, VBCST_64
outputs to mXm (VEC512 register class), not ACC512, so the same
TableGen approach produces a "Type set is empty" error.

Fix by adding custom instruction selection in
AIE2InstructionSelector::selectG_AIE_BROADCAST_VECTOR that:
1. Emits VBCST_64 into a VEC512 temporary register
2. Copies to the ACC512 destination via VMOV_mv_x

This handles both zero and non-zero v8i64 broadcasts correctly.

Also relaxes the SDTypeProfile constraint on bcst_vector_node from
[SDTCisVT<1, i32>] to [] (matching AIE2P) to support vector source
operands, and adds vector-source patterns for v64i8, v32i16, v16i32
destinations with VBCST_64.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@erwei-xilinx erwei-xilinx marked this pull request as ready for review April 23, 2026 04:54
erwei-xilinx added a commit to erwei-xilinx/mlir-air-erwei that referenced this pull request Apr 23, 2026
- Add NPU1 LIT tests: 1-col/2-cascade, 2-col/4-cascade, 4-col/4-cascade
- Split NPU2 LIT tests into separate files per config
- Fix broken profile target in Makefile (missing mkdir, merged lines)

Depends on Xilinx/llvm-aie#964 for the AIE2 G_AIE_BROADCAST_VECTOR
instruction selection fix that eliminates the llc crash at -O3.

Verified on NPU1 hardware with patched Peano: all three configs PASS
at full -O3 without any opt-level workaround.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@erwei-xilinx
Copy link
Copy Markdown
Contributor Author

Hi team, could you please kindly provide review and feedback? This PR should only affect NPU1 and only extends on the vectorized zero fill. @konstantinschwarz @stephenneuendorffer

Comment thread llvm/lib/Target/AIE/AIE2InstrPatterns.td Outdated
Comment thread llvm/lib/Target/AIE/AIE2InstructionSelector.cpp Outdated
erwei-xilinx and others added 2 commits April 27, 2026 11:22
Replace the custom selectG_AIE_BROADCAST_VECTOR C++ code with a TableGen
COPY_TO_REGCLASS pattern, following the established pattern used by
vinsert32_accfloat (line 685). This emits a COPY instead of an explicit
VMOV_mv_x, letting the register allocator resolve the cross-bank move.

Addresses review feedback on PR Xilinx#964.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@erwei-xilinx
Copy link
Copy Markdown
Contributor Author

Gentle pin. Is this PR good to merge?

@konstantinschwarz
Copy link
Copy Markdown
Collaborator

Yes, I approved couple days ago - I thought you'd press the merge button :)

@konstantinschwarz konstantinschwarz merged commit b56cb88 into Xilinx:aie-public Apr 30, 2026
7 checks passed
@erwei-xilinx
Copy link
Copy Markdown
Contributor Author

@konstantinschwarz thanks for the kind review and approval! I tried to merge myself but do not have the option to merge. Thanks for merging for me.

@erwei-xilinx erwei-xilinx deleted the erwei/fix-aie2-acc512-broadcast-vector branch April 30, 2026 20:11
erwei-xilinx added a commit to erwei-xilinx/mlir-air-erwei that referenced this pull request May 2, 2026
* Add NPU1 support for cascade GEMV bf16 example

- Add NPU1 LIT tests: 1-col/2-cascade, 2-col/4-cascade, 4-col/4-cascade
- Split NPU2 LIT tests into separate files per config
- Fix broken profile target in Makefile (missing mkdir, merged lines)

Depends on Xilinx/llvm-aie#964 for the AIE2 G_AIE_BROADCAST_VECTOR
instruction selection fix that eliminates the llc crash at -O3.

Verified on NPU1 hardware with patched Peano: all three configs PASS
at full -O3 without any opt-level workaround.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Address Copilot review: dedupe NPU2 LIT tests and rename to consistent pattern

- Delete duplicate run_npu2_8col_peano.lit and run_npu2_2col_2cascade_peano.lit
- Rename existing run_npu2_8col.lit -> run_npu2_8col_4cascade_peano.lit
- Rename existing run_npu2_cascade2.lit -> run_npu2_2col_2cascade_peano.lit
- Rename run_npu2_makefile_peano.lit -> run_npu2_2col_4cascade_peano.lit
- Rename run_npu1_makefile_peano.lit -> run_npu1_1col_2cascade_peano.lit
- Update mkdir/cd work-dir paths inside renamed files

All NPU2 (and new NPU1) Peano LIT tests now follow the
run_<device>_<cols>col_<cascade>cascade_peano.lit naming convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants