[AIE2] Add instruction selection for v8i64 G_AIE_BROADCAST_VECTOR#964
Merged
konstantinschwarz merged 3 commits intoApr 30, 2026
Conversation
The AIE2 backend was missing instruction selection patterns for G_AIE_BROADCAST_VECTOR with v8i64 (acc512) destinations. This caused an "cannot select" fatal error when LLVM opt folded a store/load of zeroinitializer into a direct <8 x i64> zeroinitializer constant used as an accumulator argument to bf.mac16.conf. The AIE2P backend already handles this case (AIE2PInstrPatterns.td:860) via a TableGen pattern mapping to VBCST_64. However, on AIE2, VBCST_64 outputs to mXm (VEC512 register class), not ACC512, so the same TableGen approach produces a "Type set is empty" error. Fix by adding custom instruction selection in AIE2InstructionSelector::selectG_AIE_BROADCAST_VECTOR that: 1. Emits VBCST_64 into a VEC512 temporary register 2. Copies to the ACC512 destination via VMOV_mv_x This handles both zero and non-zero v8i64 broadcasts correctly. Also relaxes the SDTypeProfile constraint on bcst_vector_node from [SDTCisVT<1, i32>] to [] (matching AIE2P) to support vector source operands, and adds vector-source patterns for v64i8, v32i16, v16i32 destinations with VBCST_64. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
erwei-xilinx
added a commit
to erwei-xilinx/mlir-air-erwei
that referenced
this pull request
Apr 23, 2026
- Add NPU1 LIT tests: 1-col/2-cascade, 2-col/4-cascade, 4-col/4-cascade - Split NPU2 LIT tests into separate files per config - Fix broken profile target in Makefile (missing mkdir, merged lines) Depends on Xilinx/llvm-aie#964 for the AIE2 G_AIE_BROADCAST_VECTOR instruction selection fix that eliminates the llc crash at -O3. Verified on NPU1 hardware with patched Peano: all three configs PASS at full -O3 without any opt-level workaround. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 tasks
Contributor
Author
|
Hi team, could you please kindly provide review and feedback? This PR should only affect NPU1 and only extends on the vectorized zero fill. @konstantinschwarz @stephenneuendorffer |
konstantinschwarz
approved these changes
Apr 27, 2026
Replace the custom selectG_AIE_BROADCAST_VECTOR C++ code with a TableGen COPY_TO_REGCLASS pattern, following the established pattern used by vinsert32_accfloat (line 685). This emits a COPY instead of an explicit VMOV_mv_x, letting the register allocator resolve the cross-bank move. Addresses review feedback on PR Xilinx#964. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
Author
|
Gentle pin. Is this PR good to merge? |
Collaborator
|
Yes, I approved couple days ago - I thought you'd press the merge button :) |
Contributor
Author
|
@konstantinschwarz thanks for the kind review and approval! I tried to merge myself but do not have the option to merge. Thanks for merging for me. |
erwei-xilinx
added a commit
to erwei-xilinx/mlir-air-erwei
that referenced
this pull request
May 2, 2026
* Add NPU1 support for cascade GEMV bf16 example - Add NPU1 LIT tests: 1-col/2-cascade, 2-col/4-cascade, 4-col/4-cascade - Split NPU2 LIT tests into separate files per config - Fix broken profile target in Makefile (missing mkdir, merged lines) Depends on Xilinx/llvm-aie#964 for the AIE2 G_AIE_BROADCAST_VECTOR instruction selection fix that eliminates the llc crash at -O3. Verified on NPU1 hardware with patched Peano: all three configs PASS at full -O3 without any opt-level workaround. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Address Copilot review: dedupe NPU2 LIT tests and rename to consistent pattern - Delete duplicate run_npu2_8col_peano.lit and run_npu2_2col_2cascade_peano.lit - Rename existing run_npu2_8col.lit -> run_npu2_8col_4cascade_peano.lit - Rename existing run_npu2_cascade2.lit -> run_npu2_2col_2cascade_peano.lit - Rename run_npu2_makefile_peano.lit -> run_npu2_2col_4cascade_peano.lit - Rename run_npu1_makefile_peano.lit -> run_npu1_1col_2cascade_peano.lit - Update mkdir/cd work-dir paths inside renamed files All NPU2 (and new NPU1) Peano LIT tests now follow the run_<device>_<cols>col_<cascade>cascade_peano.lit naming convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The AIE2 backend was missing instruction selection patterns for
G_AIE_BROADCAST_VECTORwithv8i64(acc512) destinations, causing a fatal "cannot select" error when LLVMopt -O1+foldedstore zeroinitializer + loadinto a direct<8 x i64> zeroinitializerconstant used as an accumulator argument tobf.mac16.conf.The AIE2P backend already handles this (line 860 of
AIE2PInstrPatterns.td) via a TableGen pattern. On AIE2,VBCST_64outputs tomXm(VEC512), not ACC512, so the same pattern produces a TableGen error.Changes
AIE2InstrPatterns.td: RelaxSDTypeProfilefrom[SDTCisVT<1, i32>]to[](matching AIE2P). Add vector-source (v8i8/v4i16/v2i32) patterns forVBCST_64.AIE2InstructionSelector.cpp: Add customselectG_AIE_BROADCAST_VECTORthat emitsVBCST_64(VEC512) +VMOV_mv_x(copy to ACC512) forv8i64destinations. Handles both zero and non-zero broadcasts.inst-select-broadcast-vector-acc512.mir— verifiesv8i64andv16i32broadcast vector selection.Test plan
-O3(no workaround needed): PASS🤖 Generated with Claude Code