Commit 911ec57

erwei-xilinx, claude, and Copilot authored
Add NPU1 support for cascade GEMV bf16 example (Xilinx#1552)
* Add NPU1 support for cascade GEMV bf16 example

  - Add NPU1 LIT tests: 1-col/2-cascade, 2-col/4-cascade, 4-col/4-cascade
  - Split NPU2 LIT tests into separate files per config
  - Fix broken profile target in Makefile (missing mkdir, merged lines)

  Depends on Xilinx/llvm-aie#964 for the AIE2 G_AIE_BROADCAST_VECTOR instruction selection fix that eliminates the llc crash at -O3.

  Verified on NPU1 hardware with patched Peano: all three configs PASS at full -O3 without any opt-level workaround.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Potential fix for pull request finding

  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Address Copilot review: dedupe NPU2 LIT tests and rename to consistent pattern

  - Delete duplicate run_npu2_8col_peano.lit and run_npu2_2col_2cascade_peano.lit
  - Rename existing run_npu2_8col.lit -> run_npu2_8col_4cascade_peano.lit
  - Rename existing run_npu2_cascade2.lit -> run_npu2_2col_2cascade_peano.lit
  - Rename run_npu2_makefile_peano.lit -> run_npu2_2col_4cascade_peano.lit
  - Rename run_npu1_makefile_peano.lit -> run_npu1_1col_2cascade_peano.lit
  - Update mkdir/cd work-dir paths inside renamed files

  All NPU2 (and new NPU1) Peano LIT tests now follow the run_<device>_<cols>col_<cascade>cascade_peano.lit naming convention.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
1 parent d39411a commit 911ec57

7 files changed

Lines changed: 37 additions & 9 deletions

programming_examples/matrix_vector_multiplication/bf16_cascade/Makefile

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,6 @@ all: run

 print:
 	${powershell} python3 ${srcdir}/matvec_cascade.py $(OUTPUT_FORMAT_FLAG) -p --m $(M) --k $(K) --tile-m $(TILE_M) --m-input $(M_INPUT) --herd-cols $(HERD_COLS) --n-cascade $(N_CASCADE)
-
 run:
 	mkdir -p $(BUILD_DIR)
 	PEANO_INSTALL_DIR=$(PEANO_INSTALL_DIR) cd $(BUILD_DIR) && ${powershell} python3 ${srcdir}/matvec_cascade.py $(OUTPUT_FORMAT_FLAG) --m $(M) --k $(K) --tile-m $(TILE_M) --m-input $(M_INPUT) --herd-cols $(HERD_COLS) --n-cascade $(N_CASCADE) $(if $(DEBUG_IR),--debug-ir) || \
@@ -44,6 +43,7 @@ run:
 	false)

 profile: build-test-exe
+	mkdir -p $(BUILD_DIR)
 	PEANO_INSTALL_DIR=$(PEANO_INSTALL_DIR) cd $(BUILD_DIR) && python3 ${srcdir}/matvec_cascade.py $(OUTPUT_FORMAT_FLAG) --compile-mode compile-and-xclbin --m $(M) --k $(K) --tile-m $(TILE_M) --m-input $(M_INPUT) --herd-cols $(HERD_COLS) --n-cascade $(N_CASCADE)
 	PEANO_INSTALL_DIR=$(PEANO_INSTALL_DIR) cd $(BUILD_DIR) && ./test.exe -x air.xclbin -k MLIR_AIE -i air.insts.bin -M $(M) -K $(K)

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+// (c) Copyright 2026 Advanced Micro Devices, Inc.
+// SPDX-License-Identifier: MIT
+//
+// REQUIRES: ryzen_ai_npu1, peano
+//
+// RUN: mkdir -p test_npu1_1col_2cascade_peano
+// RUN: cd test_npu1_1col_2cascade_peano
+// RUN: make -f %S/Makefile clean
+// RUN: make -f %S/Makefile run M=128 K=128 TILE_M=2 M_INPUT=1 HERD_COLS=1 N_CASCADE=2 PEANO_INSTALL_DIR=%PEANO_INSTALL_DIR | FileCheck %s
+// CHECK: PASS!
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+// (c) Copyright 2026 Advanced Micro Devices, Inc.
+// SPDX-License-Identifier: MIT
+//
+// REQUIRES: ryzen_ai_npu1, peano
+//
+// RUN: mkdir -p test_npu1_2col_4cascade
+// RUN: cd test_npu1_2col_4cascade
+// RUN: make -f %S/Makefile clean
+// RUN: make -f %S/Makefile run M=256 K=512 TILE_M=4 M_INPUT=1 HERD_COLS=2 N_CASCADE=4 PEANO_INSTALL_DIR=%PEANO_INSTALL_DIR | FileCheck %s
+// CHECK: PASS!
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+// (c) Copyright 2026 Advanced Micro Devices, Inc.
+// SPDX-License-Identifier: MIT
+//
+// REQUIRES: ryzen_ai_npu1, peano
+//
+// RUN: mkdir -p test_npu1_4col_4cascade
+// RUN: cd test_npu1_4col_4cascade
+// RUN: make -f %S/Makefile clean
+// RUN: make -f %S/Makefile run M=2048 K=8192 TILE_M=2 M_INPUT=1 HERD_COLS=4 N_CASCADE=4 PEANO_INSTALL_DIR=%PEANO_INSTALL_DIR | FileCheck %s
+// CHECK: PASS!
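
The three NPU1 tests added above differ only in the variables passed to `make run`. As a rough sketch (not part of the commit), the RUN lines could be generated from a small table. The names `NPU1_CONFIGS` and `run_line` are hypothetical; the values are copied from the diffs above.

```python
# Hypothetical table of the three NPU1 configurations added by this commit.
# Values are copied from the RUN lines in the diff; nothing here is read
# from the repository itself.
NPU1_CONFIGS = [
    {"M": 128, "K": 128, "TILE_M": 2, "M_INPUT": 1, "HERD_COLS": 1, "N_CASCADE": 2},
    {"M": 256, "K": 512, "TILE_M": 4, "M_INPUT": 1, "HERD_COLS": 2, "N_CASCADE": 4},
    {"M": 2048, "K": 8192, "TILE_M": 2, "M_INPUT": 1, "HERD_COLS": 4, "N_CASCADE": 4},
]


def run_line(cfg):
    """Format one LIT RUN line for `make run` from a config dict.

    Dicts preserve insertion order, so the variables appear in the same
    order as in the hand-written tests.
    """
    make_vars = " ".join(f"{name}={value}" for name, value in cfg.items())
    return (
        "// RUN: make -f %S/Makefile run "
        f"{make_vars} PEANO_INSTALL_DIR=%PEANO_INSTALL_DIR | FileCheck %s"
    )


if __name__ == "__main__":
    for cfg in NPU1_CONFIGS:
        print(run_line(cfg))
```

Whether generating the tests is preferable to keeping one hand-written file per configuration is a judgment call; the commit keeps them as separate files.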

programming_examples/matrix_vector_multiplication/bf16_cascade/run_npu2_cascade2.lit renamed to programming_examples/matrix_vector_multiplication/bf16_cascade/run_npu2_2col_2cascade_peano.lit

Lines changed: 2 additions & 2 deletions
@@ -3,8 +3,8 @@
 //
 // REQUIRES: ryzen_ai_npu2, peano
 //
-// RUN: mkdir -p test_npu2_cascade2
-// RUN: cd test_npu2_cascade2
+// RUN: mkdir -p test_npu2_2col_2cascade_peano
+// RUN: cd test_npu2_2col_2cascade_peano
 // RUN: make -f %S/Makefile clean
 //
 // Correctness: M=256, K=512, 2 columns x 2 cascade tiles (n_cascade=2 boundary)

programming_examples/matrix_vector_multiplication/bf16_cascade/run_npu2_makefile_peano.lit renamed to programming_examples/matrix_vector_multiplication/bf16_cascade/run_npu2_2col_4cascade_peano.lit

Lines changed: 2 additions & 4 deletions
@@ -3,10 +3,8 @@
 //
 // REQUIRES: ryzen_ai_npu2, peano
 //
-// RUN: mkdir -p test_npu2_peano
-// RUN: cd test_npu2_peano
+// RUN: mkdir -p test_npu2_2col_4cascade_peano
+// RUN: cd test_npu2_2col_4cascade_peano
 // RUN: make -f %S/Makefile clean
-//
-// Correctness: M=256, K=512, 2 columns x 4 cascade tiles (quick sanity)
 // RUN: make -f %S/Makefile run M=256 K=512 TILE_M=4 M_INPUT=1 HERD_COLS=2 N_CASCADE=4 PEANO_INSTALL_DIR=%PEANO_INSTALL_DIR | FileCheck %s
 // CHECK: PASS!

programming_examples/matrix_vector_multiplication/bf16_cascade/run_npu2_8col.lit renamed to programming_examples/matrix_vector_multiplication/bf16_cascade/run_npu2_8col_4cascade_peano.lit

Lines changed: 2 additions & 2 deletions
@@ -3,8 +3,8 @@
 //
 // REQUIRES: ryzen_ai_npu2, peano
 //
-// RUN: mkdir -p test_npu2_8col
-// RUN: cd test_npu2_8col
+// RUN: mkdir -p test_npu2_8col_4cascade_peano
+// RUN: cd test_npu2_8col_4cascade_peano
 // RUN: make -f %S/Makefile clean
 //
 // Correctness: M=2048, K=8192, 8 columns x 4 cascade tiles (tile_m=2 for L2 fit)
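
The commit message states that all Peano LIT tests now follow the `run_<device>_<cols>col_<cascade>cascade_peano.lit` pattern. A minimal sketch of building and checking such names follows. The helpers `lit_test_name` and `parse_lit_test_name` are hypothetical and not part of the repository, and the regex assumes the device is `npu` followed by digits, as in the `npu1` and `npu2` names in this commit.

```python
import re

# Pattern for the convention stated in the commit message:
#   run_<device>_<cols>col_<cascade>cascade_peano.lit
# Assumption: <device> looks like "npu1" or "npu2" (npu + digits).
LIT_NAME_RE = re.compile(
    r"^run_(?P<device>npu\d+)_(?P<cols>\d+)col_(?P<cascade>\d+)cascade_peano\.lit$"
)


def lit_test_name(device, cols, cascade):
    """Build a LIT test file name that follows the naming convention."""
    return f"run_{device}_{cols}col_{cascade}cascade_peano.lit"


def parse_lit_test_name(filename):
    """Return (device, cols, cascade), or None if the name does not match."""
    m = LIT_NAME_RE.match(filename)
    if m is None:
        return None
    return m["device"], int(m["cols"]), int(m["cascade"])
```

Under these assumptions, the old name `run_npu2_8col.lit` fails to parse, while its replacement `run_npu2_8col_4cascade_peano.lit` parses to `("npu2", 8, 4)`.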
