Add ELF output format support for NPU2 (AIE2P) #27

Merged
erwei-xilinx merged 1 commit into main from elf-output-format
Mar 26, 2026

@erwei-xilinx (Collaborator) commented Mar 25, 2026

Summary

  • Add ELF output format as default for NPU2 (AIE2P/Strix), replacing the two-file xclbin+insts.bin workflow with a single self-contained aie.elf binary
  • NPU1 (AIE2) continues using xclbin unchanged — no behavioral change for legacy devices
  • New AMD_TRITON_NPU_OUTPUT_FORMAT env var allows overriding the auto-detected default ("elf" on npu2, "xclbin" on npu1)
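
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from driver.py: the function name `select_output_format`, the `"npu1"`/`"npu2"` string representation of the device, the case-insensitive parsing, and the `ValueError` for unknown values are all assumptions. Only the defaults, the env var name, and the RuntimeError for ELF on NPU1 come from the PR text.

```python
import os

# Formats named in the PR description.
VALID_FORMATS = ("elf", "xclbin")


def select_output_format(npu_version, env=None):
    """Hypothetical stand-in for _get_output_format(); not the real implementation."""
    env = os.environ if env is None else env

    # Auto-detected default: ELF on npu2, xclbin on npu1.
    default = "elf" if npu_version == "npu2" else "xclbin"

    # Assumption: the override is trimmed and compared case-insensitively.
    override = env.get("AMD_TRITON_NPU_OUTPUT_FORMAT", "").strip().lower()
    if not override:
        return default

    # Assumption: unknown values are rejected with ValueError.
    if override not in VALID_FORMATS:
        raise ValueError(f"unsupported output format: {override!r}")

    # From the test plan: forcing ELF on NPU1 should raise RuntimeError.
    if override == "elf" and npu_version == "npu1":
        raise RuntimeError("ELF output is only supported on NPU2 (AIE2P)")

    return override
```

Passing the environment as a plain mapping keeps the sketch easy to exercise without mutating `os.environ`; for example, `select_output_format("npu2", {"AMD_TRITON_NPU_OUTPUT_FORMAT": "xclbin"})` forces the legacy path on NPU2.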

What changed in driver.py

  • _get_output_format(): auto-detects format based on NPU version, with env var override and npu1+elf validation
  • _generate_elf_launcher(): C++ launcher template using xrt::elf, xrt::ext::kernel, xrt::ext::bo, and xrt::run APIs
  • _extract_elf_kernel_name(): parses full_elf_config.json to discover the kernel name (e.g., main:vecadd), because ELF builds use IR-derived kernel names rather than MLIR_AIE
  • compile_module(): now format-aware in its linker flags (ELF doesn't need test_utils/boost), aircc invocation (--output-format elf --elf-name), caching (2 files for ELF vs 3 for xclbin), and module loading
  • NPULauncher.__init__(): dispatches to correct launcher generator based on format
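
As a rough illustration of the format-dependent aircc invocation mentioned in the list above, the extra arguments could be assembled like this. The helper name `aircc_format_args`, the default ELF file name argument, and the example input file are hypothetical; only the `--output-format elf` and `--elf-name` flags and the `aie.elf` file name appear in the PR text. The xclbin flags are not listed in the PR, so the sketch adds none for that path.

```python
def aircc_format_args(output_format, elf_name="aie.elf"):
    """Return the extra aircc arguments for the chosen format (illustrative only)."""
    if output_format == "elf":
        # Flags quoted in the PR description for the ELF path.
        return ["--output-format", "elf", "--elf-name", elf_name]
    # xclbin path: the PR text does not spell out its flags, so nothing is added here.
    return []


# Hypothetical usage: "input.mlir" is a placeholder, not a path from the PR.
cmd = ["aircc", "input.mlir"] + aircc_format_args("elf")
```

Building the command as a list rather than a single string means it can be handed to `subprocess.run` without shell quoting concerns.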

Test plan

  • Vec-add on NPU2 with ELF (default) — PASS
  • Full test suite on NPU2 (14/15 pass, matvec fails in both ELF and xclbin — pre-existing)
  • Matvec with AMD_TRITON_NPU_OUTPUT_FORMAT=xclbin on NPU2 — same failure (confirms pre-existing)
  • NPU1 regression test (xclbin path unchanged)
  • AMD_TRITON_NPU_OUTPUT_FORMAT=elf on NPU1 — should raise RuntimeError

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 25, 2026 22:46

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds ELF output support for NPU2 (AIE2P/Strix) to make aie.elf the default binary output, while keeping the existing xclbin workflow for NPU1 (AIE2). It also introduces an environment-variable override to force either format.

Changes:

  • Add output-format auto-detection with AMD_TRITON_NPU_OUTPUT_FORMAT override (default: ELF on npu2, xclbin on npu1).
  • Add ELF-specific C++ launcher generation and switch aircc invocation/caching to produce and load aie.elf (+ kernel name metadata).
  • Add a new CLAUDE.md development guide covering repo structure and compilation pipeline.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| amd_triton_npu/backend/driver.py | Implements format selection, ELF launcher path, aircc ELF build flags, and format-aware caching/loading. |
| CLAUDE.md | Adds a large development guide describing the project, build, CI, and compilation pipeline (including ELF on npu2). |

Comments suppressed due to low confidence (1)

amd_triton_npu/backend/driver.py:10

  • tempfile is imported twice (import tempfile and again in import os, subprocess, tempfile, platform). This is redundant and can confuse linting/formatting; remove the duplicate import and keep a single style for imports.
import hashlib
import json
import tempfile
import sys
import sysconfig

import os, subprocess, tempfile, platform


Comment thread amd_triton_npu/backend/driver.py Outdated
Comment thread amd_triton_npu/backend/driver.py Outdated
Comment thread amd_triton_npu/backend/driver.py Outdated
Comment thread CLAUDE.md Outdated
ELF is a newer, self-contained XRT binary format that replaces the
two-file xclbin+insts.bin workflow. It embeds instructions via PDI
and simplifies deployment. ELF is only supported on NPU2 (AIE2P/Strix).

NPU2 now defaults to ELF format. NPU1 (AIE2) continues to use xclbin
unchanged. The format can be overridden via AMD_TRITON_NPU_OUTPUT_FORMAT.

Key changes to driver.py:
- Add _get_output_format() for auto-detection and env var override
- Add _generate_elf_launcher() using xrt::elf, xrt::ext::kernel,
  xrt::ext::bo, and xrt::run APIs (no instruction binary needed)
- Add _extract_elf_kernel_name() to read kernel name from ELF config
- Modify compile_module() for conditional linker flags, aircc flags
  (--output-format elf --elf-name), caching (2 vs 3 files), and
  module loading
- Modify NPULauncher to dispatch to correct launcher generator

Tested: 14/15 examples pass on NPU2 with ELF (matvec fails in both
ELF and xclbin modes -- pre-existing issue).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@erwei-xilinx merged commit d491a48 into main on Mar 26, 2026
8 of 9 checks passed
@erwei-xilinx deleted the elf-output-format branch on March 26, 2026 00:37

2 participants