Commit 4a393ff

[Docs] Add Ascend NPU installation guide for FlagTree
1 parent 4d9e18e commit 4a393ff

1 file changed: 284 additions & 0 deletions
@@ -0,0 +1,284 @@
# How to Install the FlagTree Triton Compiler on Huawei Ascend NPU: A Step-by-Step Guide

> This tutorial is based on hands-on experience. It documents the full process of building FlagTree from source on an openEuler + Ascend910 environment, including 4 failed build attempts and how we debugged them. Hopefully this saves you some headaches.

## 1. What Is FlagTree?

FlagTree is a **multi-backend unified Triton compiler** developed by the FlagOS team, forked from triton-lang/triton. It supports multiple AI chip backends, including NVIDIA, Huawei Ascend, Hygon DCU, and Moore Threads, among others.

In short: **if you want to write Triton kernels on Ascend NPU, FlagTree is the compiler you need.**

Project repository: `github.com/flagos-ai/flagtree`

## 2. Environment

Here is the environment we are working with:

| Item | Version |
|------|---------|
| OS | openEuler 22.03 LTS SP4, Linux 5.10.0, aarch64 |
| Python | 3.9.9 |
| Device | Ascend910_9382 (16 NPUs) |
| CANN | 8.5.0 |
| PyTorch | 2.8.0+cpu |
| torch_npu | 2.8.0.post2 |
| GCC | 10.3.1 |

> **Note**: This is an **aarch64** machine, not x86_64. This matters because many pre-built toolchains are x86-only, so you need the aarch64 builds.

## 3. Installation Steps

### Step 1: Clone the FlagTree Repository

```bash
git clone https://github.com/flagos-ai/flagtree.git ~/FlagTree
cd ~/FlagTree
git submodule update --init --recursive
```

After cloning, verify that the third-party dependencies are in place:

```bash
ls third_party/ascend/AscendNPU-IR/    # Ascend NPU IR submodule
ls third_party/flir/                   # FLIR (FlagTree Linalg IR) submodule
```

If these directories are empty, the submodules were not pulled properly. Re-run `git submodule update --init --recursive`.

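
The same check can be scripted, which is handy on a machine where the clone may have been made without `--recursive`. This is a minimal Python sketch, assuming the two submodule paths listed above; the helper name `missing_submodules` is made up for illustration and is not part of FlagTree.

```python
from pathlib import Path

# Submodule paths taken from the `ls` check above.
SUBMODULE_DIRS = ("third_party/ascend/AscendNPU-IR", "third_party/flir")


def missing_submodules(repo_root, subdirs=SUBMODULE_DIRS):
    """Return the subdirectories that are absent or empty under repo_root."""
    root = Path(repo_root).expanduser()
    missing = []
    for sub in subdirs:
        path = root / sub
        # An uninitialised submodule shows up as an empty directory.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(sub)
    return missing
```

Calling `missing_submodules("~/FlagTree")` returns an empty list when both submodules are populated; any path it returns is a cue to re-run the submodule update.
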
### Step 2: Check Build Dependencies

FlagTree requires the following build tools:

```bash
cmake --version        # >= 3.18 (we used 4.2.3)
ninja --version        # >= 1.11 (we used 1.13.0)
pip install pybind11   # >= 2.13.1
```

> If cmake or ninja is missing, `pip install cmake ninja` will do. The setup.py also auto-installs them into a temporary build environment.

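
If you prefer to check the minimum versions programmatically, a small sketch along these lines works. It only parses the text that `cmake --version` and `ninja --version` print; the helper names are illustrative, and the sample strings in the comments assume the usual output format of those tools.

```python
import re


def parse_version(text):
    """Extract the first dotted version number, e.g. 'cmake version 4.2.3' -> (4, 2, 3)."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version_output, minimum):
    """Compare a tool's --version output against a minimum such as '3.18'."""
    have, need = parse_version(version_output), parse_version(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that '3.18' and '3.18.0' compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

In practice the first argument would come from something like `subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout`.
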
### Step 3: Obtain Pre-built LLVM

This step is essential: FlagTree requires an LLVM toolchain with MLIR support to compile Triton.

For the Ascend backend, FlagTree provides a pre-built LLVM:

```bash
# If you have internet access, setup.py will download it automatically.
# If not, download and extract manually to ~/.flagtree/ascend/
mkdir -p ~/.flagtree/ascend
cd ~/.flagtree/ascend
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/llvm-a66376b0-ubuntu-aarch64-python311-compat_v0.3.0.tar.gz
tar xzf llvm-a66376b0-ubuntu-aarch64-python311-compat_v0.3.0.tar.gz
```

After extraction, the directory structure looks like:

```
~/.flagtree/ascend/llvm-a66376b0-ubuntu-aarch64-python311-compat/
├── bin/       # clang, clang++, mlir-opt, etc.
├── include/   # LLVM/MLIR headers
└── lib/       # LLVM/MLIR static libraries + libstdc++.so.6.0.30
```

> **Key point**: This pre-built LLVM ships with its own `libstdc++.so.6.0.30`. You will need this later.

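
Before moving on, it can be worth confirming that the extraction produced the files the later steps rely on. The sketch below checks only the names mentioned in the listing above; the function name and the exact file list are assumptions for illustration, not something FlagTree provides.

```python
from pathlib import Path

# Files named in the directory listing above.
EXPECTED_FILES = (
    "bin/clang",
    "bin/clang++",
    "bin/mlir-opt",
    "lib/libstdc++.so.6.0.30",
)


def missing_bundle_files(llvm_root, expected=EXPECTED_FILES):
    """Return the expected files that do not exist under the extracted LLVM bundle."""
    root = Path(llvm_root).expanduser()
    return [rel for rel in expected if not (root / rel).exists()]
```

An empty result from `missing_bundle_files("~/.flagtree/ascend/llvm-a66376b0-ubuntu-aarch64-python311-compat")` suggests the bundle is complete enough for Step 5.
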
### Step 4: Extract Build Dependencies

FlagTree ships a pre-packaged dependency tarball in its repository:

```bash
cd ~
tar xzf ~/FlagTree/build-deps-triton_3.2.x-linux-aarch64.tar.gz
```

This extracts googletest, the JSON library, and other build dependencies into `~/.triton/`.

### Step 5: Set Environment Variables (Critical!)

This step is **the most error-prone part** of the entire process. It took us 4 attempts to get right. Here are the key lessons learned.

```bash
# 1. Specify the backend as Ascend
export FLAGTREE_BACKEND=ascend

# 2. Point to the pre-built LLVM
export LLVM_SYSPATH=~/.flagtree/ascend/llvm-a66376b0-ubuntu-aarch64-python311-compat

# 3. Add LLVM's bin to PATH (CMake needs to find clang/clang++)
export PATH=$LLVM_SYSPATH/bin:$PATH

# 4. [CRITICAL] Add LLVM's lib to the linker search path
#    The pre-built LLVM's static libraries require GLIBCXX_3.4.30,
#    but the system GCC 10 only provides GLIBCXX_3.4.28.
#    The LLVM bundle includes libstdc++.so.6.0.30, and the linker must find it.
#    ${VAR:+:$VAR} appends the old value only when it is set, which avoids a
#    trailing ":" (an empty entry is treated as the current directory).
export LIBRARY_PATH=$LLVM_SYSPATH/lib${LIBRARY_PATH:+:$LIBRARY_PATH}
export LD_LIBRARY_PATH=$LLVM_SYSPATH/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

# 5. Offline build (optional; prevents downloads during build)
export TRITON_OFFLINE_BUILD=1

# 6. Disable Proton (profiling tool, not needed for Ascend)
export TRITON_BUILD_PROTON=OFF

# 7. [CRITICAL] Append extra CMake arguments to fix two build issues:
#    - Disable -Werror (LLVM headers trigger dangling-assignment-gsl warnings)
#    - Tell the linker to search LLVM's lib directory
export TRITON_APPEND_CMAKE_ARGS="-DLLVM_ENABLE_WERROR=OFF \
-DCMAKE_CXX_FLAGS=-Wno-error=dangling-assignment-gsl \
-DCMAKE_EXE_LINKER_FLAGS=-L$LLVM_SYSPATH/lib \
-DCMAKE_SHARED_LINKER_FLAGS=-L$LLVM_SYSPATH/lib"

# 8. Limit parallel jobs (aarch64 machines have many cores but may lack memory)
export MAX_JOBS=16
```

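
If you drive the build from a script rather than an interactive shell, the same settings can be assembled as a dictionary and passed to the build process. The following is a sketch under that assumption: the variable names and values mirror the exports above, while the function `flagtree_build_env` itself is hypothetical.

```python
import os


def flagtree_build_env(llvm_syspath, base_env=None, max_jobs=16):
    """Return a copy of the environment with the Step 5 variables applied."""
    env = dict(os.environ if base_env is None else base_env)
    llvm = os.path.expanduser(llvm_syspath)
    lib = os.path.join(llvm, "lib")

    def prepend(name, value):
        # Put `value` first; keep the old value only if it was non-empty.
        old = env.get(name, "")
        env[name] = value + (os.pathsep + old if old else "")

    env["FLAGTREE_BACKEND"] = "ascend"
    env["LLVM_SYSPATH"] = llvm
    prepend("PATH", os.path.join(llvm, "bin"))   # so CMake finds clang/clang++
    prepend("LIBRARY_PATH", lib)                 # link-time libstdc++ lookup
    prepend("LD_LIBRARY_PATH", lib)              # run-time libstdc++ lookup
    env["TRITON_OFFLINE_BUILD"] = "1"
    env["TRITON_BUILD_PROTON"] = "OFF"
    env["TRITON_APPEND_CMAKE_ARGS"] = " ".join([
        "-DLLVM_ENABLE_WERROR=OFF",
        "-DCMAKE_CXX_FLAGS=-Wno-error=dangling-assignment-gsl",
        f"-DCMAKE_EXE_LINKER_FLAGS=-L{lib}",
        f"-DCMAKE_SHARED_LINKER_FLAGS=-L{lib}",
    ])
    env["MAX_JOBS"] = str(max_jobs)
    return env
```

The result can be handed to `subprocess.run([...], env=env)` when launching the install command from Step 6, which keeps the settings out of your login shell.
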
### Step 6: Build and Install

Everything is ready. Start the build:

```bash
cd ~/FlagTree/python

# Clean any previously failed build artifacts
rm -rf build/

# Install in editable mode (convenient for development and debugging)
pip install -e . -v 2>&1 | tee ~/flagtree_build.log
```

The build takes approximately **10-20 minutes** (depending on `MAX_JOBS` and machine performance).

If everything goes well, you should see:

```
Successfully installed flagtree-0.5.0+gitXXXXXXX
```

### Step 7: Verify the Installation

```python
import triton
print(triton.__version__)  # 3.2.0
print(triton.__file__)     # Should point to ~/FlagTree/python/triton/__init__.py
```

Check that the Ascend backend is available:

```python
from triton.backends.ascend import driver as ascend_driver
print("Ascend backend loaded!")
```

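
In a setup script you may want this check to return a boolean instead of raising. One way to sketch that is with the standard library's `importlib.util.find_spec`; the helper name `module_available` is illustrative. Note that `find_spec` imports the parent packages (here `triton` and `triton.backends`) but not the module itself, so it is a lighter check than the import above.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located, False if it or a parent package is missing."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError: a parent package is missing.
        # ValueError: the module is loaded but has no usable __spec__.
        return False
```

With FlagTree installed as described, `module_available("triton.backends.ascend")` would be expected to return `True`.
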
Run a simple kernel test:

```python
import triton
import triton.language as tl
import torch
import torch_npu

@triton.jit
def add_kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    block_start = pid * BLOCK_SIZE
    offsets = block_start + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    output = x + y
    tl.store(output_ptr + offsets, output, mask=mask)

n = 1024
x = torch.randn(n, dtype=torch.float32).to('npu')
y = torch.randn(n, dtype=torch.float32).to('npu')
out = torch.empty_like(x)

add_kernel[(1,)](x, y, out, n, BLOCK_SIZE=1024)

# Verify results on CPU (avoids CANN OPP issues)
diff = torch.max(torch.abs(out.cpu() - (x.cpu() + y.cpu()))).item()
print(f"Max diff: {diff}")  # Should be 0.0
```

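
The test above uses a single program instance because `n` equals `BLOCK_SIZE`. For other sizes the kernel relies on `pid`, `offsets`, and `mask` to cover the input in blocks. The pure-Python sketch below emulates that indexing on plain lists so the logic can be followed without an NPU; it is an illustration of the block/mask scheme, not how Triton executes the kernel, and `emulate_add` is a made-up name.

```python
def cdiv(a, b):
    """Ceiling division: the number of blocks of size b needed to cover a elements."""
    return (a + b - 1) // b


def emulate_add(x, y, block_size):
    """Mimic add_kernel: one loop iteration per program instance, masked at the tail."""
    n = len(x)
    out = [None] * n
    for pid in range(cdiv(n, block_size)):      # the launch grid
        block_start = pid * block_size
        for offset in range(block_start, block_start + block_size):
            if offset < n:                      # the mask: skip out-of-range lanes
                out[offset] = x[offset] + y[offset]
    return out
```

On the real kernel, the corresponding launch for an arbitrary `n` would typically use a grid of `(triton.cdiv(n, BLOCK_SIZE),)` instead of `(1,)`.
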
## 4. Troubleshooting: Lessons from 4 Failed Builds

If you follow the steps above exactly, you should succeed on the first try. But if you are curious why those "weird" environment variables are necessary, here is the record of our 4 failed attempts:

### Pitfall 1: clang Not Found

**Symptom**: CMake error: `CMAKE_C_COMPILER: clang is not a full path and was not found in the PATH`

**Root cause**: `LLVM_SYSPATH` was set, but the LLVM `bin` directory was not added to `PATH`. The CMakeLists.txt hardcodes `set(CMAKE_C_COMPILER clang)`, which only searches by name in `PATH`.

**Fix**: `export PATH=$LLVM_SYSPATH/bin:$PATH`

### Pitfall 2: -Werror Causes Compilation Failure

**Symptom**:
```
mlir/IR/OperationSupport.h:1000:27: error: object backing the pointer
will be destroyed [-Werror,-Wdangling-assignment-gsl]
```

**Root cause**: FlagTree builds with `-Werror` by default (all warnings treated as errors). However, the pre-built LLVM headers trigger the `dangling-assignment-gsl` warning, which only newer clang releases emit. This is not a code bug; the compiler simply became stricter.

**Fix**: Append `-Wno-error=dangling-assignment-gsl` via `TRITON_APPEND_CMAKE_ARGS`.

### Pitfall 3: Linker Error (undefined reference to std::__throw_bad_array_new_length)

**Symptom**:
```
undefined reference to `std::__throw_bad_array_new_length()'
```

**Root cause**: `std::__throw_bad_array_new_length` was added to libstdc++ in GCC 11 (GLIBCXX_3.4.29), so the libstdc++ that comes with our system GCC 10.3 (GLIBCXX_3.4.28) does not provide it. The pre-built LLVM static libraries were compiled with GCC 12+ and require GLIBCXX_3.4.30.

**Fix**: The pre-built LLVM ships its own `libstdc++.so.6.0.30`. Set `LIBRARY_PATH` and `LD_LIBRARY_PATH` to point to the LLVM `lib` directory, and add `-L$LLVM_SYSPATH/lib` to the CMake linker flags.

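
To see the mismatch for yourself, you can compare the highest `GLIBCXX_*` version tag embedded in the system libstdc++ with the one in the bundled copy. The sketch below simply scans the raw bytes of a library file for those tags, which is a rough stand-in for inspecting the symbol version table with binutils; the function name is illustrative.

```python
import re


def max_glibcxx(data):
    """Return the highest GLIBCXX_x.y.z tag found in raw library bytes, or None."""
    versions = [
        tuple(int(part) for part in tag.decode("ascii").split("."))
        for tag in re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data)
    ]
    return max(versions) if versions else None
```

For example, `max_glibcxx(Path(lib).read_bytes())` with `lib` pointing first at the system `libstdc++.so.6` (its location varies by distribution, so the path is left to you) and then at `$LLVM_SYSPATH/lib/libstdc++.so.6.0.30` should show the gap described above.
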
### Pitfall 4: Using GCC Instead of Clang? Dead End.

**Symptom**: Setting `FLAGTREE_USE_SYSTEM_CC=1` to compile with GCC 10 results in a flood of template syntax errors.

**Root cause**: FlagTree's C++ code (especially the FLIR and AscendNPU-IR components) extensively uses clang-specific template syntax that GCC 10 cannot parse.

**Lesson**: **Do not use GCC to compile FlagTree. You must use clang.** The `FLAGTREE_USE_SYSTEM_CC` flag did not work in our Ascend aarch64 environment.

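
Since Step 6 saves the build output to `~/flagtree_build.log`, the first three pitfalls can be spotted automatically by searching that log for the error fragments quoted above. This is a minimal sketch: the signature strings come from the symptoms in this section, while the `diagnose` helper and its hint texts are illustrative. Pitfall 4 is omitted because it has no single characteristic message.

```python
# (substring to look for in the log, hint pointing back to the fix above)
KNOWN_FAILURES = [
    ("is not a full path and was not found in the PATH",
     "Pitfall 1: add $LLVM_SYSPATH/bin to PATH"),
    ("-Wdangling-assignment-gsl",
     "Pitfall 2: append -Wno-error=dangling-assignment-gsl via TRITON_APPEND_CMAKE_ARGS"),
    ("__throw_bad_array_new_length",
     "Pitfall 3: add $LLVM_SYSPATH/lib to LIBRARY_PATH, LD_LIBRARY_PATH and the linker flags"),
]


def diagnose(log_text):
    """Return the hints whose signature appears anywhere in the build log text."""
    return [hint for needle, hint in KNOWN_FAILURES if needle in log_text]
```

A typical call would read the log with `open(os.path.expanduser("~/flagtree_build.log")).read()` and print each returned hint.
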
## 5. Environment Variable Quick Reference

Every time you open a new terminal, set the following variables to use FlagTree:

```bash
# Base environment (CANN + PyTorch + venv)
source /your/venv/setup_env.sh

# FlagTree runtime
export LLVM_SYSPATH=~/.flagtree/ascend/llvm-a66376b0-ubuntu-aarch64-python311-compat
export LD_LIBRARY_PATH=$LLVM_SYSPATH/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
```

> We recommend putting these lines in a `setup_flagtree.sh` script so that a single `source` command sets everything up.

## 6. Summary

| Step | Description | Time |
|------|-------------|------|
| Clone repo + submodules | Pull source code | ~5 min |
| Install build deps | cmake, ninja, pybind11 | ~2 min |
| Download pre-built LLVM | ~500 MB | ~5 min |
| Extract build deps | googletest, json | ~1 min |
| Set environment variables | The most critical step | ~5 min |
| Build and install | `pip install -e .` | ~15 min |
| Verify | `import triton` + kernel test | ~2 min |

**Key takeaways**:
1. You **must use clang** (from the LLVM bundle), not system GCC
2. You **must add LLVM's lib to the linker path** (libstdc++ version mismatch)
3. You **must append** `-Wno-error=dangling-assignment-gsl` (a warning that only newer clang releases emit)
4. `TRITON_APPEND_CMAKE_ARGS` is your lifeline for passing extra CMake arguments

We hope this tutorial helps anyone working with Triton on Ascend NPU. Feel free to leave questions in the comments!

---

*This tutorial is based on FlagTree v0.5.0 (commit 4d9e18e), verified on Ascend910 + CANN 8.5.0 + openEuler 22.03 LTS SP4 aarch64.*
