Skip to content

ADD Triron doc #534

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
81 changes: 81 additions & 0 deletions paddle_triton.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@

# Paddle Inference 支持Triton自定义算子使用方法

## 1. 相关背景

* Triton支持自定义算子,用户可以自己开发算子并注册到Triton中,在推理时调用。
* Triton 仅提供Python API,采用运行时JIT编译,即仅当用户代码的执行流执行到这个kernel的时候,这个kernel才真正的被编译。

## 2. Paddle Inference支持Triton自定义算子使用方法

Paddle Inference 提供了部分 `Norm`类和`date copy`类的融合算子,集成在paddlemix库中,用户可以按照需求使用这些算子。
Paddle Inference支持 `PaddleMix Triton`自定义算子使用方法如下:
* 步骤1. 用户需要下载`PaddleMIX`库
```bash
git clone https://github.com/PaddlePaddle/PaddleMIX.git
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

export PYTHONPATH=/path/to/PaddleMIX/:$PYTHONPATH

```
* 步骤2. 用户需要安装Triton,并适配Paddle
```bash
python -m pip install triton
python -m pip install git+https://github.com/zhoutianzi666/UseTritonInPaddle.git
python -c "import use_triton_in_paddle; use_triton_in_paddle.make_triton_compatible_with_paddle()"
```
* 步骤3. 在需要使用Triton算子的python文件中`import paddlemix`
* 步骤4. 调用PaddleMix中对应的Triton算子API,实现高性能算子加速。

## 3. Paddle Inference 使用`PaddleMixTriton`自定义算子示例
使用Triton算子优化后代码:
```py
# 引入PaddleMIX
import os
os.sys.path.append(["/path/to/PaddleMIX"])
import paddlemix
# 参数准备
emb = self.linear(self.silu(conditioning_embedding).cast(x.dtype))
scale, shift = paddle.chunk(emb, 2, axis=1)

# Triton API :adaptive_layer_norm
x = paddlemix.triton_ops.adaptive_layer_norm(
x, scale_msa, shift_msa, self.norm.weight, self.norm.bias, epsilon=1e-06)
```

优化前代码:
```py
# 参数准备
emb = self.linear(self.silu(conditioning_embedding).cast(x.dtype))
scale, shift = paddle.chunk(emb, 2, axis=1)
norm_elementwise_affine_kwargs = dict(weight_attr=False, bias_attr=False)

# 原低效的算子实现
self.norm = nn.LayerNorm(embedding_dim, epsilon=1e-6, **norm_elementwise_affine_kwargs)
x = self.norm(x) * (1 + scale_msa[:, None]) + shift_msa[:, None]
```


## 4. 注意事项
* 1.用户需要注意参数顺序Triton API中规定的参数,以及参数的填充顺序。
* 2.用户需要注意Triton API中参数的默认值与需要填充的参数的是否一致。例如权重和偏置的默认值是否为None。
给出Triton API中参数的默认值和需要填充的参数的示例如下:

```py
adaptive_layer_norm(x, scale, shift, weight=None, bias=None, epsilon=1e-05)

fused_adaLN_scale_residual(x, mha_out, gate_msa, scale_mlp, shift_mlp, weight=None, bias=None, epsilon=1e-05)

split_concat(x, y)

triton_split(x, num_or_sections=[-1, -1], axis=1)
```

## Q&A
* `triton Error [CUDA]: device kernel image is invalid\no exit`
使用Triton过程中,如果遇到上述问题,请参考如下解决方案:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

用户triton中的ptxas准确位置,可以用这个命令去找find /your_triton_package_path -name ptxas


```bash
cp /your_cuda_path/bin/ptxas /your_python_path/site-packages/triton/backends/nvidia/bin/
# 将triton安装包中的ptxas,替换为 /your_cuda_path/bin/ptxas
# 不同Triton版本的patxas路径可能不同;用户triton中的ptxas准确位置,可以用如下命令寻找:
# find /your_triton_package_path -name ptxas
# 此方案在Triton 2.3.0 & 3.0.0版本中有效,其他版本的Triton尚未验证。
```