
[XPU] improce attn precision#7515

Open
lizan1999 wants to merge 1 commit into PaddlePaddle:develop from lizan1999:imporve_attn_precision

Conversation

@lizan1999
Contributor

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Change the speculative_attention_decoder TGEMM compute type from tfloat32 to float.
For flash_attention_context_vllm and paged_attention_xft, use the dedicated vLLM version directly.

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, explain the reason in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Apr 20, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the XPU label Apr 20, 2026

@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-20 16:03 CST

📋 Review Summary

PR overview: raises the TGEMM compute type of the XPU speculative_attention_decoder from tfloat32 to float to improve attention precision.
Change scope: custom_ops/xpu_ops/src/ops/block_attn.cc
Impact tags: XPU, OP

📝 PR Convention Check

The PR title contains a typo ("improce" → "improve"), the Motivation section is not filled in, and the Accuracy Tests section is empty (since this change affects precision, accuracy comparison data should be provided). The PR description mentions changes to flash_attention_context_vllm and paged_attention_xft, but the diff only modifies speculative_attention_decoder, so the description does not match the actual change.

Title suggestion (ready to copy):

  • [XPU] Improve attention precision for speculative_attention_decoder

Description suggestions:

  • Fill in the Motivation section with the background and reason for the precision improvement
  • Add tfloat32 vs float accuracy comparison data to the Accuracy Tests section
  • Update the Modifications section to match the actual diff

Issues

Level | File | Summary
🟡 Suggestion | block_attn.cc:709 | The same TGEMM definition in block_attn_spliced.cc still uses tfloat32 and has not been updated in sync

Overall Assessment

The change logic is clear: replacing tfloat32 with float improves compute precision. However, block_attn_spliced.cc contains an identical TGEMM definition that was not updated in sync; the author should confirm whether it needs the same change to keep the two code paths consistent.

Before:

using TGEMM = std::conditional_t<std::is_same_v<XPU_XType, XPU_CType>,
                                 tfloat32,
                                 int8_wo_t>;

After:

using TGEMM = std::
    conditional_t<std::is_same_v<XPU_XType, XPU_CType>, float, int8_wo_t>;


🟡 Suggestion: the same TGEMM definition in block_attn_spliced.cc has not been updated in sync

block_attn_spliced.cc:1545-1547 contains identical code that still uses tfloat32:

using TGEMM = std::conditional_t<std::is_same_v<XPU_XType, XPU_CType>,
                                 tfloat32,
                                 int8_wo_t>;

This definition is likewise used as a template parameter for speculative_attention_decoder. If the goal of this PR is to comprehensively improve speculative attention precision, the tfloat32 in block_attn_spliced.cc should also be changed to float to keep the two code paths consistent.

Please confirm: is this an omission, or is the file intentionally left unchanged?
