
[XPU] improce attn precision#7515

Open
lizan1999 wants to merge 1 commit into PaddlePaddle:develop from lizan1999:imporve_attn_precision

Conversation

@lizan1999
Contributor

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Change the speculative_attention_decoder TGEMM compute type from tfloat32 to float.
For flash_attention_context_vllm and paged_attention_xft, use the dedicated vLLM version directly.

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, explain the reason in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Apr 20, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the XPU label Apr 20, 2026

@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-20 16:03 CST

📋 Review Summary

PR overview: raises the TGEMM compute type of the XPU speculative_attention_decoder from tfloat32 to float to improve attention precision.
Change scope: custom_ops/xpu_ops/src/ops/block_attn.cc
Impact tags: XPU, OP

📝 PR Convention Check

The PR title contains a typo ("improce" → "improve"), the Motivation section is not filled in, and the Accuracy Tests section is empty (since this change affects precision, accuracy comparison data should be provided). The PR description mentions changes to flash_attention_context_vllm and paged_attention_xft, but the diff only modifies speculative_attention_decoder, so the description does not match the actual change.

Title suggestion (ready to copy):

  • [XPU] Improve attention precision for speculative_attention_decoder

Description suggestions:

  • Fill in the Motivation section with the background and reason for the precision improvement
  • Add tfloat32 vs float accuracy comparison data to the Accuracy Tests section
  • Update the Modifications section to match the actual diff

Issues

Level | File | Summary
🟡 Suggestion | block_attn.cc:709 | The same TGEMM definition in block_attn_spliced.cc still uses tfloat32 and has not been updated in sync

Overall Assessment

The change logic is clear: replacing tfloat32 with float improves compute precision. However, block_attn_spliced.cc contains an identical TGEMM definition that was not updated in sync; the author should confirm whether it needs the same change to keep the two code paths consistent.

Before:

using TGEMM = std::conditional_t<std::is_same_v<XPU_XType, XPU_CType>,
                                 tfloat32,
                                 int8_wo_t>;

After:

using TGEMM = std::
    conditional_t<std::is_same_v<XPU_XType, XPU_CType>, float, int8_wo_t>;


🟡 Suggestion: the same TGEMM definition in block_attn_spliced.cc has not been updated in sync

block_attn_spliced.cc:1545-1547 contains identical code that still uses tfloat32:

using TGEMM = std::conditional_t<std::is_same_v<XPU_XType, XPU_CType>,
                                 tfloat32,
                                 int8_wo_t>;

This definition is likewise used as a template parameter for speculative_attention_decoder. If the goal of this PR is to comprehensively improve speculative attention precision, the tfloat32 in block_attn_spliced.cc should also be changed to float to keep the two code paths consistent.

Please confirm: is this an omission, or is the file intentionally left unchanged?
