Thanks for your contribution!
PaddlePaddle-bot
left a comment
🤖 AI Code Review
2026-04-20 16:03 CST
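📋 Review Summary
PR overview: Raises the TGEMM compute type of the XPU speculative_attention_decoder from tfloat32 to float in order to improve attention precision.
Scope of changes: custom_ops/xpu_ops/src/ops/block_attn.cc
Impact tag: XPU OP
📝 PR Convention Check
The PR title contains a typo ("improce" should be "improve"). The Motivation section is not filled in and the Accuracy Tests section is empty; because this change targets precision, precision comparison data is recommended. The PR description also mentions changes to flash_attention_context_vllm and paged_attention_xft, but the diff only modifies speculative_attention_decoder, so the description does not match the actual changes.
Suggested title (can be copied directly):
[XPU] Improve attention precision for speculative_attention_decoder
Suggested description:
- Motivation: fill in the background and reasons for the precision improvement
- Accuracy Tests: add precision comparison data for tfloat32 vs float
- Modifications: update the section to match the actual diff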
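As a rough illustration of the kind of comparison data the Accuracy Tests section could report, here is a minimal host-side sketch. It rests on a loud assumption: that tfloat32 behaves like a float whose mantissa keeps only its top 10 explicit bits, modeled here by truncation. The actual XPU tfloat32 semantics are not shown in this PR and may differ, and every function name below is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// ASSUMPTION: model the reduced-precision type as a float whose mantissa keeps
// only its top 10 explicit bits (truncation). The real XPU tfloat32 may differ.
inline float truncate_mantissa_to_10_bits(float x) {
  std::uint32_t bits = 0;
  std::memcpy(&bits, &x, sizeof(bits));
  bits &= 0xFFFFE000u;  // clear the low 13 of the 23 explicit mantissa bits
  float out = 0.0f;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Dot product accumulated in float; inputs are optionally truncated first.
inline float dot(const std::vector<float>& a, const std::vector<float>& b,
                 bool reduced_inputs) {
  assert(a.size() == b.size());
  float acc = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float x = reduced_inputs ? truncate_mantissa_to_10_bits(a[i]) : a[i];
    const float y = reduced_inputs ? truncate_mantissa_to_10_bits(b[i]) : b[i];
    acc += x * y;
  }
  return acc;
}

// Double-precision reference used as the baseline for both variants.
inline double dot_reference(const std::vector<float>& a,
                            const std::vector<float>& b) {
  assert(a.size() == b.size());
  double acc = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    acc += static_cast<double>(a[i]) * static_cast<double>(b[i]);
  }
  return acc;
}

// Absolute error of each variant against the double-precision reference.
struct PrecisionReport {
  double float_abs_err;
  double reduced_abs_err;
};

inline PrecisionReport compare_precision(const std::vector<float>& a,
                                         const std::vector<float>& b) {
  const double ref = dot_reference(a, b);
  return {std::fabs(static_cast<double>(dot(a, b, false)) - ref),
          std::fabs(static_cast<double>(dot(a, b, true)) - ref)};
}
```

Feeding `compare_precision` with representative query and key rows would give a first-order feel for the gap, but it is not a substitute for end-to-end accuracy results measured on the device.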
Issues
| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | block_attn.cc:709 | The identical TGEMM definition in block_attn_spliced.cc still uses tfloat32 and was not updated |
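Overall Assessment
The change is clear: replacing tfloat32 with float can improve compute precision. However, block_attn_spliced.cc contains exactly the same TGEMM definition, which was not modified. The author should confirm whether it needs to be updated as well to keep the two files consistent.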
-                                   tfloat32,
-                                   int8_wo_t>;
+  using TGEMM = std::
+      conditional_t<std::is_same_v<XPU_XType, XPU_CType>, float, int8_wo_t>;
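🟡 Suggestion: the identical TGEMM definition in block_attn_spliced.cc was not updated
block_attn_spliced.cc:1545-1547 contains exactly the same code and still uses tfloat32:

    using TGEMM = std::conditional_t<std::is_same_v<XPU_XType, XPU_CType>,
                                     tfloat32,
                                     int8_wo_t>;

This definition is likewise used as a template argument for speculative_attention_decoder. If this PR is meant to improve speculative attention precision across the board, consider also changing tfloat32 to float in block_attn_spliced.cc so that the two code paths stay consistent.
Please confirm whether this was an oversight or whether the file was intentionally left unchanged.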
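To make the inconsistency concrete, the following is a minimal standalone sketch of how the two aliases resolve. The tag structs stand in for the XPU library types (`tfloat32`, `int8_wo_t`) and for an arbitrary element type; they are hypothetical placeholders, as are the alias names. What `XPU_XType` and `XPU_CType` represent is not shown in the excerpt, so the sketch only exercises the equal and unequal cases.

```cpp
#include <type_traits>

// Hypothetical stand-ins; the real tfloat32 and int8_wo_t come from the XPU SDK.
struct tfloat32_tag {};
struct int8_wo_tag {};
struct elem_tag {};        // placeholder for some element type
struct other_elem_tag {};  // placeholder for a different element type

// Same selection pattern as in the diff: pick SameTypeGemm when the two types
// match, otherwise fall back to the int8 weight-only type.
template <typename XType, typename CType, typename SameTypeGemm>
using SelectTGEMM =
    std::conditional_t<std::is_same_v<XType, CType>, SameTypeGemm, int8_wo_tag>;

// Models block_attn.cc after this PR.
template <typename XType, typename CType>
using TGEMMBlockAttn = SelectTGEMM<XType, CType, float>;

// Models block_attn_spliced.cc, which still names tfloat32.
template <typename XType, typename CType>
using TGEMMSpliced = SelectTGEMM<XType, CType, tfloat32_tag>;

// When the two types match, the two files now disagree on the compute type.
static_assert(std::is_same_v<TGEMMBlockAttn<elem_tag, elem_tag>, float>);
static_assert(std::is_same_v<TGEMMSpliced<elem_tag, elem_tag>, tfloat32_tag>);

// When they differ, both files still agree on the int8 weight-only type.
static_assert(
    std::is_same_v<TGEMMBlockAttn<elem_tag, other_elem_tag>, int8_wo_tag>);
static_assert(
    std::is_same_v<TGEMMSpliced<elem_tag, other_elem_tag>, int8_wo_tag>);
```

The sketch compiles under C++17 and shows that only the matching-type branch diverges between the two files, which is the branch this PR changes.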
Motivation
Modifications
Change the speculative_attention_decoder TGEMM compute type from tfloat32 to float.
For flash_attention_context_vllm and paged_attention_xft, use the dedicated vLLM version directly.
Usage or Command
Accuracy Tests
Checklist
- [FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]
- pre-commit before commit.
- release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.