[CI] Disable auto CUDA arch injection to avoid duplicate gencode flags #7513
EmmonsCurse wants to merge 2 commits into PaddlePaddle:develop
Conversation
Thanks for your contribution!
/skip-ci ci_iluvatar
PaddlePaddle-bot left a comment
🤖 AI Code Review
2026-04-20 17:10:08
📋 Review Summary
PR overview: Monkey-patches Paddle's `_get_cuda_arch_flags` function to prevent the automatic CUDA arch flag injection introduced in Paddle PR #78704 from producing duplicate gencode flags and linker errors.
Scope of change: custom_ops/setup_ops.py (build configuration)
Impact tags: CI, OP
📝 PR Convention Check
The Modifications section of the PR description does not fully match the actual implementation: the description says "Overrode PADDLE_CUDA_ARCH_LIST by setting it to an empty string in get_gencode_flags", but the implementation actually monkey-patches `extension_utils._get_cuda_arch_flags`; it neither modifies the `get_gencode_flags` function itself nor sets an environment variable. Please update the description to accurately reflect the implementation.
Issues

| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | custom_ops/setup_ops.py:63 | The "second line of defense" safeguard is most likely dead code |
| 🟡 Suggestion | custom_ops/setup_ops.py:52 | Substring matching of flags risks false positives |

Overall Evaluation
This PR resolves the duplicate gencode flag problem caused by the upstream Paddle change via a monkey patch. The core patch logic (`_patched_get_cuda_arch_flags`) is sound and well commented. Two minor suggestions: 1) the "second line of defense" block may never take effect, so confirm it or remove it; 2) the flag-matching condition could be made more precise to avoid potential false positives.
```python
# Additional safeguard (important):
# Some Paddle versions may have additional internal methods that add gencode flags.
# This patch serves as a second line of defense by overriding such methods.
if hasattr(extension_utils, "CUDAExtension"):
```
🟡 Suggestion: a `CUDAExtension` attribute is unlikely to exist on the `extension_utils` module.
`CUDAExtension` is a function/class imported from `paddle.utils.cpp_extension`, not an attribute of the `extension_utils` module. As a result, `hasattr(extension_utils, "CUDAExtension")` is most likely `False`, and this "second line of defense" is effectively dead code that provides no protection.
Suggestions:
- Confirm whether the `extension_utils` module actually exposes a `CUDAExtension` attribute in the corresponding Paddle version; if not, remove this block to avoid misleading readers.
- If extra protection is genuinely needed, consider patching `paddle.utils.cpp_extension.CUDAExtension` itself.
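The wrap-and-delegate pattern the reviewer suggests can be sketched as follows. This is a minimal, self-contained illustration: a stand-in module replaces `paddle.utils.cpp_extension` (whose real `CUDAExtension` factory has a different signature), and the filtering logic is only assumed for demonstration.

```python
import types

# Hypothetical stand-in for paddle.utils.cpp_extension; the real module's
# CUDAExtension builds a setuptools Extension object.
cpp_extension = types.ModuleType("cpp_extension")
cpp_extension.CUDAExtension = lambda **kwargs: dict(kwargs)

_original_cuda_extension = cpp_extension.CUDAExtension

def _patched_cuda_extension(**kwargs):
    """Wrap the original factory and strip auto-injected arch flags."""
    ext = _original_cuda_extension(**kwargs)
    nvcc_flags = ext.get("extra_compile_args", {}).get("nvcc", [])
    ext.setdefault("extra_compile_args", {})["nvcc"] = [
        f for f in nvcc_flags if not f.startswith(("-gencode", "-arch"))
    ]
    return ext

# Patch the module attribute itself, so every later call goes through the wrapper.
cpp_extension.CUDAExtension = _patched_cuda_extension
```

Patching the public factory rather than a private helper is less fragile across Paddle versions, since private names like `_get_cuda_arch_flags` can move or be renamed without notice.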
```python
    """
    if cflags:
        for flag in cflags:
            if isinstance(flag, str) and (flag.startswith("-gencode") or "compute_" in flag or "sm_" in flag):
```
🟡 Suggestion: the flag-detection logic risks false matches.
`"compute_" in flag` and `"sm_" in flag` use substring matching and may match flags that are not gencode flags (for example, a `-I` include path that happens to contain `sm_` or `compute_`). The probability is low in the current scenario, but a general-purpose patch function should be more precise.
Consider tightening the condition, e.g. checking only the `-gencode` and `-arch` prefixes:

```python
if isinstance(flag, str) and (flag.startswith("-gencode") or flag.startswith("-arch")):
    return []
```
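The difference between the two predicates can be demonstrated with a short, self-contained comparison; the flag strings below are invented for illustration:

```python
def loose_match(flag):
    # Current check: substring matching on "compute_" / "sm_".
    return isinstance(flag, str) and (
        flag.startswith("-gencode") or "compute_" in flag or "sm_" in flag
    )

def tight_match(flag):
    # Suggested check: prefix matching only.
    return isinstance(flag, str) and (
        flag.startswith("-gencode") or flag.startswith("-arch")
    )

flags = [
    "-gencode arch=compute_90a,code=sm_90a",  # a real arch flag: both match
    "-I/opt/sm_kernels/include",              # include path: only loose_match fires
]
```

The loose predicate treats the include path as an arch flag because it contains `sm_`, which would make the patched function wrongly suppress Paddle's arch injection; the prefix-based check does not.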
Motivation
Recent changes in Paddle #78704 modified the behavior of `CUDAExtension`, introducing automatic CUDA architecture flag injection via `PADDLE_CUDA_ARCH_LIST`, even when custom `-gencode` flags are already specified. This results in duplicated CUDA arch flags during compilation, increasing binary size and potentially causing linker errors such as:

```
relocation truncated to fit
```

To maintain stable builds and avoid unnecessary code generation, a workaround is required.
Modifications
- Monkey-patched `extension_utils._get_cuda_arch_flags` to return an empty list when user-defined `-gencode` flags are detected, preventing Paddle from auto-injecting CUDA arch flags.
- Added a safeguard around `CUDAExtension._add_cuda_arch_flags` to ensure no additional arch flags are appended internally.
- Kept architecture selection in `get_gencode_flags`, avoiding reliance on `PADDLE_CUDA_ARCH_LIST`.
- Ensured there are no duplicate `-gencode` entries.
- Used complete `arch=compute_xxa,code=sm_xxa` pairs (e.g., `90a`, `100a`) and avoided incomplete flags like `arch=compute_90a`.

Usage or Command
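The first modification above can be sketched as follows. This is a hedged, self-contained illustration: `extension_utils` is a stand-in module here (the real `paddle.utils.cpp_extension.extension_utils._get_cuda_arch_flags` derives arch flags from the local GPU or `PADDLE_CUDA_ARCH_LIST`), and the patch body mirrors only the shape of the approach, not FastDeploy's actual code.

```python
import types

# Hypothetical stand-in for paddle's extension_utils module.
extension_utils = types.ModuleType("extension_utils")
extension_utils._get_cuda_arch_flags = lambda cflags=None: [
    "-gencode", "arch=compute_90,code=sm_90"  # pretend auto-detected flags
]

_original_get_cuda_arch_flags = extension_utils._get_cuda_arch_flags

def _patched_get_cuda_arch_flags(cflags=None):
    """Return no arch flags when the caller already supplied -gencode flags."""
    if cflags:
        for flag in cflags:
            if isinstance(flag, str) and (
                flag.startswith("-gencode") or flag.startswith("-arch")
            ):
                return []  # user-supplied flags win; inject nothing
    return _original_get_cuda_arch_flags(cflags)

# Install the patch on the module attribute Paddle resolves at call time.
extension_utils._get_cuda_arch_flags = _patched_get_cuda_arch_flags
```

Because the original function is captured before reassignment, behavior is unchanged when no explicit `-gencode` flags are present, which keeps the patch transparent for builds that rely on Paddle's auto-detection.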
N/A
Accuracy Tests
N/A
Checklist
- Use at least one of the PR title tags: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For a `release` branch PR, make sure the PR has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.