
[CI] Disable auto CUDA arch injection to avoid duplicate gencode flags#7513

Open
EmmonsCurse wants to merge 2 commits into PaddlePaddle:develop from EmmonsCurse:fix_build_error_in_90or100

Conversation

@EmmonsCurse
Collaborator

@EmmonsCurse EmmonsCurse commented Apr 20, 2026

Motivation

Recent changes in Paddle #78704 modified the behavior of CUDAExtension, introducing automatic CUDA architecture flag injection via PADDLE_CUDA_ARCH_LIST even when custom -gencode flags are already specified.

This results in duplicated CUDA arch flags during compilation, increasing binary size and potentially causing linker errors such as:

  • relocation truncated to fit

To maintain stable builds and avoid unnecessary code generation, a workaround is required.

Modifications

  • Patched extension_utils._get_cuda_arch_flags to return an empty list when user-defined -gencode flags are detected, preventing Paddle from auto-injecting CUDA arch flags.
  • Added a secondary safeguard by overriding CUDAExtension._add_cuda_arch_flags to ensure no additional arch flags are appended internally.
  • Explicitly controlled CUDA architecture flags via get_gencode_flags, avoiding reliance on PADDLE_CUDA_ARCH_LIST.
  • Effectively disabled Paddle’s automatic CUDA arch injection mechanism to prevent duplicated -gencode entries.
  • Ensured correct generation of arch=compute_xxa,code=sm_xxa pairs (e.g., 90a, 100a) and avoided incomplete flags like arch=compute_90a.
  • Reduced the risk of compilation and linking issues (e.g., relocation overflow) caused by conflicting or duplicated CUDA arch flags.
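
The patching steps above can be sketched as follows. This is a hedged illustration, not the PR's actual diff: the real code lives in custom_ops/setup_ops.py and patches paddle.utils.cpp_extension.extension_utils; here a stand-in namespace replaces that module and _original_get_cuda_arch_flags is a placeholder for Paddle's own implementation, so the sketch runs without Paddle installed.

```python
# Hedged sketch of the patch described in this PR. A stand-in namespace
# replaces paddle.utils.cpp_extension.extension_utils so the sketch runs
# without Paddle installed; the real code is in custom_ops/setup_ops.py.
import types

# Stand-in for paddle.utils.cpp_extension.extension_utils (assumption).
extension_utils = types.SimpleNamespace()

def _original_get_cuda_arch_flags(cflags):
    # Placeholder for Paddle's implementation, which would inject flags
    # derived from PADDLE_CUDA_ARCH_LIST (illustrative output only).
    return ["-gencode", "arch=compute_90,code=sm_90"]

extension_utils._get_cuda_arch_flags = _original_get_cuda_arch_flags

def get_gencode_flags(archs):
    """Build explicit arch/code pairs, e.g. 90a -> arch=compute_90a,code=sm_90a."""
    flags = []
    for arch in archs:
        flags += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    return flags

def _patched_get_cuda_arch_flags(cflags):
    # If the caller already passed -gencode flags, inject nothing extra,
    # preventing duplicated arch flags on the final nvcc command line.
    if cflags:
        for flag in cflags:
            if isinstance(flag, str) and flag.startswith("-gencode"):
                return []
    return _original_get_cuda_arch_flags(cflags)

extension_utils._get_cuda_arch_flags = _patched_get_cuda_arch_flags
```

With the patch in place, passing the output of get_gencode_flags(["90a"]) as nvcc flags leaves exactly one arch=compute_90a,code=sm_90a pair in the build, since the patched hook then returns an empty list instead of re-injecting arch flags.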

Usage or Command

N/A

Accuracy Tests

N/A

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests, or state the reason in this PR if none are added.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Apr 20, 2026

Thanks for your contribution!

@EmmonsCurse
Collaborator Author

EmmonsCurse commented Apr 20, 2026

/skip-ci ci_iluvatar
/skip-ci ci_hpu
/skip-ci stable_test
/skip-ci base_test
/skip-ci pre_ce_test

PaddlePaddle-bot

This comment was marked as outdated.


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-20 17:10:08

📋 Review Summary

PR overview: monkey-patches Paddle's _get_cuda_arch_flags function so that the automatic CUDA arch flag injection introduced by Paddle PR #78704 no longer produces duplicate gencode flags and linker errors.
Scope of change: custom_ops/setup_ops.py (build configuration)
Impact tags: CI, OP

📝 PR Convention Check

The Modifications section of the PR description does not fully match the actual implementation: it states "Overrode PADDLE_CUDA_ARCH_LIST by setting it to an empty string in get_gencode_flags", but the implementation monkey-patches extension_utils._get_cuda_arch_flags; it neither modifies the get_gencode_flags function itself nor sets the environment variable. Consider updating the description to reflect the actual approach.

Issues

Severity | File | Summary
🟡 Suggestion | custom_ops/setup_ops.py:63 | the "second line of defense" safeguard is most likely dead code
🟡 Suggestion | custom_ops/setup_ops.py:52 | substring matching of flags risks false matches

Overall Assessment

This PR resolves the duplicate gencode flag problem caused by the upstream Paddle change via a monkey patch. The core patch logic (_patched_get_cuda_arch_flags) is sound and thoroughly commented. Two minor suggestions: 1) the "second line of defense" block may never take effect and should be confirmed or removed; 2) the flag-matching condition could be tightened to avoid potential false matches.

Comment thread custom_ops/setup_ops.py
# Additional safeguard (important):
# Some Paddle versions may have additional internal methods that add gencode flags.
# This patch serves as a second line of defense by overriding such methods.
if hasattr(extension_utils, "CUDAExtension"):

🟡 Suggestion: the extension_utils module is unlikely to have a CUDAExtension attribute.

CUDAExtension is a function/class imported from paddle.utils.cpp_extension, not an attribute of the extension_utils module. hasattr(extension_utils, "CUDAExtension") is therefore most likely False, so this "second line of defense" is effectively dead code and provides no protection.

Suggestions:

  1. Confirm whether the extension_utils module in the relevant Paddle version actually exposes a CUDAExtension attribute; if not, remove this block to avoid confusion;
  2. If extra protection is genuinely needed, consider patching paddle.utils.cpp_extension.CUDAExtension itself.
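
The second suggestion could look roughly like the following. This is a hedged sketch, not code from the PR: a stand-in CUDAExtension is defined so it runs without Paddle, and deduplicating -gencode pairs inside a wrapper is one plausible way to guard the real paddle.utils.cpp_extension.CUDAExtension.

```python
# Hedged sketch: wrap the CUDAExtension entry point itself instead of
# probing extension_utils. A stand-in CUDAExtension is defined here so
# the sketch runs without Paddle installed (assumption).

def CUDAExtension(sources, extra_compile_args=None):
    # Stand-in for paddle.utils.cpp_extension.CUDAExtension; the real one
    # returns a setuptools Extension-like object.
    return {"sources": sources, "extra_compile_args": extra_compile_args or {}}

_original_cuda_extension = CUDAExtension

def _wrapped_cuda_extension(sources, extra_compile_args=None):
    ext = _original_cuda_extension(sources, extra_compile_args)
    nvcc = ext["extra_compile_args"].get("nvcc", [])
    # Drop duplicated "-gencode arch=...,code=..." pairs, keeping the
    # first occurrence of each and preserving the order of other flags.
    deduped, seen, i = [], set(), 0
    while i < len(nvcc):
        if nvcc[i] == "-gencode" and i + 1 < len(nvcc):
            pair = (nvcc[i], nvcc[i + 1])
            if pair not in seen:
                seen.add(pair)
                deduped += [nvcc[i], nvcc[i + 1]]
            i += 2
        else:
            deduped.append(nvcc[i])
            i += 1
    ext["extra_compile_args"]["nvcc"] = deduped
    return ext

CUDAExtension = _wrapped_cuda_extension
```

This variant guards at the point the build script actually calls, so it cannot silently become dead code the way a hasattr probe on the wrong module can.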

Comment thread custom_ops/setup_ops.py
"""
if cflags:
for flag in cflags:
if isinstance(flag, str) and (flag.startswith("-gencode") or "compute_" in flag or "sm_" in flag):

🟡 Suggestion: the flag-detection logic risks false matches.

"compute_" in flag and "sm_" in flag are substring matches and can match flags that are not gencode flags (for example an -I include path containing sm_ or compute_). The probability is low in the current scenario, but a general-purpose patch function could be more precise.

Consider tightening the condition to match only the -gencode and -arch prefixes:

if isinstance(flag, str) and (flag.startswith("-gencode") or flag.startswith("-arch")):
    return []
