
[XPU][CI] update xpu ci #7514

Open
plusNew001 wants to merge 1 commit into PaddlePaddle:develop from plusNew001:0420-ci-update

Conversation

@plusNew001 (Collaborator)

Motivation

💡 If this PR is a cherry-pick, the PR title must follow this format: add the [Cherry-Pick] tag at the very beginning and append the original PR ID at the end. For example: [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, explain why in this PR.
  • Provide accuracy results.
  • If this PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings April 20, 2026 07:42

paddle-bot bot commented Apr 20, 2026

Thanks for your contribution!


Copilot AI left a comment


Pull request overview

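This PR mainly enhances the XPU CI pipeline: it adds an XPU unit test job, collects and aggregates coverage data from several test jobs, and uploads the coverage results (BOS + Codecov). It also adjusts the timeout setting of the Metax CI.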

Changes:

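  • Adds an XPU unit test workflow (_xpu_unit_test.yml) and wires it into ci_xpu.yml
  • Adds coverage collection and upload to the XPU 4-card/8-card case tests and unit tests, and adds a coverage aggregation/incremental check workflow (_xpu_coverage_report.yml)
  • Adds an XPU coverage config file (.coveragerc_xpu) and XPU-related pytest config files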

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

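| File | Description |
| --- | --- |
| tests/xpu_ci/unit_test/pytest.ini | Adds the pytest config entry point for XPU model functional unit tests |
| scripts/.coveragerc_xpu | Adds the XPU coverage collection/combine config |
| custom_ops/xpu_ops/test/pytest.ini | Adds a pytest ignore list for the XPU custom operator unit tests |
| .github/workflows/ci_xpu.yml | Wires in two new jobs: XPU unit test and coverage report |
| .github/workflows/ci_metax.yml | Adjusts the timeout of the Metax Jenkins step |
| .github/workflows/_xpu_unit_test.yml | Adds a reusable workflow that runs the XPU unit tests and uploads coverage |
| .github/workflows/_xpu_coverage_report.yml | Adds a workflow that combines coverage, runs the diff-cover incremental check, and uploads to Codecov |
| .github/workflows/_xpu_8cards_case_test.yml | Adds coverage collection and upload outputs to the 8-card case tests |
| .github/workflows/_xpu_4cards_case_test.yml | Adds coverage collection and upload outputs to the 4-card case tests |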

Comment on lines +54 to +62
  xpu_unit_test:
    name: xpu_unit_test
    needs: [clone, xpu_build_test]
    uses: ./.github/workflows/_xpu_unit_test.yml
    with:
      FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
      DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:ci
      FASTDEPLOY_WHEEL_URL: ${{ needs.xpu_build_test.outputs.wheel_path }}
    secrets:

Copilot AI Apr 20, 2026


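The PR description is still the unfilled template (Motivation/Modifications/Usage/Accuracy Tests, etc.), which makes it harder for CI maintainers to understand the purpose of the change and how it was validated. Suggest adding: why XPU unit tests/coverage are being introduced, the coverage threshold policy, and the reason for each skipped test case along with a plan to re-enable it.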

@@ -0,0 +1,3 @@
[pytest]
# Skip the following model functional unit tests (they currently have known issues)

Copilot AI Apr 20, 2026


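The comment here says "skip the following model functional unit tests", but addopts is currently empty and no --ignore / marker is actually configured, so pytest will not skip any test cases when it runs. Suggest either listing the specific test cases/directories to skip, or removing the comment to avoid misleading readers.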

Suggested change
# Skip the following model functional unit tests (they currently have known issues)
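To check the point above mechanically, here is a minimal Python sketch that parses the `[pytest]` section with the standard-library `configparser` and lists the `--ignore=` targets found in `addopts`. The helper name `ignored_paths` is hypothetical, and pytest itself reads ini files with its own parser (iniconfig), so this only approximates how `addopts` is interpreted. With only a comment and no `addopts`, the list comes back empty, which is why nothing is skipped.

```python
import configparser
import shlex


def ignored_paths(ini_text: str) -> list[str]:
    """Return the paths passed via --ignore=... in the [pytest] addopts.

    Approximation only: pytest parses ini files with iniconfig, not configparser.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # A missing or empty addopts means pytest receives no extra options.
    addopts = parser.get("pytest", "addopts", fallback="")
    # shlex.split treats newlines from continuation lines as whitespace.
    return [
        token.split("=", 1)[1]
        for token in shlex.split(addopts)
        if token.startswith("--ignore=")
    ]


# Only a comment, no addopts: nothing is ignored.
comment_only = """\
[pytest]
# Skip the following model functional unit tests (they currently have known issues)
"""

# addopts with indented continuation lines, in the style of the
# custom_ops/xpu_ops/test/pytest.ini entries quoted later in this review.
with_ignores = """\
[pytest]
addopts =
    --ignore=custom_ops/xpu_ops/test/test_update_inputs.py
    --ignore=custom_ops/xpu_ops/test/test_weight_quantize_xpu.py
"""

print(ignored_paths(comment_only))  # []
print(ignored_paths(with_ignores))
```

An empty result for a file whose comment promises skips is a quick signal that the comment and the configuration have drifted apart.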

Comment on lines +126 to +127
-e "PADDLEVERSION=${PADDLEVERSION}" \
-e "PADDLE_WHL_URL=${PADDLE_WHL_URL}" \

Copilot AI Apr 20, 2026


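The workflow_call trigger defines PADDLEVERSION/PADDLE_WHL_URL inputs, but what gets passed into the container here are the host environment variables ${PADDLEVERSION}/${PADDLE_WHL_URL}. Unless they are explicitly exported, these are always empty, so the inputs supplied by the caller have no effect. Suggest reading the values from inputs instead (e.g., injecting them via the step's env) or removing these two inputs.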
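The mechanism behind this comment can be reproduced outside CI: a shell `run` step only sees variables in its process environment, so `${PADDLEVERSION}` expands to an empty string unless something places it there. The Python sketch below runs the same `echo` through `bash` twice, once without the variable and once with it injected, to model the difference. It assumes, as the review states, that the workflow does not export the inputs elsewhere; the version string `3.0.0` is a made-up placeholder. In the workflow, the suggested fix would be a step-level `env:` entry such as `PADDLEVERSION: ${{ inputs.PADDLEVERSION }}`.

```python
import os
import subprocess


def run_step(script: str, env: dict[str, str]) -> str:
    """Run a bash snippet with exactly the given environment, like a step's shell."""
    result = subprocess.run(
        ["bash", "-c", script],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Mirrors the docker argument in the workflow: it expands a *shell* variable.
SCRIPT = 'echo "PADDLEVERSION=${PADDLEVERSION}"'

# Hypothetical caller input (placeholder value, for illustration only).
inputs = {"PADDLEVERSION": "3.0.0"}

# Keep only PATH so bash can be found; nothing else leaks in from the host.
base_env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}

# Input never exported into the step environment: expands to an empty string.
print(run_step(SCRIPT, base_env))  # PADDLEVERSION=
# Input injected via the step environment: the value reaches the command.
print(run_step(SCRIPT, {**base_env, "PADDLEVERSION": inputs["PADDLEVERSION"]}))  # PADDLEVERSION=3.0.0
```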


PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-20 17:12:27

📋 Review Summary

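PR overview: adds a unit test workflow, code coverage collection, and a merged coverage report (including an incremental coverage check and Codecov upload) to the XPU CI, and adjusts the MetaX CI timeout.
Scope of changes: .github/workflows/ (CI workflows), scripts/ (coverage config), custom_ops/xpu_ops/test/ (pytest config), tests/xpu_ci/unit_test/ (pytest config)
Affected tags: [CI] [XPU]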

📝 PR Convention Check

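The Motivation and Modifications sections of the PR description are empty. Suggest filling them in so reviewers can understand the background of the change.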

Description template (ready to copy):

## Motivation

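Add code coverage collection to the XPU CI: collect coverage data from the unit tests and the 4-card/8-card integration tests, run an incremental coverage check (fail-under=80%), and upload to Codecov. Also add workflows for the XPU custom operator unit tests and model functional unit tests.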

## Modifications

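1. Add `_xpu_unit_test.yml`: workflow for XPU custom operator unit tests and model functional unit tests, with coverage collection
2. Add `_xpu_coverage_report.yml`: aggregates coverage data from each test job, generates full and incremental coverage reports, and uploads them to BOS and Codecov
3. Modify `_xpu_4cards_case_test.yml` / `_xpu_8cards_case_test.yml`: integrate coverage collection and BOS upload
4. Modify `ci_xpu.yml`: orchestrate the new unit_test and coverage_report jobs
5. Add `scripts/.coveragerc_xpu`: XPU-specific coverage configuration
6. Add `custom_ops/xpu_ops/test/pytest.ini`: skip operator unit tests with known issues
7. Add `tests/xpu_ci/unit_test/pytest.ini`: pytest configuration for the model functional unit tests
8. Modify `ci_metax.yml`: lower the Jenkins trigger timeout from 120 minutes to 60 minutes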

Issues

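| Level | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | _xpu_coverage_report.yml:38 | The xpu_coverage_combine job uses an XPU runner but does not actually need hardware resources |
| 🟡 Suggestion | custom_ops/xpu_ops/test/pytest.ini:24 | Missing newline at end of file |
| ❓ Question | scripts/.coveragerc_xpu:4 | Does source = fastdeploy meet the coverage needs for the custom operator code? |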

Overall Assessment

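The implementation is clearly structured. The pipeline design (coverage collection → combine → incremental check → Codecov upload) is sound and keeps the same architectural style as the existing GPU coverage workflow (_unit_test_coverage.yml). The BOS upload path and URL construction logic are correct, and the always() condition ensures coverage data is not lost when tests fail. Suggest revisiting the runner choice for the xpu_coverage_combine job to save XPU card resources.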
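The incremental check step in this pipeline gates on changed-line coverage (diff-cover with fail-under=80%, per the description template). As a rough mental model only, here is a hypothetical Python helper, not diff-cover's actual implementation, that computes the percentage of changed, measurable lines that the combined test run executed and compares it with the threshold. The file names and line sets in the example are invented.

```python
from dataclasses import dataclass


@dataclass
class FileDiff:
    changed_lines: set[int]     # lines added or modified by the PR
    measurable_lines: set[int]  # executable lines known to the coverage report
    executed_lines: set[int]    # lines hit during the combined test run


def diff_coverage_percent(files: dict[str, FileDiff]) -> float:
    """Percent of changed, measurable lines that were executed."""
    total = covered = 0
    for diff in files.values():
        # Changed blank lines, comments, etc. are not measurable and are skipped.
        relevant = diff.changed_lines & diff.measurable_lines
        total += len(relevant)
        covered += len(relevant & diff.executed_lines)
    # Assumption of this sketch: no measurable changed lines counts as fully covered.
    return 100.0 if total == 0 else 100.0 * covered / total


def passes_gate(files: dict[str, FileDiff], fail_under: float = 80.0) -> bool:
    """Mimic a fail-under threshold: pass when coverage >= fail_under."""
    return diff_coverage_percent(files) >= fail_under


# Hypothetical example: 4 of 5 changed lines executed -> exactly 80%.
example = {
    "fastdeploy/a.py": FileDiff({1, 2, 3, 4, 5}, {1, 2, 3, 4, 5}, {1, 2, 3, 4}),
}
print(diff_coverage_percent(example))  # 80.0
print(passes_gate(example))  # True
```

Under this model, a single untested file in the diff can pull the total below the threshold, which is why combining coverage from all test jobs (unit tests plus 4-card/8-card cases) before the check matters.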

workflow-name: xpu_coverage

  xpu_coverage_combine:
    runs-on: [self-hosted, XPU-P800-4Cards]


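🟡 Suggestion: the xpu_coverage_combine job uses the XPU-P800-4Cards runner, but it only performs Python coverage combine operations and does not need any XPU hardware.

Suggest switching to ubuntu-latest or another general-purpose runner. This avoids occupying scarce XPU card resources and also speeds up scheduling, since general-purpose runners have shorter queue times.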

runs-on: ubuntu-latest

--ignore=custom_ops/xpu_ops/test/test_speculate_verify.py
--ignore=custom_ops/xpu_ops/test/test_token_repetition_penalty.py
--ignore=custom_ops/xpu_ops/test/test_update_inputs.py
--ignore=custom_ops/xpu_ops/test/test_weight_quantize_xpu.py
\ No newline at end of file


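🟡 Suggestion: the file is missing a trailing newline ("No newline at end of file").

Most linters and the POSIX definition of a text line expect a file to end with a newline character. Suggest adding a newline after the last line.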
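A minimal Python sketch of the fix, for illustration: append a single `\n` only when the file is non-empty and does not already end with one. The helper name `ensure_trailing_newline` is hypothetical. If the repository's pre-commit configuration includes the `end-of-file-fixer` hook from pre-commit-hooks, running pre-commit (as the PR checklist asks) would apply the same fix automatically.

```python
import tempfile
from pathlib import Path


def ensure_trailing_newline(path: Path) -> bool:
    """Append a final newline if the file lacks one. Returns True if the file changed."""
    data = path.read_bytes()
    # Empty files and files already ending in a newline are left untouched.
    if not data or data.endswith(b"\n"):
        return False
    path.write_bytes(data + b"\n")
    return True


# Usage on a throwaway copy shaped like the pytest.ini tail quoted above.
with tempfile.TemporaryDirectory() as tmp:
    ini = Path(tmp) / "pytest.ini"
    ini.write_text(
        "[pytest]\naddopts =\n"
        "    --ignore=custom_ops/xpu_ops/test/test_weight_quantize_xpu.py"
    )
    print(ensure_trailing_newline(ini))  # True
    print(ensure_trailing_newline(ini))  # False (already fixed)
```

Working on bytes rather than text avoids any newline translation, so the function never rewrites existing line endings.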

Comment thread scripts/.coveragerc_xpu
[run]
branch = True
source = fastdeploy
concurrency = multiprocessing


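❓ Question: using concurrency = multiprocessing together with parallel = True is correct. However, source = fastdeploy means coverage only tracks code inside the fastdeploy package.

In _xpu_unit_test.yml, the custom operator unit tests (custom_ops/xpu_ops/test/) are executed via coverage run, but because of source = fastdeploy, only coverage of the fastdeploy/ package is collected, not coverage of the code in the custom_ops/ directory. Please confirm whether this is intended (i.e., only the coverage of the fastdeploy package exercised by the operator tests matters), or whether coverage of the custom_ops/ code itself should also be collected.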


Labels

None yet

Projects

None yet


3 participants