Skip to content

[Cpp API Compatibility] Remove deepep legacy APIs#78549

Open
youge325 wants to merge 8 commits intoPaddlePaddle:developfrom
youge325:remove-deepep-legacy-apis
Open

[Cpp API Compatibility] Remove deepep legacy APIs#78549
youge325 wants to merge 8 commits intoPaddlePaddle:developfrom
youge325:remove-deepep-legacy-apis

Conversation

@youge325
Copy link
Copy Markdown
Contributor

@youge325 youge325 commented Apr 1, 2026

PR Category

Execute Infrastructure

PR Types

Deprecations

Description

拆分自 #78484

是否引起精度变化

Copilot AI review requested due to automatic review settings April 1, 2026 11:11
@paddle-bot
Copy link
Copy Markdown

paddle-bot bot commented Apr 1, 2026

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Apr 1, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR continues the C++ API compatibility cleanup by removing DeepEP-specific legacy stream/event APIs from the compat headers, aiming to converge on a PyTorch-shaped surface.

Changes:

  • Tighten CUDA stream pool initialization by adding error checking and align getStreamFromPool overload shape to support an int priority entry point.
  • Remove deprecated legacy APIs that exposed raw CUDA streams (CUDAStream::raw_stream(), Event::record(cudaStream_t), and Tensor::record_stream(cudaStream_t / at::cuda::CUDAStream)).
  • Expose c10::Stream into namespace at to support at::Stream usage.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
paddle/phi/api/include/compat/c10/cuda/CUDAStream.h Adds C10_CUDA_CHECK for priority range init; adds priority-based getStreamFromPool overload and maps bool overload to it.
paddle/phi/api/include/compat/c10/core/Stream.h Introduces at::Stream aliasing/export of c10::Stream.
paddle/phi/api/include/compat/c10/core/Event.h Removes legacy record(cudaStream_t) overload.
paddle/phi/api/include/compat/ATen/ops/record_stream.h Removes legacy record_stream overloads taking at::cuda::CUDAStream and cudaStream_t.
paddle/phi/api/include/compat/ATen/core/TensorBody.h Removes corresponding legacy record_stream declarations from Tensor.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +109 to +112

namespace at {
using c10::Stream;
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding namespace at { using c10::Stream; } here conflicts with an existing at::Stream definition in paddle/phi/api/include/compat/ATen/core/TensorBody.h (currently using Stream = c10::Stream;). Since TensorBody.h includes <c10/core/Stream.h>, this will trigger a redefinition error when compiling any TU that includes ATen/core/TensorBody.h.

To fix: keep a single canonical at::Stream definition (either remove the Stream alias from TensorBody.h, or drop this new at::Stream export and rely on the existing one).

Suggested change
namespace at {
using c10::Stream;
}

Copilot uses AI. Check for mistakes.
@youge325
Copy link
Copy Markdown
Contributor Author

youge325 commented Apr 1, 2026

/re-run all-failed

1 similar comment
@youge325
Copy link
Copy Markdown
Contributor Author

youge325 commented Apr 1, 2026

/re-run all-failed

Copy link
Copy Markdown
Contributor

@ShigureNyako ShigureNyako left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这轮清理方向和 PyTorch 对齐本身我认可:

  • c10::cuda::CUDAStream 上游只有 stream(),没有 raw_stream()
  • c10::Event 上游只有 record(const Stream&)
  • getStreamFromPool 也确实有 int priority 形态

但当前时点我还是得先卡住,因为这会造成明确的 BREAKING CHANGE,且下游闭环还没完成:

  1. PaddleFleet develop 目前仍固定 third_party/DeepEP3fed158b4751a4a89c92902aeb20bb8c8384f1e7
  2. 这个版本的 DeepEP 还在直接使用这次要删掉的旧接口:
    • csrc/event.hppevent->record(const cudaStream_t&)at::cuda::getCurrentCUDAStream().raw_stream()
    • csrc/deep_ep.cpp:多处 t.record_stream(comm_stream),其中 comm_stream 仍是 cudaStream_t
  3. DeepEP 侧迁移 PR PFCCLab/DeepEP#10 虽然已经合到了 paddle 分支(merge commit 5f63568c9810b53e999fa56fd6981c40ba4a5d57),但 PaddleFleet 这边还没有把 third_party/DeepEP bump 过去。
  4. 当前 PaddleFleet #712 标题写的是 “update submodule DeepEP”,但 diff 实际只改了 third_party/DeepGEMM,没有更新 third_party/DeepEP,所以还不能作为这个 cleanup 的解锁证据。

因此“暂时不能合入”这个判断我认为仍然成立。
请先把 PaddleFleet / 实际下游切到不依赖 raw_stream() / Event::record(cudaStream_t) / Tensor::record_stream(cudaStream_t) 的版本,并给出对应 cross-reference / CI 证据,再来清理这批兼容接口。

@ShigureNyako
Copy link
Copy Markdown
Contributor

ShigureNyako commented Apr 3, 2026

我重新看了这次 force-push。

当前 diff 相比上一版,主要是:

  • 删除 Tensor::record_stream(cudaStream_t) 的 TODO / 测试;
  • 保留 Tensor::record_stream(at::cuda::CUDAStream),实现改为转发到 record_stream(s.unwrap())
  • 继续移除 Event::record(cudaStream_t)CUDAStream::raw_stream()

这些调整没有改变我上一轮卡住的核心点:下游闭环还没完成。

到目前为止,PaddlePaddle/PaddleFleet#712 的 head 仍然是 9789e7f,diff 也仍只改了 third_party/DeepGEMM,没有把 third_party/DeepEP bump 到兼容这批删除项的版本,所以还不能作为“下游已经切完”的证据。

因此我这边 blocker 维持不变:还是需要先补齐 DeepEP / PaddleFleet 侧的实际切换,以及对应的 cross-reference / CI 证据,再来清理这批兼容接口。

@youge325
Copy link
Copy Markdown
Contributor Author

youge325 commented Apr 3, 2026

/re-run all-failed

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@63eaee1). Learn more about missing BASE report.

Additional details and impacted files
@@             Coverage Diff             @@
##             develop    #78549   +/-   ##
===========================================
  Coverage           ?   100.00%           
===========================================
  Files              ?         1           
  Lines              ?         6           
  Branches           ?         0           
===========================================
  Hits               ?         6           
  Misses             ?         0           
  Partials           ?         0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@youge325 youge325 force-pushed the remove-deepep-legacy-apis branch from f71e518 to 4f6abb2 Compare April 8, 2026 00:51
@youge325 youge325 changed the title [Cpp API Compatibility] Remove deepep legacy apis [Cpp API Compatibility] Remove deepep legacy APIs Apr 8, 2026
@SigureMo
Copy link
Copy Markdown
Member

merge 了下 develop 确认下 #78389 合入后能够监控 PR 的改动,此次 CI 应当挂掉

PaddlePaddle/PaddleFleet#712 合入后 rerun,CI 应当再次通过

@SigureMo
Copy link
Copy Markdown
Member

喔喔,#78595 引入了新的 DeepEP 兼容性 API,稍后 #78595 合入后这边也得单独处理下

@youge325
Copy link
Copy Markdown
Contributor Author

喔喔,#78595 引入了新的 DeepEP 兼容性 API,稍后 #78595 合入后这边也得单独处理下

好的

@SigureMo
Copy link
Copy Markdown
Member

稍等,先不要解决冲突,等一下 https://github.com/PaddlePaddle/Paddle/actions/runs/24223286023/job/70722751293?pr=78549 结果

@SigureMo
Copy link
Copy Markdown
Member

好啦,按照预期挂啦,可以解冲突啦

@ShigureNyako
Copy link
Copy Markdown
Contributor

我重新看了这次 merge develop 后的新 diff。

PaddlePaddle/PaddleFleet#712PaddlePaddle/Paddle#78595 现在都已经合进去了,所以我上一轮 comment 里“先等前置 PR 合入”的那部分前提已经变化了。

但当前这版 diff 里,实际仍然在继续删除这几类 compat 接口:

  • Tensor::record_stream(cudaStream_t / hipStream_t)ATen/core/TensorBody.hATen/ops/record_stream.h
  • Event::record(cudaStream_t / hipStream_t)c10/core/Event.h
  • CUDAStream::raw_stream()c10/cuda/CUDAStream.h

#78595 刚在同一批 compat 文件里单独补过 DeepEP 兼容相关处理,所以这次 rebase 并不会自动把 blocker 消掉;这批删除项还是需要基于新 base 单独重新核一遍语义和实际下游使用面。

所以我这边当前不会撤掉前面的 change request。等这轮新 CI 跑完、并把这版 diff 在 #78595 之后的语义重新收敛清楚,再继续往前推进会更稳妥。

@youge325
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

@SigureMo
Copy link
Copy Markdown
Member

https://github.com/PaddlePaddle/Paddle/actions/runs/24225463645/job/70726471802?pr=78631

ATen_resize_test 是不是 develop 直接挂了?另一个 PR 也有这个问题,develop 挂了需要尽快修一下,建议单独 PR 处理快速合入,不行就只能把单测禁了

@youge325
Copy link
Copy Markdown
Contributor Author

https://github.com/PaddlePaddle/Paddle/actions/runs/24225463645/job/70726471802?pr=78631

ATen_resize_test 是不是 develop 直接挂了?另一个 PR 也有这个问题,develop 挂了需要尽快修一下,建议单独 PR 处理快速合入,不行就只能把单测禁了

确实挂了,我看看能不能先合一个在 Mac-CPU 跳过测试的 pr

@youge325
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor External developers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants