
Fix inplace fill semantics for compat pinned tensors #78666

Closed
SigureMo wants to merge 3 commits into PaddlePaddle:develop from cattidea:codex/fix-compat-pinned-kernel-data-ptr

Conversation

@SigureMo
Member

@SigureMo SigureMo commented Apr 13, 2026

Background

This PR fixes the inplace write semantics of pin_memory tensors on the compat path.

In the previous implementation, operations like fill_ / zero_ would drift a gpu_pinned tensor to gpu:0 through the generic dispatch, making compat behavior inconsistent with torch. Real torch keeps a pinned host tensor after running fill_ on a pin_memory tensor.

What this PR changes

The change converges at two layers:

  1. compat C++ layer
  • Remove the previous mapped_pinned_tensor route and restore the native raw pinned allocation / data_ptr() semantics for compat pinned tensors.
  • Restore the pinned creation path of empty / new_empty / empty_like to CPU tensor + copy_to(pinned_place).
  • Add a host pinned special case for compat fill_ / zero_: first construct a source tensor of the same shape on the CPU, then copy_ it back into the pinned tensor, avoiding place drift.
  2. Python path
  • What the torch proxy ultimately holds is a paddle.Tensor, so the fill_ in quick_probe.py goes through the Python paddle.Tensor.fill_, not the C++ at::Tensor::fill_.
  • Add the same host pinned special case to fill_ / zero_ in python/paddle/tensor/manipulation.py, so the Python compat path also matches torch.

Verification

Verified:

  • build/test/cpp/ATen_pin_memory_creation_test
  • build/test/cpp/compat/ATen_pin_memory_kernel_test
  • build/test/cpp/ATen_tensor_data_test
  • PYTHONPATH=build/python python test/legacy_test/test_tensor_fill_.py
  • PYTHONPATH=build/python python paddle_compat_repro/quick_probe.py

After fill_ in quick_probe.py, the tensor now stays Place(gpu_pinned), and the copy_ path also remains correct.

Notes

numpy() on pinned tensors currently still has copy-out semantics; that is not addressed in this PR. This change only converges the problem chain of fill_ / zero_ / pinned creation and direct kernel pointer passing.

Return mapped device-visible pointers for compat CUDA-pinned tensors and add a regression test that exercises direct kernel writes into pin_memory tensors.

Co-authored-by: Codex <codex@openai.com>
@paddle-bot

paddle-bot bot commented Apr 13, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@SigureMo SigureMo marked this pull request as ready for review April 13, 2026 12:59
Copilot AI review requested due to automatic review settings April 13, 2026 12:59
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes compat pinned-memory tensors so that data_ptr() can be passed directly into CUDA kernels by allocating mapped pinned storage in the compat path and returning a device-visible mapped alias for CUDA-pinned tensors. It also adds a CUDA regression test to validate kernel writes into a compat pinned tensor.

Changes:

  • Add compat utilities to allocate mapped pinned host memory and to resolve CUDA-pinned tensors to a kernel-visible pointer.
  • Route compat pinned tensor creation/copies through the mapped pinned allocation path.
  • Update compat data_ptr()/(const_)data_ptr<T>() to use the kernel-visible pointer and add a CUDA kernel regression test.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

File Description
test/cpp/compat/CMakeLists.txt Registers the new CUDA regression test in compat test CMake.
test/cpp/compat/ATen_pin_memory_kernel_test.cu Adds a CUDA kernel test that writes into a pinned (host) tensor via data_ptr().
paddle/phi/api/include/compat/utils/mapped_pinned_tensor.h Introduces mapped pinned allocation helpers and kernel-visible pointer resolution for CUDA/HIP pinned tensors.
paddle/phi/api/include/compat/ATen/ops/new_empty.h Uses mapped pinned allocation when pin_memory=true on new_empty.
paddle/phi/api/include/compat/ATen/ops/empty.h Uses mapped pinned allocation when pin_memory=true on empty.
paddle/phi/api/include/compat/ATen/ops/empty_like.h Uses mapped pinned allocation/copy helper when creating pinned empty_like.
paddle/phi/api/include/compat/ATen/core/TensorMethods.cpp Redirects typed data_ptr/const_data_ptr to the (now kernel-visible) data_ptr() implementation.
paddle/phi/api/include/compat/ATen/core/TensorBody.h Updates Tensor::{data_ptr,const_data_ptr,mutable_data_ptr} and pin_memory() to use mapped pinned helpers / kernel-visible pointer.
paddle/phi/api/include/compat/ATen/core/TensorBase.h Updates TensorBase::data_ptr() to return the kernel-visible pointer and adjusts its documentation accordingly.


Comment on lines +112 to +147
inline void* _PD_GetKernelVisibleDataPtr(const paddle::Tensor& tensor) {
  if (!tensor.defined()) {
    return nullptr;
  }

#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
  if (phi::is_cuda_pinned_place(tensor.place())) {
    auto dense = std::dynamic_pointer_cast<phi::DenseTensor>(tensor.impl());
    if (!dense) {
      return const_cast<void*>(tensor.data());
    }

    auto holder = dense->Holder();
    if (!holder || holder->ptr() == nullptr) {
      return const_cast<void*>(tensor.data());
    }

    void* mapped_base = nullptr;
#ifdef PADDLE_WITH_HIP
    auto err = hipHostGetDevicePointer(&mapped_base, holder->ptr(), 0);
    if (err == hipSuccess && mapped_base != nullptr) {
      return static_cast<char*>(mapped_base) + dense->meta().offset;
    }
    (void)hipGetLastError();
#elif defined(PADDLE_WITH_CUDA)
    auto err = cudaHostGetDevicePointer(&mapped_base, holder->ptr(), 0);
    if (err == cudaSuccess && mapped_base != nullptr) {
      return static_cast<char*>(mapped_base) + dense->meta().offset;
    }
    (void)cudaGetLastError();
#endif
  }
#endif

  return const_cast<void*>(tensor.data());
}

Copilot AI Apr 13, 2026


_PD_GetKernelVisibleDataPtr() falls back to returning tensor.data() when cudaHostGetDevicePointer/hipHostGetDevicePointer fails. Since the default GPUPinned allocator uses cudaHostAllocPortable (no Mapped flag) (see paddle/phi/core/memory/allocation/pinned_allocator.cc:40-46), this failure path will be hit for pinned tensors created outside the new mapped allocation helpers, and data_ptr() will again return a non-device-visible host address (contradicting the new “pointer kernels should use” contract). Consider either (a) ensuring all compat pinned-tensor creation/copy paths use _PD_CreateMappedPinnedAllocation (or another mapped/registered strategy), or (b) making this function throw/explicitly error when it cannot obtain a device-visible alias for a CUDA-pinned tensor, to avoid silently returning an unsafe pointer for kernels.

Replace the internal-only DenseTensor::memory_size() call in the compat mapped pinned helper with a header-visible byte count computation so downstream extensions that include compat headers keep compiling.

Co-authored-by: Codex <codex@openai.com>
@codecov-commenter

codecov-commenter commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@d586e1a). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...e/phi/api/include/compat/utils/pinned_tensor_ops.h 81.81% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #78666   +/-   ##
==========================================
  Coverage           ?   93.10%           
==========================================
  Files              ?        4           
  Lines              ?       29           
  Branches           ?        0           
==========================================
  Hits               ?       27           
  Misses             ?        2           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.

Co-authored-by: Codex <codex@openai.com>
@SigureMo SigureMo changed the title [codex] compat: fix pinned tensor kernel data_ptr → Fix inplace fill semantics for compat pinned tensors Apr 14, 2026
@SigureMo SigureMo closed this Apr 15, 2026
@SigureMo SigureMo deleted the codex/fix-compat-pinned-kernel-data-ptr branch April 15, 2026 03:35

3 participants