Fix the inplace fill semantics of compat pinned tensors #78666
SigureMo wants to merge 3 commits into PaddlePaddle:develop
Conversation
Return mapped device-visible pointers for compat CUDA-pinned tensors and add a regression test that exercises direct kernel writes into pin_memory tensors. Co-authored-by: Codex <codex@openai.com>
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Pull request overview
This PR fixes compat pinned-memory tensors so that data_ptr() can be passed directly into CUDA kernels by allocating mapped pinned storage in the compat path and returning a device-visible mapped alias for CUDA-pinned tensors. It also adds a CUDA regression test to validate kernel writes into a compat pinned tensor.
Changes:
- Add compat utilities to allocate mapped pinned host memory and to resolve CUDA-pinned tensors to a kernel-visible pointer.
- Route compat pinned tensor creation/copies through the mapped pinned allocation path.
- Update compat `data_ptr()`/`(const_)data_ptr<T>()` to use the kernel-visible pointer and add a CUDA kernel regression test.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Summary per file:
| File | Description |
|---|---|
| test/cpp/compat/CMakeLists.txt | Registers the new CUDA regression test in compat test CMake. |
| test/cpp/compat/ATen_pin_memory_kernel_test.cu | Adds a CUDA kernel test that writes into a pinned (host) tensor via data_ptr(). |
| paddle/phi/api/include/compat/utils/mapped_pinned_tensor.h | Introduces mapped pinned allocation helpers and kernel-visible pointer resolution for CUDA/HIP pinned tensors. |
| paddle/phi/api/include/compat/ATen/ops/new_empty.h | Uses mapped pinned allocation when pin_memory=true on new_empty. |
| paddle/phi/api/include/compat/ATen/ops/empty.h | Uses mapped pinned allocation when pin_memory=true on empty. |
| paddle/phi/api/include/compat/ATen/ops/empty_like.h | Uses mapped pinned allocation/copy helper when creating pinned empty_like. |
| paddle/phi/api/include/compat/ATen/core/TensorMethods.cpp | Redirects typed data_ptr/const_data_ptr to the (now kernel-visible) data_ptr() implementation. |
| paddle/phi/api/include/compat/ATen/core/TensorBody.h | Updates Tensor::{data_ptr,const_data_ptr,mutable_data_ptr} and pin_memory() to use mapped pinned helpers / kernel-visible pointer. |
| paddle/phi/api/include/compat/ATen/core/TensorBase.h | Updates TensorBase::data_ptr() to return the kernel-visible pointer and adjusts its documentation accordingly. |
```cpp
inline void* _PD_GetKernelVisibleDataPtr(const paddle::Tensor& tensor) {
  if (!tensor.defined()) {
    return nullptr;
  }

#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
  if (phi::is_cuda_pinned_place(tensor.place())) {
    auto dense = std::dynamic_pointer_cast<phi::DenseTensor>(tensor.impl());
    if (!dense) {
      return const_cast<void*>(tensor.data());
    }

    auto holder = dense->Holder();
    if (!holder || holder->ptr() == nullptr) {
      return const_cast<void*>(tensor.data());
    }

    void* mapped_base = nullptr;
#ifdef PADDLE_WITH_HIP
    auto err = hipHostGetDevicePointer(&mapped_base, holder->ptr(), 0);
    if (err == hipSuccess && mapped_base != nullptr) {
      return static_cast<char*>(mapped_base) + dense->meta().offset;
    }
    (void)hipGetLastError();
#elif defined(PADDLE_WITH_CUDA)
    auto err = cudaHostGetDevicePointer(&mapped_base, holder->ptr(), 0);
    if (err == cudaSuccess && mapped_base != nullptr) {
      return static_cast<char*>(mapped_base) + dense->meta().offset;
    }
    (void)cudaGetLastError();
#endif
  }
#endif

  return const_cast<void*>(tensor.data());
}
```
_PD_GetKernelVisibleDataPtr() falls back to returning tensor.data() when cudaHostGetDevicePointer/hipHostGetDevicePointer fails. Since the default GPUPinned allocator uses cudaHostAllocPortable (no Mapped flag) (see paddle/phi/core/memory/allocation/pinned_allocator.cc:40-46), this failure path will be hit for pinned tensors created outside the new mapped allocation helpers, and data_ptr() will again return a non-device-visible host address (contradicting the new “pointer kernels should use” contract). Consider either (a) ensuring all compat pinned-tensor creation/copy paths use _PD_CreateMappedPinnedAllocation (or another mapped/registered strategy), or (b) making this function throw/explicitly error when it cannot obtain a device-visible alias for a CUDA-pinned tensor, to avoid silently returning an unsafe pointer for kernels.
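For context, here is a minimal standalone CUDA sketch (an illustration, not the PR's code) of the mapped-pinned strategy the helper relies on: allocating with `cudaHostAllocMapped` is what lets `cudaHostGetDevicePointer` hand back a device-visible alias that a kernel can write through, which is exactly the flag the reviewer notes is absent from the default Portable-only pinned allocator.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Writes through the device-visible alias of mapped pinned host memory.
__global__ void fill_ones(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = 1.0f;
}

int main() {
  const int n = 256;
  float* host = nullptr;
  // cudaHostAllocMapped is the flag that makes cudaHostGetDevicePointer
  // succeed; a Portable-only allocation may not provide a mapped alias.
  cudaHostAlloc(reinterpret_cast<void**>(&host), n * sizeof(float),
                cudaHostAllocMapped | cudaHostAllocPortable);

  float* dev = nullptr;
  cudaHostGetDevicePointer(reinterpret_cast<void**>(&dev), host, 0);

  // The kernel's writes land directly in the pinned host buffer.
  fill_ones<<<(n + 127) / 128, 128>>>(dev, n);
  cudaDeviceSynchronize();

  printf("host[0] = %f\n", host[0]);
  cudaFreeHost(host);
  return 0;
}
```

Requires a CUDA-capable device at runtime; on platforms without unified addressing the mapped alias can differ from the host pointer, which is why the helper resolves it explicitly instead of reusing `holder->ptr()`.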
Replace the internal-only DenseTensor::memory_size() call in the compat mapped pinned helper with a header-visible byte count computation so downstream extensions that include compat headers keep compiling. Co-authored-by: Codex <codex@openai.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff            @@
##           develop    #78666   +/-   ##
==========================================
  Coverage         ?    93.10%
==========================================
  Files            ?         4
  Lines            ?        29
  Branches         ?         0
==========================================
  Hits             ?        27
  Misses           ?         2
  Partials         ?         0
```

☔ View full report in Codecov by Sentry.
Co-authored-by: Codex <codex@openai.com>
Background

This PR fixes the inplace write semantics of `pin_memory` tensors on the compat path. In the previous implementation, operations such as `fill_`/`zero_` would drift a `gpu_pinned` tensor to `gpu:0` through the generic dispatch, making compat behavior inconsistent with torch: real torch keeps a pinned host tensor after running `fill_` on a `pin_memory` tensor.

Fix

The change converges at two layers:
- The `mapped_pinned_tensor` route: restore the native raw pinned allocation / `data_ptr()` semantics of compat pinned tensors.
- The pinned creation paths of `empty`/`new_empty`/`empty_like` are restored to CPU tensor + `copy_to(pinned_place)`.
- `fill_`/`zero_` gain a host-pinned special case: first build a source tensor of the same shape on CPU, then `copy_` it back into the pinned tensor, avoiding place drift.
- The torch proxy ultimately holds a `paddle.Tensor`, so the `fill_` in `quick_probe.py` goes through the Python `paddle.Tensor.fill_`, not the C++ `at::Tensor::fill_`. The `fill_`/`zero_` in `python/paddle/tensor/manipulation.py` gain the same host-pinned special case, so the Python compat path also aligns with torch.
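At the memory level, the host-pinned special case can be pictured with plain CUDA runtime calls (a hedged analogy, not the PR's code): the values are produced in ordinary host memory and then copied into the existing pinned buffer, so the pinned allocation is never replaced and the tensor's place cannot drift.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

int main() {
  const int n = 8;
  float* pinned = nullptr;
  // Stands in for the compat pinned tensor's storage.
  cudaHostAlloc(reinterpret_cast<void**>(&pinned), n * sizeof(float),
                cudaHostAllocPortable);

  // Step 1: build a same-shape source on ordinary CPU memory
  // (the analogue of constructing the CPU source tensor for fill_(3.0)).
  float src[n];
  for (int i = 0; i < n; ++i) src[i] = 3.0f;

  // Step 2: copy it back into the pinned storage in place -- the pinned
  // buffer survives, unlike a dispatch that re-allocates on gpu:0.
  std::memcpy(pinned, src, n * sizeof(float));

  printf("pinned[0] = %f\n", pinned[0]);
  cudaFreeHost(pinned);
  return 0;
}
```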
Verification

Verified:
- `build/test/cpp/ATen_pin_memory_creation_test`
- `build/test/cpp/compat/ATen_pin_memory_kernel_test`
- `build/test/cpp/ATen_tensor_data_test`
- `PYTHONPATH=build/python python test/legacy_test/test_tensor_fill_.py`
- `PYTHONPATH=build/python python paddle_compat_repro/quick_probe.py`

After `fill_` in `quick_probe.py` the tensor now stays at `Place(gpu_pinned)`, and the `copy_` path also behaves correctly.

Notes
`numpy()` on pinned tensors still has copy-out semantics; that is not handled in this PR. This PR only converges the problem chain of `fill_`/`zero_`, pinned creation, and direct kernel pass-through.