
Fix out-of-bounds memory access in SetKernel for 0-size tensor #78486

Merged
wanghuancoder merged 6 commits into PaddlePaddle:develop from hushenwei2000:fix/set-kernel-zero-size-oob
Apr 1, 2026

Contributor

@hushenwei2000 hushenwei2000 commented Mar 25, 2026

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

Fix errors raised for 0-size tensors.

paddle.Tensor.set_ accuracy: results are incorrect on CPU for the following cases:
paddle.Tensor.set_(Tensor([20],"complex64"), Tensor([0, 3],"complex64"), list[20,], list[2,], 0, )
paddle.Tensor.set_(Tensor([20],"float32"), Tensor([0, 3],"float32"), list[20,], list[2,], 0, )

Summary

  • Fix GPU out-of-bounds memory access (detected by compute-sanitizer) when calling Tensor.set_(source, shape, stride, offset) with a 0-size source tensor and non-zero target shape.
  • When source.numel() == 0, the output tensor now inherits the source's 0-size dims/strides instead of using user-specified shape/stride, preventing invalid memory access.
  • Updated existing TestSet_API_ZeroSize and added 5 new test cases covering various 0-size scenarios.

Root Cause

In SetKernel (paddle/phi/kernels/set_kernel.cc), the conditional logic for handling 0-size tensors had a missing branch: when source.numel() == 0 and x.numel() != 0, no branch was executed. The output tensor retained its original data holder, but SetInferMeta had already set its meta to the user-specified shape/stride (e.g., shape=[20], stride=[2]). When ContiguousKernel later attempted to read 20 elements via stride=2 (requiring storage for indices 0–38), the underlying storage was empty (nullptr), causing CUDA illegal memory access.
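The index arithmetic behind "indices 0–38" can be checked with a small sketch in plain Python. This is not Paddle code: `max_linear_index` and `required_storage_elems` are hypothetical helpers introduced only for illustration, and the sketch assumes non-negative strides and an offset counted in elements.

```python
def max_linear_index(shape, stride, offset=0):
    """Largest storage index a strided view reads, or None for an empty view.

    Assumption: strides are non-negative and offset is in elements.
    """
    if any(dim == 0 for dim in shape):
        return None  # an empty view touches no storage at all
    # The last element along each axis sits (dim - 1) steps from the start.
    return offset + sum((dim - 1) * step for dim, step in zip(shape, stride))


def required_storage_elems(shape, stride, offset=0):
    """Minimum number of storage elements the view needs to be in bounds."""
    idx = max_linear_index(shape, stride, offset)
    return 0 if idx is None else idx + 1


# The case from this PR: shape=[20], stride=[2] reaches index 38,
# so it needs 39 elements, while a 0-size source provides none.
print(max_linear_index([20], [2]))
print(required_storage_elems([20], [2]))
```

Any view whose required element count exceeds what the holder actually provides would read out of bounds, which is the mismatch described above.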

Before fix (missing branch):

if source.numel() != 0:    # False (source is 0-size)
    ...
elif x.numel() == 0:        # False (x has 20 elements)
    ...

Neither branch executes → out keeps stale holder with mismatched meta

Fix

When source.numel() == 0, force the output tensor's dims/strides to match the source's 0-size shape, and rebind the holder to the source's (empty) storage. This ensures numel == 0, so downstream kernels (e.g., ContiguousKernel) skip safely via their numel <= 0 early-return guards.
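A rough model of the fixed control flow, written in plain Python rather than the actual C++ kernel, may help. `FakeTensor`, `set_kernel_sketch`, and `contiguous_sketch` are hypothetical stand-ins, and the non-empty branch is heavily simplified; only the 0-size handling mirrors what this PR describes.

```python
from dataclasses import dataclass
from itertools import product
from math import prod


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a dense tensor: meta plus a storage holder."""
    dims: list
    strides: list
    holder: object = None  # a Python list plays the role of the storage

    def numel(self):
        return prod(self.dims)


def set_kernel_sketch(x, source, shape, stride):
    # Models the state after meta inference: out has the user-specified
    # shape/stride but still points at x's original holder.
    out = FakeTensor(dims=list(shape), strides=list(stride), holder=x.holder)
    if source.numel() == 0:
        # The fix: inherit the source's 0-size meta and its (empty) storage.
        out.dims = list(source.dims)
        out.strides = list(source.strides)
        out.holder = source.holder
    else:
        # Simplified normal path: share the source's storage.
        out.holder = source.holder
    return out


def contiguous_sketch(t):
    if t.numel() <= 0:
        return []  # early-return guard: nothing is read
    storage = t.holder or []
    values = []
    for idx in product(*(range(d) for d in t.dims)):
        linear = sum(i * s for i, s in zip(idx, t.strides))
        values.append(storage[linear])  # IndexError stands in for an OOB read
    return values


x = FakeTensor([20], [1], list(range(20)))
empty_source = FakeTensor([0, 3], [3, 1], [])
out = set_kernel_sketch(x, empty_source, [20], [2])
print(out.dims, out.numel(), contiguous_sketch(out))
```

With the 0-size branch in place, `out.numel()` is 0 and the contiguous step returns immediately. Removing that branch in this model leaves `out` with shape `[20]`, stride `[2]`, and a 20-element holder, so the read at index 20 fails, loosely analogous to the illegal access reported by compute-sanitizer.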

Test Plan

  • Verified fix with compute-sanitizer: ERROR SUMMARY: 0 errors (was 11 errors before fix)
    • paddle.Tensor.set_(Tensor([20],"float32"), Tensor([0, 3],"float32"), list[20,], list[2,], 0, )
  • Added 5 new test cases in TestSet_API_ZeroSize:
    • test_zero_size_source_with_nonzero_shape — 0-size source + explicit non-zero shape
    • test_zero_size_source_default_args — 0-size source with default shape/stride
    • test_zero_size_x_nonzero_source — 0-size x with non-zero source
    • test_both_zero_size — both x and source are 0-size
    • test_zero_size_source_no_crash_on_contiguous — no crash on .contiguous() after set_
  • All existing set_ tests in APITest:
paddle.Tensor.set_(Tensor([20],"float64"), Tensor([0, 3],"float64"), list[20,], list[2,], 0, )
paddle.Tensor.set_(Tensor([20],"float64"), Tensor([15, 0],"float64"), list[20,], list[2,], 0, )
paddle.Tensor.set_(Tensor([3, 0],"float16"), Tensor([6, 0],"float16"), list[3,8,], list[2,2,], 0, )
paddle.Tensor.set_(Tensor([3, 8],"float16"), Tensor([0, 3],"float16"), list[3,8,], list[2,2,], 0, )
paddle.Tensor.set_(Tensor([3, 8],"float16"), Tensor([6, 0],"float16"), list[3,8,], list[2,2,], 0, )
paddle.Tensor.set_(Tensor([20],"complex64"), Tensor([0, 3],"complex64"), list[20,], list[2,], 0, )
paddle.Tensor.set_(Tensor([20],"float32"), Tensor([0, 3],"float32"), list[20,], list[2,], 0, )
paddle.Tensor.set_(Tensor([20],"complex64"), Tensor([15, 0],"complex64"), list[20,], list[2,], 0, )
paddle.Tensor.set_(Tensor([20],"float32"), Tensor([15, 0],"float32"), list[20,], list[2,], 0, )

Does this cause a change in numerical accuracy?


paddle-bot bot commented Mar 25, 2026

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the automated CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Mar 25, 2026

codecov-commenter commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 94.59459% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@466ed80). Learn more about missing BASE report.

Files with missing lines Patch % Lines
paddle/phi/kernels/set_kernel.cc 94.59% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #78486   +/-   ##
==========================================
  Coverage           ?   94.59%           
==========================================
  Files              ?        1           
  Lines              ?       37           
  Branches           ?        0           
==========================================
  Hits               ?       35           
  Misses             ?        2           
  Partials           ?        0           


@hushenwei2000
Contributor Author

/re-run all-failed

4 similar comments

hushenwei2000 and others added 4 commits March 27, 2026 14:09
When calling Tensor.set_(source, shape, stride, offset) with a 0-size source
tensor and non-zero target shape, the original code had a missing branch in
the conditional logic: when source.numel()==0 and x.numel()!=0, no branch
was executed, leaving `out` with its original data holder but with the
user-specified meta (shape/stride). This caused ContiguousKernel to read
beyond allocated memory when converting the strided tensor to contiguous.

The fix forces the output tensor to inherit the source's 0-size dims/strides
when source has no elements, preventing out-of-bounds access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@hushenwei2000 hushenwei2000 force-pushed the fix/set-kernel-zero-size-oob branch from f021456 to 4b3ac5c on March 27, 2026 06:10
@hushenwei2000
Contributor Author

/re-run all-failed

Contributor

@wanghuancoder wanghuancoder left a comment

LGTM

@wanghuancoder wanghuancoder merged commit c9b4be5 into PaddlePaddle:develop Apr 1, 2026
131 of 137 checks passed