[Fix] fix IndexElementwiseGet kernel CUDA error(700) on 0-size input #78251
Closed
DanielSun11 wants to merge 4 commits into PaddlePaddle:develop from
Your PR was submitted successfully. Thank you for contributing to this open-source project!
Codecov Report
❌ Your patch status has failed because the patch coverage (25.00%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## develop #78251 +/- ##
==========================================
Coverage ? 25.00%
==========================================
Files ? 2
Lines ? 4
Branches ? 0
==========================================
Hits ? 1
Misses ? 3
Partials ? 0
Add x_grad->numel() == 0 early-return in CPU backward kernel, analogous to the GPU backward kernel fix. When x.numel()==0, Alloc<T> returns a null pointer; using EigenVector::Flatten on it causes SIGSEGV.
…DanielSun11/Paddle into fix/index-elementwise-get-0size
Contributor
Author
#78453 already addresses this with a similar fix.
PR Types
Operator Mechanism
PR Category
Bug fixes
Description
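Background
Using advanced integer indexing (list-of-list) through __getitem__ on a Tensor whose first dimension is 0 triggers CUDA error(700) (illegal memory access) on GPU, or a segfault on CPU.
Root cause analysis
Call chain:
__getitem__ → tensor__getitem_dygraph → ApplyGetitem → AdvancedIndex → index_elementwise_get_ad_func → IndexElementwiseGetKernel
The AdvancedIndex constructor builds src_sizes by replacing the indexed dimension with the index shape. For example, indexing dimension 0 of x.shape=[0,5,4,3] with [[2,3,4],[1,2,5]] (shape=[2,3]) yields src_sizes = [2, 3, 5, 4, 3] (numel=360).
Forward kernel (GPU):
out->numel() = 360 != 0, so the existing if (out->numel() == 0) return; check does not fire; x.numel() = 0, so x.data<T>() returns nullptr; the kernel then accesses nullptr + offset → CUDA error(700)
Forward kernel (CPU): same as above; the CPU implementation also lacks this check
Backward kernel (GPU):
x_grad->numel() = 0; after GpuMemsetAsync zero-fills the buffer, execution continues and accesses a null pointer
Backward kernel (CPU):
x_grad->numel() = 0; dev_ctx.Alloc<T>(x_grad) returns a null pointer, and EigenVector<T>::Flatten(*x_grad) then operates on that null pointer → SIGSEGV
Fix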
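Add an early-return check for empty input in four kernel files:
Forward kernel, GPU (index_elementwise_get_kernel.cu): when x.numel() == 0, zero-fill the output with GpuMemsetAsync and return
Forward kernel, CPU (index_elementwise_get_kernel.cc): when x.numel() == 0, zero-fill the output with memset and return
Backward kernel, GPU (index_elementwise_get_grad_kernel.cu): when x_grad->numel() == 0, return immediately
Backward kernel, CPU (index_elementwise_get_grad_kernel.cc): add if (x_grad->numel() == 0) return; after Alloc<T> and before the Eigen operations
New unit tests
test/legacy_test/test_index_elementwise.py: adds TestIndexElementwiseGet0SizeInput, covering dtypes including complex128, bool, float32, float64, int64, and float16, with positive and negative indices and one-dimensional indices (9 test methods)
test/legacy_test/test_index_elementwise_grad.py: adds TestIndexElementwiseGet0SizeInputGrad, covering the backward pass for float32, float64, and negative indices (3 test methods)
All 32 newly added test methods pass (CPU + GPU).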
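The shape arithmetic behind the bug can be modeled in a few lines of plain Python. This is only an illustrative sketch, not Paddle code: the helper names advanced_index_src_sizes and forward_guard are hypothetical, and the guard simply mirrors the order of checks described in this PR (existing empty-output check first, then the added empty-input check).

```python
from math import prod


def advanced_index_src_sizes(x_shape, index_shape, dim=0):
    """Hypothetical helper: replace dimension `dim` of x_shape with index_shape."""
    return list(x_shape[:dim]) + list(index_shape) + list(x_shape[dim + 1:])


def forward_guard(x_shape, out_shape):
    """Hypothetical model of the forward kernel's early exits (not the real kernel)."""
    if prod(out_shape) == 0:
        # Pre-existing check: nothing to write.
        return "return: empty output"
    if prod(x_shape) == 0:
        # Check added by this PR: the input has no data, so never read from it.
        return "zero-fill output and return"
    return "launch kernel"


x_shape = [0, 5, 4, 3]   # first dimension is 0, so x holds no elements
index_shape = [2, 3]     # shape of the index [[2, 3, 4], [1, 2, 5]]

src_sizes = advanced_index_src_sizes(x_shape, index_shape, dim=0)
print("src_sizes:", src_sizes)
print("out numel:", prod(src_sizes), "x numel:", prod(x_shape))
print("guard:", forward_guard(x_shape, src_sizes))
```

The output is non-empty while the input is empty, which is why a check on the output size alone cannot catch this case.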
Does this change numerical precision?
No