Fix/im2col cpu #75731
Merged
lshpku merged 3 commits into PaddlePaddle:develop on Oct 11, 2025
- Add bounds clamping for all memcpy operations in the specialized fast path
- Add zero-fill for shortfall cases to ensure complete output tensor coverage
- Maintain performance by using memcpy when safe, falling back to element-wise operations only when necessary

…1dh1dw1ph1pw1
- Fix unsafe memcpy in NCHW path when filter_width == 1
- Prevent negative size_t conversion when output_width < plw + prw
- Clamp copy size to available source span (im_width) to avoid over-read
- Add zero-fill for shortfall cases to ensure complete output coverage

- Convert dimensions to 64-bit integers to avoid overflow during calculations
- Update index calculations for col and im arrays to use 64-bit arithmetic
- Ensure safe access to tensor data by checking bounds before indexing
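The clamping and zero-fill listed in the commit notes above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `copy_row_clamped` and its parameters (`out_w`, `im_w`, `pad_left`) are hypothetical names chosen for the example, and the real fast path handles more cases.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Paddle's actual code): fill one output row of
// out_w floats from a source row of im_w floats, preceded by pad_left zeros.
void copy_row_clamped(float* dst, int64_t out_w,
                      const float* src, int64_t im_w,
                      int64_t pad_left) {
  assert(out_w >= 0);
  // Clamp the left padding so it never exceeds the destination row.
  const int64_t left = std::max<int64_t>(0, std::min(pad_left, out_w));
  // Do the subtraction in signed arithmetic: a negative remainder must be
  // clamped to zero before it is ever converted to size_t.
  const int64_t room = out_w - left;
  const int64_t n = std::max<int64_t>(0, std::min(room, im_w));

  std::fill(dst, dst + left, 0.0f);  // left padding
  if (n > 0) {
    // Copy no more than the source actually holds (avoids over-read).
    std::memcpy(dst + left, src, static_cast<std::size_t>(n) * sizeof(float));
  }
  std::fill(dst + left + n, dst + out_w, 0.0f);  // zero-fill any shortfall
}
```

The bulk copy is kept for the common case; only the copy length is adjusted, so the output row is always fully written even when the padding exceeds the row width.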
Your PR was submitted successfully. Thank you for contributing to this open-source project!
lshpku approved these changes on Oct 11, 2025
Contributor, Author
/re-run all-failed
SigureMo pushed a commit to cattidea/Paddle that referenced this pull request on Oct 14, 2025

* fix: prevent memcpy over-read in im2col_sh1sw1dh1dw1ph1pw1 NCHW branches
  - Add bounds clamping for all memcpy operations in the specialized fast path
  - Add zero-fill for shortfall cases to ensure complete output tensor coverage
  - Maintain performance by using memcpy when safe, falling back to element-wise operations only when necessary
* fix: prevent memcpy over-read in filter_width==1 case of im2col_sh1sw1dh1dw1ph1pw1
  - Fix unsafe memcpy in NCHW path when filter_width == 1
  - Prevent negative size_t conversion when output_width < plw + prw
  - Clamp copy size to available source span (im_width) to avoid over-read
  - Add zero-fill for shortfall cases to ensure complete output coverage
* fix: enhance im2col_common to prevent overflow in arithmetic operations
  - Convert dimensions to 64-bit integers to avoid overflow during calculations
  - Update index calculations for col and im arrays to use 64-bit arithmetic
  - Ensure safe access to tensor data by checking bounds before indexing
Contributor
Good job!
This was referenced Oct 16, 2025
Closed
qingqing01 pushed a commit that referenced this pull request on Oct 24, 2025

The commit message repeats the three fixes listed above.
Co-authored-by: Bvicii <98971614+scyyh11@users.noreply.github.com>
PR Category
Operator Mechanism
PR Types
Bug fixes
Description
This PR fixes a critical SIGBUS / EXC_BAD_ACCESS crash in `phi::funcs::im2col_common` on macOS (Apple Silicon) when running convolutions on large tensors (e.g., PaddleOCR). The crash manifests at `_platform_memset` called from `im2col_common`, indicating an out-of-bounds write due to incorrect linear index computation.

Root cause: intermediate multiplications for `col_idx` / `im_idx` used 32-bit integers. With large dimensions, these products overflow, yielding invalid target pointers and causing OOB writes that reliably crash on macOS/ARM64.

Why the previous fix wasn't enough: PR #75261 optimized `im2col_common` by reordering the logic to check bounds before computing/reading `im_idx`, which addressed one class of UB but did not widen the index arithmetic to 64-bit. Thus, on very large shapes, 32-bit overflow still occurs in the general path and can lead to OOB writes despite the reordered checks.

Relation to #75716: PR #75716 fixes unsafe `memcpy` over-reads in the specialized fast path (`im2col_sh1sw1dh1dw1ph1pw1`) by adding clamping/zero-fill and negative-size protections. That PR is complementary but does not cover the general `im2col_common` path where this crash occurs.

Fixes included in this PR:
… `int64_t`) to eliminate overflow.

Issue
`Segmentation fault` is detected by the operating system (PaddleOCR#16609)
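To illustrate the root cause described above, here is a minimal sketch of 64-bit linear indexing combined with a coordinate bounds check before the load. The helper names (`linear_index`, `exceeds_int32`, `load_or_zero`) and the CHW layout are assumptions made for this example; they are not taken from the Paddle source.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: linear offset of element (c, h, w) in a CHW buffer.
// Every operand is int64_t, so the intermediate products cannot wrap at 2^31.
inline int64_t linear_index(int64_t c, int64_t h, int64_t w,
                            int64_t height, int64_t width) {
  return (c * height + h) * width + w;
}

// True when an offset would not be representable in a 32-bit int, i.e. the
// situation in which 32-bit index arithmetic would have overflowed.
inline bool exceeds_int32(int64_t idx) {
  return idx > std::numeric_limits<int32_t>::max();
}

// Check the coordinates first, then compute the index and read. Out-of-range
// positions (padding) yield zero without touching memory.
inline float load_or_zero(const float* im, int64_t c, int64_t h, int64_t w,
                          int64_t height, int64_t width) {
  if (h < 0 || h >= height || w < 0 || w >= width) return 0.0f;
  return im[linear_index(c, h, w, height, width)];
}
```

For example, with `height = width = 1024`, the element at channel 2048 has offset 2^31, one past the largest 32-bit signed value, which is why widening the arithmetic matters independently of the bounds check.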