Skip to content

Fix/im2col cpu#75731

Merged
lshpku merged 3 commits intoPaddlePaddle:developfrom
scyyh11:fix/im2col_cpu
Oct 11, 2025
Merged

Fix/im2col cpu#75731
lshpku merged 3 commits intoPaddlePaddle:developfrom
scyyh11:fix/im2col_cpu

Conversation

@scyyh11
Copy link
Copy Markdown
Contributor

@scyyh11 scyyh11 commented Oct 10, 2025

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

This PR fixes a critical SIGBUS / EXC_BAD_ACCESS crash in phi::funcs::im2col_common on macOS (Apple Silicon) when running convolutions on large tensors (e.g., PaddleOCR). The crash manifests at _platform_memset called from im2col_common, indicating an out-of-bounds write due to incorrect linear index computation.

Root cause: intermediate multiplications for col_idx/im_idx used 32-bit integers. With large dimensions, these products overflow, yielding invalid target pointers and causing OOB writes that reliably crash on macOS/ARM64.

Why the previous fix wasn’t enough: PR #75261 optimized im2col_common by reordering the logic to check bounds before computing/reading im_idx, which addressed one class of UB but did not widen the index arithmetic to 64-bit. Thus, on very large shapes, 32-bit overflow still occurs in the general path and can lead to OOB despite the reordered checks.

Relation to #75716: PR #75716 fixes unsafe memcpy over-reads in the specialized fast path (im2col_sh1sw1dh1dw1ph1pw1) by adding clamping/zero-fill and negative-size protections. That PR is complementary but does not cover the general im2col_common path where this crash occurs.

Fixes included in this PR:

  • Promote all relevant dimension variables and linear index arithmetic to 64-bit (int64_t) to eliminate overflow.
  • Preserve the “check bounds first, then compute/use index” sequencing introduced previously.

本 PR 修复了在 macOS(Apple Silicon)上,phi::funcs::im2col_common 在大尺寸卷积(如 PaddleOCR 场景)下出现的 SIGBUS / EXC_BAD_ACCESS 崩溃。崩溃点位于 im2col_common 调用的 _platform_memset,表明线性索引计算错误导致了越界写入。

根因: col_idx / im_idx 的中间乘法使用了 32 位整数,在大尺寸下发生溢出,生成错误的目标指针,进而写出界;在 macOS/ARM64 上会稳定复现崩溃。

为何之前的修复还不够: PR #75261im2col_common 做了“先做越界检查,再计算/读取 im_idx”的顺序优化,解决了其中一类未定义行为,但并未将索引运算提升为 64 位。因此在超大形状下,通用路径仍可能发生 32 位溢出,即使有边界检查也可能因为溢出导致目标地址错误,从而越界写。

#75716 的关系: PR #75716 修复了 快速路径im2col_sh1sw1dh1dw1ph1pw1)中的不安全 memcpy 越界读取问题(加入了大小裁剪、零填充与负数保护),该修复与本 PR 互补,但不覆盖本次崩溃所在的 通用 im2col_common 路径。

本 PR 的修复点:

  • 将相关维度与线性索引计算统一提升为 64 位 (int64_t),彻底避免溢出。
  • 延续“先检查边界,再计算/使用索引”的安全顺序。

Issue

- Add bounds clamping for all memcpy operations in the specialized fast path
- Add zero-fill for shortfall cases to ensure complete output tensor coverage
- Maintain performance by using memcpy when safe, falling back to element-wise operations only when necessary
…1dh1dw1ph1pw1

- Fix unsafe memcpy in NCHW path when filter_width == 1
- Prevent negative size_t conversion when output_width < plw + prw
- Clamp copy size to available source span (im_width) to avoid over-read
- Add zero-fill for shortfall cases to ensure complete output coverage
- Convert dimensions to 64-bit integers to avoid overflow during calculations
- Update index calculations for col and im arrays to use 64-bit arithmetic
- Ensure safe access to tensor data by checking bounds before indexing
@paddle-bot
Copy link
Copy Markdown

paddle-bot Bot commented Oct 10, 2025

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@scyyh11
Copy link
Copy Markdown
Contributor Author

scyyh11 commented Oct 11, 2025

/re-run all-failed

@lshpku lshpku merged commit 91a4c15 into PaddlePaddle:develop Oct 11, 2025
84 of 86 checks passed
@scyyh11 scyyh11 deleted the fix/im2col_cpu branch October 13, 2025 10:14
SigureMo pushed a commit to cattidea/Paddle that referenced this pull request Oct 14, 2025
* fix: prevent memcpy over-read in im2col_sh1sw1dh1dw1ph1pw1 NCHW branches

- Add bounds clamping for all memcpy operations in the specialized fast path
- Add zero-fill for shortfall cases to ensure complete output tensor coverage
- Maintain performance by using memcpy when safe, falling back to element-wise operations only when necessary

* fix: prevent memcpy over-read in filter_width==1 case of im2col_sh1sw1dh1dw1ph1pw1

- Fix unsafe memcpy in NCHW path when filter_width == 1
- Prevent negative size_t conversion when output_width < plw + prw
- Clamp copy size to available source span (im_width) to avoid over-read
- Add zero-fill for shortfall cases to ensure complete output coverage

* fix: enhance im2col_common to prevent overflow in arithmetic operations

- Convert dimensions to 64-bit integers to avoid overflow during calculations
- Update index calculations for col and im arrays to use 64-bit arithmetic
- Ensure safe access to tensor data by checking bounds before indexing
@yuanlehome
Copy link
Copy Markdown
Contributor

Good job!

qingqing01 pushed a commit that referenced this pull request Oct 24, 2025
* fix: prevent memcpy over-read in im2col_sh1sw1dh1dw1ph1pw1 NCHW branches

- Add bounds clamping for all memcpy operations in the specialized fast path
- Add zero-fill for shortfall cases to ensure complete output tensor coverage
- Maintain performance by using memcpy when safe, falling back to element-wise operations only when necessary

* fix: prevent memcpy over-read in filter_width==1 case of im2col_sh1sw1dh1dw1ph1pw1

- Fix unsafe memcpy in NCHW path when filter_width == 1
- Prevent negative size_t conversion when output_width < plw + prw
- Clamp copy size to available source span (im_width) to avoid over-read
- Add zero-fill for shortfall cases to ensure complete output coverage

* fix: enhance im2col_common to prevent overflow in arithmetic operations

- Convert dimensions to 64-bit integers to avoid overflow during calculations
- Update index calculations for col and im arrays to use 64-bit arithmetic
- Ensure safe access to tensor data by checking bounds before indexing

Co-authored-by: Bvicii <98971614+scyyh11@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor External developers HappyOpenSource 快乐开源活动issue与PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants