
feat(HIP): register bfloat16 kernels for conv2d/conv3d/depthwise_conv2d on ROCm#78587

Merged
luotao1 merged 3 commits into PaddlePaddle:develop from fchange:feat/hip-bf16-conv-kernel
Apr 13, 2026

Conversation

@fchange
Contributor

@fchange fchange commented Apr 4, 2026

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

This PR registers phi::bfloat16 convolution kernels for the HIP/ROCm backend so that BF16 vision models can run on AMD GPUs without hitting "kernel not registered" errors on convolution ops.

Changes in this PR:

  • Add phi::bfloat16 to HIP kernel registration for conv2d, conv3d, and depthwise_conv2d in paddle/phi/kernels/gpudnn/conv_kernel.cu
  • Add phi::bfloat16 to HIP kernel registration for conv2d_grad, conv3d_grad, conv2d_double_grad, conv3d_double_grad, and depthwise_conv2d_double_grad in paddle/phi/kernels/gpudnn/conv_grad_kernel.cu
  • Add a HIP BF16 regression test focused on minimal convolution smoke coverage
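For context, phi kernels are registered per backend via the `PD_REGISTER_KERNEL` macro, and this change extends the dtype list compiled under the HIP branch. A sketch of the shape of the change (the kernel symbol `phi::ConvCudnnKernel` and the exact dtype spellings are assumptions; the authoritative argument lists are in the PR diff):

```cpp
// Illustrative sketch of the HIP registration in
// paddle/phi/kernels/gpudnn/conv_kernel.cu; names are not copied from the diff.
#ifdef PADDLE_WITH_HIP
PD_REGISTER_KERNEL(conv2d,               // op name
                   GPUDNN,               // backend (MIOpen on ROCm)
                   ALL_LAYOUT,
                   phi::ConvCudnnKernel,
                   float,
                   phi::dtype::float16,
                   phi::dtype::bfloat16) {}  // <- the newly registered dtype
#endif
```

The grad, double-grad, and depthwise variants follow the same pattern: add the BF16 dtype to each macro's trailing type list.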

Follow-up adjustment:

  • Narrow test_hip_bf16_conv_kernel.py to minimal conv2d and grouped-conv smoke tests
  • Remove larger BF16 operator-chain coverage from this PR so Linux-DCU validates kernel registration only and does not gate on unrelated BF16 op behavior

Manual verification used for the original change:

  • Reproduced the pre-fix error: kernel conv2d not registered for BF16 on HIP
  • Verified the patched Paddle build allows BF16 convolution to execute on AMD GPU
  • Verified PaddleX PaddleOCR-VL native backend inference can proceed past the BF16 convolution registration failure on AMD GPU

Does this introduce precision changes?

@paddle-bot

paddle-bot bot commented Apr 4, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@CLAassistant

CLAassistant commented Apr 4, 2026

CLA assistant check
All committers have signed the CLA.

@paddle-bot paddle-bot bot added the contributor External developers label Apr 4, 2026
fchange added a commit to fchange/PaddleX that referenced this pull request Apr 4, 2026
Remove _keep_in_fp32_modules = ["visual", "mlp_AR"] from
PaddleOCRVLForConditionalGeneration. This workaround was added to
avoid MIOpen BF16 convolution bugs on ROCm 7.0 by forcing the visual
encoder to FP32, which doubled VRAM usage and reduced throughput.

The Paddle framework now registers BF16 conv kernels for HIP backend,
making this workaround unnecessary.

See: PaddlePaddle/Paddle#78587

Signed-off-by: fchange

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
yongqiangma previously approved these changes Apr 7, 2026
Contributor

@yongqiangma yongqiangma left a comment


LGTM

…2d on HIP

Add phi::bfloat16 to PD_REGISTER_KERNEL macros for HIP (ROCm) backend:
- conv2d, conv3d, depthwise_conv2d (forward)
- conv2d_grad, conv3d_grad (backward)
- conv2d_double_grad, conv3d_double_grad, depthwise_conv2d_double_grad

This enables BF16 precision inference for vision encoders (e.g., SigLIP
in PaddleOCR-VL) on AMD GPUs. Previously only float and float16 were
registered for HIP, causing RuntimeError when BF16 models attempted
convolution operations.

Also adds test_hip_bf16_conv_kernel.py to verify BF16 conv kernel
registration on HIP/ROCm platforms.

Fixes: conv2d BF16 kernel not registered on HIP
Signed-off-by: fchange

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
@luotao1
Contributor

luotao1 commented Apr 9, 2026

The other CI failures can be ignored; the DCU test needs to be fixed.

The ToCudnnDataType function in miopen_desc.h was missing a case for
DataType::BFLOAT16, causing it to fall through and return miopenFloat
(FP32) instead of miopenBFloat16 for BF16 tensors.

This led to MIOPEN using FP32 tensor descriptors for BF16 data, which
produced NaN output in conv2d/conv3d operations and caused the DCU CI
test_hip_bf16_conv_kernel to fail.

Fixes: test_hip_bf16_conv_kernel failure on Linux-DCU CI.
@fchange
Contributor Author

fchange commented Apr 11, 2026

> The other CI failures can be ignored; the DCU test needs to be fixed.

This has been fixed.

@luotao1 luotao1 closed this Apr 13, 2026
@luotao1 luotao1 reopened this Apr 13, 2026
Contributor

@yongqiangma yongqiangma left a comment


LGTM

@luotao1 luotao1 merged commit 7ead32d into PaddlePaddle:develop Apr 13, 2026
114 of 116 checks passed
@luotao1
Contributor

luotao1 commented Apr 13, 2026

hi, @fchange

  • Thank you very much for contributing to PaddlePaddle. We run an organization called PFCC, the PaddlePaddle open-source contributors club; only developers who have had code merged into Paddle may join. The club holds a regular meeting every two weeks (attend as interested) and occasionally hosts offline meetups; see the https://github.com/luotao1 profile page for details.
  • If you are interested in PFCC, please send an email to ext_paddle_oss@baidu.com and we will invite you to join.

