feat(HIP): register bfloat16 kernels for conv2d/conv3d/depthwise_conv2d on ROCm #78587
Merged
luotao1 merged 3 commits into PaddlePaddle:develop on Apr 13, 2026
Conversation
Your PR was submitted successfully. Thank you for contributing to this open-source project!
fchange added a commit to fchange/PaddleX that referenced this pull request on Apr 4, 2026
Remove `_keep_in_fp32_modules = ["visual", "mlp_AR"]` from `PaddleOCRVLForConditionalGeneration`. This workaround was added to avoid MIOpen BF16 convolution bugs on ROCm 7.0 by forcing the visual encoder to FP32, which doubled VRAM usage and reduced throughput. The Paddle framework now registers BF16 conv kernels for the HIP backend, making this workaround unnecessary.

See: PaddlePaddle/Paddle#78587

Signed-off-by: fchange
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…2d on HIP

Add phi::bfloat16 to PD_REGISTER_KERNEL macros for the HIP (ROCm) backend:

- conv2d, conv3d, depthwise_conv2d (forward)
- conv2d_grad, conv3d_grad (backward)
- conv2d_double_grad, conv3d_double_grad, depthwise_conv2d_double_grad

This enables BF16-precision inference for vision encoders (e.g., SigLIP in PaddleOCR-VL) on AMD GPUs. Previously only float and float16 were registered for HIP, causing a RuntimeError when BF16 models attempted convolution operations.

Also adds test_hip_bf16_conv_kernel.py to verify BF16 conv kernel registration on HIP/ROCm platforms.

Fixes: conv2d BF16 kernel not registered on HIP

Signed-off-by: fchange
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Force-pushed 97f8fe6 to 39fb3a2
Contributor
The ToCudnnDataType function in miopen_desc.h was missing a case for DataType::BFLOAT16, causing it to fall through and return miopenFloat (FP32) instead of miopenBFloat16 for BF16 tensors. As a result, MIOpen used FP32 tensor descriptors for BF16 data, which produced NaN output in conv2d/conv3d operations and caused the DCU CI test test_hip_bf16_conv_kernel to fail. Fixes: test_hip_bf16_conv_kernel failure on Linux-DCU CI.
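A minimal Python mirror of the missing-case bug and its fix (the enum members and constant strings below are illustrative stand-ins, not Paddle's or MIOpen's actual C++ declarations):

```python
from enum import Enum, auto

class DataType(Enum):
    # Illustrative subset of framework dtypes.
    FLOAT32 = auto()
    FLOAT16 = auto()
    BFLOAT16 = auto()

MIOPEN_FLOAT = "miopenFloat"
MIOPEN_HALF = "miopenHalf"
MIOPEN_BFLOAT16 = "miopenBFloat16"

def to_miopen_dtype(dtype):
    """Map a framework dtype to a MIOpen descriptor dtype."""
    if dtype is DataType.FLOAT16:
        return MIOPEN_HALF
    if dtype is DataType.BFLOAT16:
        # The missing branch: without it, BF16 fell through to the
        # FP32 default below, giving MIOpen a mismatched descriptor
        # and producing NaN conv output.
        return MIOPEN_BFLOAT16
    return MIOPEN_FLOAT
```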
Contributor
hi, @fchange
This was referenced Apr 18, 2026


PR Category
Operator Mechanism
PR Types
Bug fixes
Description
This PR registers `phi::bfloat16` convolution kernels for the HIP/ROCm backend so BF16 vision models can run on AMD GPUs without hitting `kernel not registered` errors for convolution ops.

Changes in this PR:

- Add `phi::bfloat16` to the HIP kernel registrations for `conv2d`, `conv3d`, and `depthwise_conv2d` in `paddle/phi/kernels/gpudnn/conv_kernel.cu`
- Add `phi::bfloat16` to the HIP kernel registrations for `conv2d_grad`, `conv3d_grad`, `conv2d_double_grad`, `conv3d_double_grad`, and `depthwise_conv2d_double_grad` in `paddle/phi/kernels/gpudnn/conv_grad_kernel.cu`

Follow-up adjustment:

- Reduce `test_hip_bf16_conv_kernel.py` to minimal `conv2d` and grouped-conv smoke tests

Manual verification used for the original change:

- Confirmed that the `kernel conv2d not registered` error for BF16 on HIP is resolved

Does this PR cause precision changes?

No.
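The reduced smoke test runs through Paddle on actual HIP hardware; its numerical intent can be sketched hardware-independently in pure Python (the helpers `to_bf16`, `conv2d_ref`, and `conv2d_bf16` below are illustrative, not taken from the PR's test file):

```python
import struct

def to_bf16(x):
    """Round a Python float to bfloat16 precision (round-to-nearest-even):
    keep the top 16 bits of the float32 encoding (sign, 8-bit exponent,
    7 mantissa bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def conv2d_ref(img, ker):
    """Naive single-channel, stride-1, valid-mode 2D convolution
    (cross-correlation, as in deep-learning conv)."""
    h, w = len(img), len(img[0])
    kh, kw = len(ker), len(ker[0])
    return [[sum(img[i + di][j + dj] * ker[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]

def conv2d_bf16(img, ker):
    """BF16 variant: round inputs and outputs to bfloat16, so the result
    can be compared against the FP32 reference within a loose tolerance."""
    img16 = [[to_bf16(v) for v in row] for row in img]
    ker16 = [[to_bf16(v) for v in row] for row in ker]
    return [[to_bf16(v) for v in row] for row in conv2d_ref(img16, ker16)]
```

The real test instead builds BF16 tensors with Paddle, runs `conv2d` (and a grouped-conv case) on the HIP device, and checks the output against an FP32 reference with a tolerance sized for bfloat16's 8-bit mantissa.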