[API Compatibility] Implement paddle.addcmul API -part #77333

Open
Manfredss wants to merge 34 commits into PaddlePaddle:develop from Manfredss:ApiEnhance354

Conversation

@Manfredss
Contributor

@Manfredss Manfredss commented Jan 13, 2026

PR Category

User Experience

PR Types

New features

Description

This PR implements the addcmul operator for PaddlePaddle, which performs element-wise multiplication of two tensors, multiplies the result by a scalar value, and adds it to an input tensor.

Formula: output = input + value * tensor1 * tensor2

This operator provides users with a convenient operation for combined multiply-add computations.
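
To make the formula concrete, here is a minimal pure-Python reference for scalars and same-shaped nested lists. It is only a sketch of the semantics, not the kernel added in this PR: the name `addcmul_reference` is hypothetical, it does not broadcast, and the default `value=1.0` is an assumption that mirrors `torch.addcmul`.

```python
def addcmul_reference(inp, tensor1, tensor2, value=1.0):
    """Return inp + value * tensor1 * tensor2 for scalars or same-shaped nested lists.

    Hypothetical helper for illustration only; it does not broadcast.
    """
    if isinstance(inp, list):
        # Recurse into each dimension, pairing up corresponding elements.
        return [
            addcmul_reference(a, b, c, value)
            for a, b, c in zip(inp, tensor1, tensor2)
        ]
    return inp + value * tensor1 * tensor2


ones = [[1.0, 1.0], [1.0, 1.0]]
twos = [[2.0, 2.0], [2.0, 2.0]]
threes = [[3.0, 3.0], [3.0, 3.0]]
result = addcmul_reference(ones, twos, threes, value=0.5)
print(result)  # [[4.0, 4.0], [4.0, 4.0]]
```

Each element is 1 + 0.5 * 2 * 3 = 4, which matches the result quoted in the dynamic graph example in this description.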

Implementation Details

Core Components

  1. C++ Kernels (paddle/phi/kernels/)

    • Forward kernels: addcmul_kernel.h and implementations for CPU/GPU
    • Backward kernels: addcmul_grad_kernel.h and implementations for CPU/GPU
    • Implementation files in impl/ directory with templated functions for different ranks (0-6D)
  2. Operator Configuration (paddle/phi/ops/yaml/)

    • Added addcmul operator definition in ops.yaml
    • Added addcmul_grad backward operator in backward.yaml
    • Configured with proper infer_meta and kernel functions
  3. Shape Inference (paddle/phi/infermeta/)

    • Implemented AddcmulInferMeta in ternary.cc/h
    • Handles broadcasting for three input tensors
    • Validates dimension compatibility
  4. PIR Support (paddle/fluid/pir/dialect/operator/interface/infer_symbolic_shape/)

    • Added AddcmulOpInferSymbolicShape for new IR system
    • Handles symbolic shape inference with broadcasting
  5. Python API (python/paddle/tensor/)

    • Added paddle.addcmul() function in math.py
    • Registered Tensor.addcmul() method in __init__.py
    • Supports both dynamic and static graph modes
  6. Testing (test/legacy_test/)

    • Comprehensive test suite with 52 test cases
    • Tests multiple data types: float16, float32, float64, bfloat16
    • Tests various tensor shapes and broadcasting scenarios
    • Tests gradient computation for all inputs
    • Tests zero-size tensors and error conditions
    • Tests both OpTest framework and high-level API
  7. Configuration (test/white_list/)

    • Added addcmul to FP64 gradient threshold whitelist
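
The shape-inference step in item 3 can be pictured with a small pure-Python sketch of NumPy-style broadcasting over three shapes. This is an illustration under assumptions, not the actual `AddcmulInferMeta` code: the helper name `broadcast_shapes` is hypothetical, and it assumes the output shape is simply the broadcast of all three input shapes.

```python
def broadcast_shapes(*shapes):
    """Return the NumPy-style broadcast of several shapes, or raise ValueError."""
    ndim = max(len(shape) for shape in shapes)
    result = []
    for axis in range(ndim):
        dim = 1
        for shape in shapes:
            # Shapes are right-aligned; missing leading axes count as size 1.
            index = axis - (ndim - len(shape))
            size = shape[index] if index >= 0 else 1
            if size == 1:
                continue
            if dim not in (1, size):
                raise ValueError(f"cannot broadcast shapes {shapes} at axis {axis}")
            dim = size
        result.append(dim)
    return result


print(broadcast_shapes([3, 4], [1, 4], [3, 1]))  # [3, 4]
print(broadcast_shapes([0, 4], [1, 4], [4]))     # [0, 4]
```

The second call shows why zero-size tensors need no special casing under these rules: a size-0 axis broadcasts against size 1 like any other size.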

Features

  • Multi-device support: CPU and GPU (CUDA)
  • Multiple data types: float16, float32, float64, bfloat16
  • Broadcasting: Full NumPy-style broadcasting support
  • Gradient support: Automatic differentiation for all three inputs
  • Tensor dimensions: Supports 0D to 6D tensors
  • API compatibility: Similar interface to PyTorch's torch.addcmul
  • Zero-size tensors: Properly handles edge cases
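
For the gradient support listed above, the formula implies that the gradient with respect to input is the upstream gradient itself, while tensor1 and tensor2 each receive the upstream gradient scaled by value and by the other tensor. A broadcast operand additionally sums its gradient over the expanded axis. The sketch below shows this for flat lists only; `addcmul_grad_1d` is a hypothetical name, and the backward kernel in this PR may be organized quite differently.

```python
def addcmul_grad_1d(out_grad, tensor1, tensor2, value=1.0):
    """Gradients of out = input + value * tensor1 * tensor2 for flat lists.

    Hypothetical helper: input is assumed to have full length, and a
    length-1 tensor1 or tensor2 is treated as broadcast across it.
    """
    n = len(out_grad)
    # Expand length-1 operands so every list has n elements.
    t1 = tensor1 * n if len(tensor1) == 1 else tensor1
    t2 = tensor2 * n if len(tensor2) == 1 else tensor2

    grad_input = list(out_grad)  # d(out)/d(input) = 1
    grad_t1 = [value * g * b for g, b in zip(out_grad, t2)]
    grad_t2 = [value * g * a for g, a in zip(out_grad, t1)]

    # A broadcast operand receives the sum over the axis it was expanded along.
    if len(tensor1) == 1:
        grad_t1 = [sum(grad_t1)]
    if len(tensor2) == 1:
        grad_t2 = [sum(grad_t2)]
    return grad_input, grad_t1, grad_t2


grads = addcmul_grad_1d([1.0, 1.0, 1.0], [2.0], [1.0, 2.0, 3.0], value=0.5)
print(grads)  # ([1.0, 1.0, 1.0], [3.0], [1.0, 1.0, 1.0])
```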

Testing Results

All 52 tests pass successfully:

(paddle) D:\Xue\ML\Paddle\PaddleDebug>python test/legacy_test/test_addcmul.py
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0113 14:12:09.673414 20628 gpu_resources.cc:116] Please NOTE: device: 0, GPU Compute Capability: 12.0, Driver API Version: 13.1, Runtime API Version: 12.9
....I0113 14:12:09.695261 20628 pir_interpreter.cc:1529] New Executor is Running ...
I0113 14:12:09.695261 20628 pir_interpreter.cc:1552] pir interpreter is running by multi-thread mode ...
..I0113 14:12:09.702877 20628 program_interpreter.cc:255] New Executor is Running.
I0113 14:12:09.704878 20628 interpreter_util.cc:624] Standalone Executor is Used.
W0113 14:12:09.744876 20628 eager_utils.cc:3584] Paddle static graph(PIR) not support input out tensor for now!!!!!
C:\Users\***\anaconda3\envs\paddle\Lib\site-packages\paddle\pir\math_op_patch.py:241: UserWarning: Tensor do not have 'place' interface for pir graph mode, try not to use it. None will be returned.
  warnings.warn(
..............................................
----------------------------------------------------------------------
Ran 52 tests in 16.889s

OK

Test coverage includes:

  • Basic functionality with various shapes (1D, 2D, 3D, large tensors)
  • Different value parameters (positive, negative, default)
  • Multiple data types (FP16, FP32, FP64, BF16)
  • Broadcasting scenarios
  • Gradient checks for all inputs
  • Zero-size tensor edge cases
  • Error handling for invalid inputs
  • Both static and dynamic graph modes
  • Tensor method (tensor.addcmul())

API Examples

Dynamic Graph Mode

import paddle

input = paddle.ones([2, 2])
tensor1 = paddle.ones([2, 2]) * 2
tensor2 = paddle.ones([2, 2]) * 3

# Using function API
out = paddle.addcmul(input, tensor1, tensor2, value=0.5)
# Result: [[4., 4.], [4., 4.]]

# Using tensor method
out = input.addcmul(tensor1, tensor2, value=0.5)

Static Graph Mode

import paddle

paddle.enable_static()
input = paddle.static.data('input', shape=[2, 2], dtype='float32')
tensor1 = paddle.static.data('tensor1', shape=[2, 2], dtype='float32')
tensor2 = paddle.static.data('tensor2', shape=[2, 2], dtype='float32')
out = paddle.addcmul(input, tensor1, tensor2, value=0.5)

Broadcasting

input = paddle.ones([3, 4])
tensor1 = paddle.randn([1, 4])
tensor2 = paddle.randn([3, 1])
out = paddle.addcmul(input, tensor1, tensor2, value=2.0)

Backward Compatibility

This PR adds new functionality without modifying existing APIs or behaviors. It is fully backward compatible.

Checklist

  • Implemented forward and backward kernels
  • Added operator YAML configurations
  • Implemented shape inference (InferMeta)
  • Added PIR symbolic shape inference
  • Created Python API wrapper
  • Registered tensor method
  • Added comprehensive test suite
  • All tests passing (52/52)
  • Added to gradient threshold whitelist
  • Code follows PaddlePaddle style guidelines
  • All comments in English
  • No linter errors

Related Issues

[Qihang Program] PaddlePaddle API Compatibility Enhancement No.354

Additional Notes

  • The operator uses Eigen for efficient computation with automatic vectorization
  • Mixed precision computation is handled via MPTypeTrait for numerical stability
  • Broadcasting follows NumPy semantics
  • Gradient computation is mathematically verified and tested
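
The mixed-precision note can be illustrated without Paddle. The sketch below emulates half precision with the standard-library `struct` module and compares rounding once at the end against rounding after every operation. It is an assumption-laden illustration of the general idea behind computing in a wider type, not the `MPTypeTrait` code path: Python's double-precision `float` stands in for the wider compute type, and all function names are hypothetical.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def addcmul_wide(inp, t1, t2, value):
    # Compute in the wider type and round to half once, at the end.
    return to_half(inp + value * t1 * t2)


def addcmul_narrow(inp, t1, t2, value):
    # Round to half after every intermediate operation.
    return to_half(inp + to_half(to_half(value * t1) * t2))


# Inputs that are themselves representable in half precision.
inp, t1, t2, value = to_half(1.0), to_half(0.1), to_half(0.3), to_half(0.7)
reference = inp + value * t1 * t2  # double-precision reference value
wide_error = abs(addcmul_wide(inp, t1, t2, value) - reference)
narrow_error = abs(addcmul_narrow(inp, t1, t2, value) - reference)
print(wide_error <= narrow_error)  # True
```

Rounding once yields the half-precision value nearest to the reference, so its error can never exceed that of the path that rounds at every step.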

Files Changed

New Files (9):

paddle/phi/kernels/addcmul_kernel.h
paddle/phi/kernels/addcmul_grad_kernel.h
paddle/phi/kernels/impl/addcmul_kernel_impl.h
paddle/phi/kernels/impl/addcmul_grad_kernel_impl.h
paddle/phi/kernels/cpu/addcmul_kernel.cc
paddle/phi/kernels/cpu/addcmul_grad_kernel.cc
paddle/phi/kernels/gpu/addcmul_kernel.cu
paddle/phi/kernels/gpu/addcmul_grad_kernel.cu
test/legacy_test/test_addcmul.py

Modified Files (9):

paddle/phi/ops/yaml/ops.yaml
paddle/phi/ops/yaml/backward.yaml
paddle/phi/infermeta/ternary.h
paddle/phi/infermeta/ternary.cc
paddle/fluid/pir/dialect/operator/interface/infer_symbolic_shape/multiary_infer_sym.h
paddle/fluid/pir/dialect/operator/interface/infer_symbolic_shape/multiary_infer_sym.cc
python/paddle/tensor/__init__.py
python/paddle/tensor/math.py
test/white_list/op_threshold_white_list.py

Does this cause a precision change?

…rnels for CPU/GPU (fp16, fp32, fp64, bf16)
  • Add operator configuration in ops.yaml and backward.yaml
  • Implement AddcmulInferMeta for shape inference
  • Add PIR symbolic shape inference support
  • Add Python API: paddle.addcmul() and Tensor.addcmul()
  • Add comprehensive test suite (52 tests, all passing)
  • Add to FP64 gradient threshold whitelist
  • Formula: output = input + value * tensor1 * tensor2
  • Supports broadcasting and multiple dtypes.
@paddle-bot

paddle-bot bot commented Jan 13, 2026

Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Please watch for the automated CI test results that follow. See the Paddle CI Manual for details.

Contributor

@zhwesky2010 zhwesky2010 left a comment

Please check the coverage and make sure every path is exercised by the tests.

Also run the PaConvert tests ahead of time to confirm the results match torch, and post a screenshot of the PaConvert case results.

return _C_ops.addmm_(input, x, y, beta, alpha)


def addcmul(
Contributor

For a new API, just take the C++ sinking approach directly; this Python definition does not need to be added.

@Manfredss
Contributor Author

/re-run all-failed

Contributor

@zhwesky2010 zhwesky2010 left a comment

Please look into how to reduce the size of this PR.


add_doc_and_signature(
"i1",
"addcmul",
Contributor

Don't delete unrelated code. After making changes, first check for yourself that every change is what you intended.

return _C_ops.addmm_(input, x, y, beta, alpha)


# def addcmul(
Contributor

Reduce the line count of this PR; delete these lines.

@codecov-commenter

codecov-commenter commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 34.70149% with 175 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@b79c618). Learn more about missing BASE report.

⚠️ Current head a3ebc16 differs from pull request most recent head 55f8f54

Please upload reports for the commit 55f8f54 to get more accurate results.

Files with missing lines                                 Patch %   Lines
paddle/phi/kernels/impl/addcmul_grad_kernel_impl.h       0.00%     138 Missing ⚠️
paddle/phi/kernels/impl/addcmul_kernel_impl.h            50.00%    30 Missing ⚠️
...terface/infer_symbolic_shape/multiary_infer_sym.cc    87.17%    5 Missing ⚠️
paddle/phi/kernels/funcs/common_shape.h                  0.00%     2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #77333   +/-   ##
==========================================
  Coverage           ?   34.70%           
==========================================
  Files              ?        8           
  Lines              ?      268           
  Branches           ?        0           
==========================================
  Hits               ?       93           
  Misses             ?      175           
  Partials           ?        0           
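
The headline figure follows directly from the per-file numbers in this report. The short check below recomputes it; the dictionary simply restates the report's rows (including the path that Codecov truncated with `...`), and the variable names are only illustrative.

```python
# Missing-line counts per file, copied from the Codecov report.
missing_by_file = {
    "paddle/phi/kernels/impl/addcmul_grad_kernel_impl.h": 138,
    "paddle/phi/kernels/impl/addcmul_kernel_impl.h": 30,
    "...terface/infer_symbolic_shape/multiary_infer_sym.cc": 5,
    "paddle/phi/kernels/funcs/common_shape.h": 2,
}
total_lines = 268  # "Lines" row of the coverage diff

misses = sum(missing_by_file.values())
hits = total_lines - misses
coverage = 100 * hits / total_lines

print(misses, hits, f"{coverage:.5f}%")  # 175 93 34.70149%
```

More than three quarters of the missing lines (138 of 175) sit in the backward kernel implementation, which has 0% patch coverage in this report.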


@Manfredss
Contributor Author

/re-run all-failed

@Manfredss
Contributor Author

/re-run all-failed

@Manfredss
Contributor Author

/re-run all-failed

zhwesky2010 previously approved these changes Feb 12, 2026
@zhwesky2010
Contributor

The example code still isn't passing. Does it need to be imported in __init__.py?

[Screenshot: infoflow 2026-02-12 15-52-17]

@SigureMo
Member

> The example code still isn't passing. Does it need to be imported in __init__.py?

Yes, it does. It needs to be added in paddle/__init__.py.

zhwesky2010 previously approved these changes Feb 12, 2026
@Manfredss
Contributor Author

/re-run all-failed

@zhwesky2010 zhwesky2010 requested a review from zyfncg February 13, 2026 01:09
SigureMo previously approved these changes Feb 13, 2026
Contributor

@zhwesky2010 zhwesky2010 left a comment

Coverage fails because test_addcmul itself does not run successfully. Please test addcmul locally on a GPU.
Also, keep changes to skipif and atol/rtol to a minimum.


cinn_loss = net_cinn(x, t1, t2)
np.testing.assert_allclose(
    cinn_loss.numpy(), dy_loss.numpy(), rtol=1e-5, atol=1e-5
Contributor

I suggest trimming the cases in these places and avoiding the use of atol/rtol.


cinn_loss = net_cinn(x, t1, t2)
np.testing.assert_allclose(
    cinn_loss.numpy(), dy_loss.numpy(), rtol=1e-5, atol=1e-5
Contributor

Same as above.

dy_out = fn(*inputs)

np.testing.assert_allclose(
    cinn_out.numpy(), dy_out.numpy(), rtol=1e-5, atol=1e-5
Contributor

CINN results should be the same as the regular ones. What is the reason for loosening the threshold here?

paddle.enable_static()


@unittest.skipIf(
Contributor

If a lot of skipif decorators are needed here, control this from CMakeLists.txt instead. skipif should be avoided as much as possible.

@Manfredss Manfredss dismissed stale reviews from SigureMo and zhwesky2010 via d530867 March 2, 2026 06:11
@luotao1 luotao1 changed the title from "[API Compatibility No.354] Implement paddle.addcmul API -part" to "[API Compatibility] Implement paddle.addcmul API -part" Mar 4, 2026
@Manfredss
Contributor Author

/re-run all-failed

@zhwesky2010
Contributor

The unit test still fails. It looks like the static-graph case that tests InferSymbolicShape did not pass. Does it pass when you run it locally?

[Screenshot: infoflow 2026-03-05 18-52-22]

@Manfredss
Contributor Author

/re-run all-failed

main = paddle.static.Program()
startup = paddle.static.Program()
with base.program_guard(main, startup):
    x = paddle.static.data(name="x", shape=self.shape, dtype=self.dtype)
Contributor

@zhwesky2010 zhwesky2010 Mar 9, 2026

Judging from the error message, this is an out-of-bounds error under PIR: while ops are being created, a DataOp has just been created and the error occurs when it is inserted into the block. It looks unrelated to the addcmul op logic itself; could the unit test itself be wrong? Please reproduce the problem locally and debug it.

[Two images attached]

Contributor Author

@Manfredss Manfredss Mar 11, 2026

The thing is, on my side the unit test has always passed when I run it with python.
[Image attached]

Contributor

Do you have a Linux GPU environment to run it in?

@zhwesky2010
Contributor

@Manfredss Have you reproduced the problem in this PR?

@zhwesky2010
Contributor

@Manfredss Please debug this as soon as possible.

@Manfredss
Contributor Author

Manfredss commented Apr 1, 2026

> @Manfredss Have you reproduced the problem in this PR?

It seems to work fine for me on Linux as well?
[Image attached]

@Manfredss
Contributor Author

/re-run all-failed

5 participants