[Cherry-pick Fleety_12] Bigtensor and api precision #76023
Closed
zhengshengning wants to merge 24 commits into PaddlePaddle:fleety_12 from …
Conversation
…en op type is 'div' (PaddlePaddle#75237)
…nsor is floating (PaddlePaddle#75238): align LinspaceKernel; update meta; update gpu kernel; fix LinspaceKernelInner; improve kernel
… * (1 + tan(x)^2) (PaddlePaddle#75335): Tan backward calculation: dx = dout * (1 + tan(x)^2); see the tan-gradient sketch after this list
…onal.grid_sample to align with torch accuracy (PaddlePaddle#75355): accuracy_stable_grid_sample; follow-up fixes
…ch precision (PaddlePaddle#75503): accuracy_stable_sin; accuracy_stable_cos; follow-up fixes
…ackward (PaddlePaddle#75525): fix precision for float16 of paddle.tan backward; fix else branch of CudaTanGradFunctor
…dle#75549): accuracy_stable_expm1; follow-up fix
…ional.softplus to double (PaddlePaddle#75426): fix beta and threshold of Softplus to double; fix test_softplus_activation_fuse_pass; fix test_activation_zero; fix float of SoftplusDoubleGradKernel to double; add op_patches for softplus; add yaml for ops/yaml/legacy; fix infershape/operator for FLOAT64; add SoftPlusOpTranscriber; fix coverage; fix dcu
…addlePaddle#75799): accuracy_stable_log; follow-up fixes
…ble (PaddlePaddle#75816): accuracy_stable_logit; add LogitOpTranscriber; fix coverage; fix yaml
accuracy_stable_log_sigmoid; fix test_activation_stride_op.py
…e paddle.nn.functional.leaky_relu API to double (PaddlePaddle#75547)
…le#75856): fix funcs; gpu; revise the PADDLE_ENFORCE message; fix cpu error; fix dcu; follow-up fixes
feature: add specialized LogSigmoidFunctor and CudaLogSigmoidFunctor for complex numbers. The specializations use direct formulas for better accuracy and stability on complex inputs. refactor: cache exp(-x) in both functors to avoid redundant computation for complex types. refactor: modify the formula in LogSigmoidFunctor to make it numerically stable; see the stable-form sketch after this list.
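The tan backward rule referenced in PaddlePaddle#75335 and PaddlePaddle#75525 follows from the identity d/dx tan(x) = sec^2(x) = 1 + tan^2(x). A minimal sketch of that rule, not Paddle's actual functor (the name is illustrative):

```cpp
#include <cmath>

// Illustrative only: dx = dout * (1 + tan(x)^2).
// Reusing tan(x) avoids a second trig evaluation (e.g. cos(x)) and
// matches the identity d/dx tan(x) = 1 + tan(x)^2.
template <typename T>
T TanGradSketch(T x, T dout) {
  T t = std::tan(x);
  return dout * (static_cast<T>(1) + t * t);
}
```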
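On the LogSigmoid stability point: the naive -log(1 + exp(-x)) overflows for large negative x because exp(-x) blows up. For real inputs the standard stable identity is log(sigmoid(x)) = min(x, 0) - log1p(exp(-|x|)). A minimal sketch of that identity, not the Paddle functor itself (which, per the commit, additionally specializes for complex types and caches exp(-x)):

```cpp
#include <algorithm>
#include <cmath>

// Stable log-sigmoid: log(1/(1+exp(-x))) = min(x,0) - log1p(exp(-|x|)).
// exp is only ever called on a non-positive argument, so it cannot
// overflow; log1p keeps precision when exp(-|x|) is close to zero.
template <typename T>
T LogSigmoidSketch(T x) {
  return std::min(x, static_cast<T>(0)) -
         std::log1p(std::exp(-std::abs(x)));
}
```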
PR Category
Operator Mechanism
PR Types
New features
Description
Cherry-pick the following PRs from paddle develop into fleety_12:
Big Tensor:
#75856
#75523
#75383
Bit-for-bit precision alignment:
#75717
#75379
#75588
#75605
#75799
#75341
#75503
#75355
#75363
#75426
#75454
#75367
#75335
#75525
#75549
#75816
#75547
#74638
#75237
#75238
#75965
#75898
Some of the fluid, pir, onednn, pass, composite operator, and auto parallel changes exist because the kernel signatures were changed from float to double, so as not to lose attribute precision and to stay aligned with Torch. These kernels, however, have been float-typed all the way from the Maker of the final operator library, which raised many compatibility issues, for example:
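One concrete shape of the problem, sketched with hypothetical values (not taken from the PRs): an op definition that still declares an attribute as float silently truncates whatever double value the API passes down, even when the kernel now accepts double.

```cpp
#include <cstdio>

int main() {
  // Hypothetical attribute value; 0.1 is not exactly representable,
  // so float (~7 significant digits) cannot round-trip what double
  // (~16 significant digits) holds.
  double beta = 0.1;
  float legacy_attr = static_cast<float>(beta);       // float slot in the op Maker
  double seen_by_kernel = static_cast<double>(legacy_attr);
  std::printf("declared:    %.17g\n", beta);          // 0.10000000000000001
  std::printf("kernel sees: %.17g\n", seen_by_kernel);// 0.10000000149011612
  return 0;
}
```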
In addition, after the computation logic of some operators was adjusted, the corresponding composite operators had to adjust their composition logic to match, which led to a number of composite operator changes.