Disable NVIDIA_TF32_OVERRIDE by default for better precision. #75476
Merged
wanghuancoder merged 1 commit into PaddlePaddle:develop on Oct 13, 2025
sneaxiy approved these changes Sep 23, 2025
Contributor, Author:
/re-run all-failed
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage Diff (develop vs. #75476):
Coverage: 100.00%
Files: 1
Lines: 2
Branches: 0
Hits: 2
Misses: 0
Partials: 0
phlrain approved these changes Oct 13, 2025
SigureMo pushed a commit to cattidea/Paddle that referenced this pull request Oct 14, 2025
A-nnonymous added a commit to A-nnonymous/Paddle that referenced this pull request Oct 17, 2025
…Paddle#75476)" This reverts commit fcf3c3f.
PR Category
Operator Mechanism
PR Types
Bug fixes
Description
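Do not enable the TF32 policy in cuBLAS by default.
Before this change, the flag NVIDIA_TF32_OVERRIDE defaulted to 1 in cuBLAS and similar libraries, which brings in TF32 to accelerate FP32 at the cost of precision.
The behavior is now aligned with torch. Reference links:
Monitoring results: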
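As a rough illustration of what "disabled by default" can mean for an environment flag, here is a minimal Python sketch. It is not the code changed in this PR: the helper name `disable_tf32_by_default` is hypothetical, and the sketch assumes the variable is set before any CUDA library is loaded. NVIDIA documents `NVIDIA_TF32_OVERRIDE=0` as preventing its libraries from accelerating FP32 work with TF32.

```python
import os
from typing import MutableMapping


def disable_tf32_by_default(env: MutableMapping[str, str] = os.environ) -> str:
    """Set NVIDIA_TF32_OVERRIDE=0 unless the user already chose a value.

    Hypothetical helper for illustration only. Returns the value in effect.
    """
    # setdefault leaves an explicit user setting (e.g. "1") untouched.
    return env.setdefault("NVIDIA_TF32_OVERRIDE", "0")


if __name__ == "__main__":
    # Assumption: this runs before the GPU framework loads cuBLAS.
    print("NVIDIA_TF32_OVERRIDE =", disable_tf32_by_default())
```

Using `setdefault` rather than a plain assignment keeps the change a default: a user who exports a value explicitly keeps that value.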

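This avoids the large precision loss caused by truncating 13 mantissa bits in FP32 GEMM or in linear layers without bias. Finer-grained type control will be needed in follow-up work.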
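To make the 13-bit mantissa loss concrete, the sketch below emulates it on the CPU using only the Python standard library. It zeroes the low 13 of FP32's 23 mantissa bits, leaving the 10 that TF32 keeps. The function name `truncate_to_tf32` is hypothetical, and plain truncation is an assumption that follows the wording above. Real hardware may round instead, and only the inputs are reduced while accumulation stays in FP32.

```python
import struct


def truncate_to_tf32(x: float) -> float:
    """Emulate TF32 input precision by zeroing the low 13 FP32 mantissa bits."""
    # Reinterpret the value as its 32-bit IEEE 754 single-precision pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep the sign bit, 8 exponent bits, and the top 10 mantissa bits.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    x = 1.0 + 2.0 ** -12           # exactly representable in FP32
    y = truncate_to_tf32(x)        # the 2**-12 term falls below TF32 precision
    print(x, y, abs(x - y) / x)    # relative error is close to 2**-12
```

A value such as `1.0 + 2.0 ** -10` survives unchanged, because its lowest set bit is still within the 10 retained mantissa bits.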
pcard-93348