
Commit d4e5151

chunyuan-w authored and pytorchmergebot committed
Inductor cpp wrapper: add -ffast-math in linking flag (pytorch#104332)
Fix cpp wrapper CPU performance gap on `swsl_resnext101_32x16d` compared with the default python wrapper.

The pre-trained weights of `swsl_resnext101_32x16d` contain denormal numbers (values very close to 0.0). Linking with `-ffast-math` makes the CPU flush denormals. For the default python wrapper, compilation and linking are done in one command, so `-ffast-math` takes effect in both compilation and linking. The cpp wrapper leverages cpp_extension, which does the compilation and linking in two stages, so `-ffast-math` has to be added explicitly as a linking flag.

Single thread, single batch on ICX:

| | time (s) default python wrapper | time (s) cpp wrapper before fix | time (s) cpp wrapper after fix |
| -- | -- | -- | -- |
| swsl_resnext101_32x16d | 0.459097836 | 13.82326214 | 0.448116195 |

Pull Request resolved: pytorch#104332
Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/EikanWang
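To make the term "denormal" in the commit message concrete, here is a minimal Python sketch that checks whether a value becomes a denormal (subnormal) once stored as float32, and what flushing it to zero would mean. The helper names `is_float32_denormal` and `flush_to_zero` are hypothetical and are not part of PyTorch or of this commit; the sketch only inspects the IEEE 754 bit pattern and does not change any CPU mode.

```python
import struct


def is_float32_denormal(value: float) -> bool:
    """Return True if `value`, rounded to float32, is a nonzero denormal.

    Hypothetical helper for illustration only. A float32 is denormal when
    its 8 exponent bits are all zero and its 23 mantissa bits are not.
    """
    # Round to IEEE 754 single precision and read back the raw 32 bits.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return exponent == 0 and mantissa != 0


def flush_to_zero(value: float) -> float:
    """Mimic in software what flushing denormals does: replace them with 0.0."""
    return 0.0 if is_float32_denormal(value) else value


if __name__ == "__main__":
    for v in (1.0, 1e-30, 1e-40, 0.0):
        print(v, is_float32_denormal(v), flush_to_zero(v))
```

Values such as `1e-40` fall below the smallest normal float32 (about 1.18e-38) and are therefore denormal. Arithmetic on such values is often much slower on CPUs unless they are flushed, which is consistent with the gap reported in the table.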
1 parent 732067e commit d4e5151

1 file changed: +6 -1 lines changed


torch/_inductor/codecache.py

Lines changed: 6 additions & 1 deletion
@@ -925,7 +925,12 @@ def load(cls, source_code, func_name, key, cuda):
 _use_custom_generated_macros = use_custom_generated_macros()
 
 extra_cflags = f"{_cpp_flags} {_opt_flags} {_warning_all_flag} {_macros} {_use_custom_generated_macros}"
-extra_ldflags = f"{_shared} {_lpaths} {_libs}"
+# For CPP wrapper, add -ffast-math during linking to make CPU flush denormals.
+# CPP wrapper leverages cpp_extension which will do the compilation and linking in two stages.
+# We need to explicitly add -ffast-math as a linking flag.
+# For the default python wrapper, the compilation and linking are done in one command thus -ffast-math
+# will take effect in both compilation and linking.
+extra_ldflags = f"{_shared} {_lpaths} {_libs} -ffast-math"
 extra_include_paths = f"{_ipaths}"
 
 mod = torch.utils.cpp_extension.load_inline(
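
The one-command versus two-stage distinction that motivates this change can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `codecache.py` or `cpp_extension`: the function names, the `g++` compiler name, and the file names are all hypothetical, and the commands are only constructed as lists, never executed.

```python
from typing import List

FAST_MATH = "-ffast-math"


def one_command_build(source: str, output: str, flags: List[str]) -> List[List[str]]:
    """Compile and link in one invocation, so every flag is seen by both steps."""
    return [["g++", source, *flags, "-shared", "-o", output]]


def two_stage_build(
    source: str, obj: str, output: str, cflags: List[str], ldflags: List[str]
) -> List[List[str]]:
    """Compile and link separately; cflags never reach the link command."""
    compile_cmd = ["g++", "-c", source, *cflags, "-o", obj]
    link_cmd = ["g++", obj, *ldflags, "-shared", "-o", output]
    return [compile_cmd, link_cmd]


def reaches_link_step(commands: List[List[str]]) -> bool:
    """The last command of either build is the one that performs linking."""
    return FAST_MATH in commands[-1]


if __name__ == "__main__":
    cflags = ["-O3", FAST_MATH]
    # One command: the flag is present when linking.
    print(reaches_link_step(one_command_build("k.cpp", "k.so", cflags)))
    # Two stages, flag only in cflags: the link command does not see it.
    print(reaches_link_step(two_stage_build("k.cpp", "k.o", "k.so", cflags, [])))
    # Two stages with the flag also in ldflags, as this commit does.
    print(reaches_link_step(two_stage_build("k.cpp", "k.o", "k.so", cflags, [FAST_MATH])))
```

In the two-stage case the flag has to be repeated in the link flags, which is what appending `-ffast-math` to `extra_ldflags` achieves.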
