
Let clang on Linux and clang-cl on Windows "see" more optimizing macros #131033

Open
@chris-eibl


Feature or enhancement

Proposal:

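Here are a few optimizing macros, some of which clang on Linux does not "see", because clang reports itself as GCC 4.2 by default (__GNUC__ == 4, __GNUC_MINOR__ == 2), so checks that require GCC 4.3 or newer fail.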

None of them are seen by clang-cl on Windows, because:

  • clang-cl does not define __GNUC__ (most probably because too much existing code would then assume "ah, I am on Linux")
  • but clang-cl does define __clang__

IMHO, "syncing" them between GCC/clang on Linux and clang-cl on Windows is preferable.
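To check which of these macros a given compiler actually defines, a small probe can help. This is a minimal sketch: demo_compiler_kind and demo_print_compiler_macros are hypothetical names, and the classification relies only on the predefined macros __clang__, __GNUC__ and _MSC_VER (clang-cl defines __clang__ and _MSC_VER but not __GNUC__).

```c
#include <stdio.h>

/* Hypothetical probe: classify the compiler from its predefined macros.
   clang-cl defines __clang__ and _MSC_VER, but not __GNUC__.
   clang in GNU mode defines __clang__ and, by default, __GNUC__ == 4 and
   __GNUC_MINOR__ == 2, so GCC version checks treat it as GCC 4.2. */
static const char *demo_compiler_kind(void)
{
#if defined(__clang__) && defined(_MSC_VER) && !defined(__GNUC__)
    return "clang-cl";
#elif defined(__clang__) && defined(__GNUC__)
    return "clang (GNU mode)";
#elif defined(__GNUC__)
    return "GCC";
#elif defined(_MSC_VER)
    return "MSVC";
#else
    return "other";
#endif
}

/* Print the macros that the version checks discussed here depend on. */
static void demo_print_compiler_macros(void)
{
    printf("compiler: %s\n", demo_compiler_kind());
#if defined(__GNUC__)
    printf("__GNUC__=%d __GNUC_MINOR__=%d\n", __GNUC__, __GNUC_MINOR__);
#else
    printf("__GNUC__ not defined\n");
#endif
#if defined(__clang__)
    printf("__clang_major__=%d\n", __clang_major__);
#endif
}
```

Compiling the same file with GCC, clang and clang-cl and comparing the output shows directly why a condition like "__GNUC__ >= 5" is false for both clang variants.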

Seen neither on Linux nor on Windows; #130891 would fix this one:

cpython/Include/pyport.h, lines 323 to 325 in 98fa4a4:

```c
#if defined(__GNUC__) \
    && ((__GNUC__ >= 5) || (__GNUC__ == 4) && (__GNUC_MINOR__ >= 3))
#define _Py_HOT_FUNCTION __attribute__((hot))
```
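One way to let both clang (which reports GCC 4.2) and clang-cl (which does not define __GNUC__ at all) pick up the hot attribute is to test __clang__ in addition to the GCC version. This is a sketch under that assumption, not necessarily what #130891 does; DEMO_HOT_FUNCTION and demo_sum are hypothetical names.

```c
/* Sketch: enable the "hot" attribute for GCC >= 4.3 and for any clang,
   including clang-cl, which defines __clang__ but not __GNUC__. */
#if defined(__clang__) \
    || (defined(__GNUC__) \
        && ((__GNUC__ >= 5) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
#  define DEMO_HOT_FUNCTION __attribute__((hot))
#else
#  define DEMO_HOT_FUNCTION /* MSVC and other compilers: no-op */
#endif

/* Hypothetical hot function; the attribute is only an optimization hint
   and does not change behavior. */
DEMO_HOT_FUNCTION int demo_sum(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

Because MSVC defines neither __GNUC__ nor __clang__, it still gets the empty fallback, so plain MSVC builds are unaffected.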

Seen on Linux but not on Windows; #131019 would fix this one:

cpython/Objects/obmalloc.c, lines 1460 to 1462 in 98fa4a4:

```c
#if defined(__GNUC__) && (__GNUC__ > 2) && defined(__OPTIMIZE__)
# define UNLIKELY(value) __builtin_expect((value), 0)
# define LIKELY(value) __builtin_expect((value), 1)
```
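An alternative to version checks is feature detection. The sketch below uses __has_builtin, which clang (including clang-cl) provides and GCC supports from version 10, and falls back to the original __GNUC__ test for older GCC. It keeps the __OPTIMIZE__ condition from obmalloc.c. DEMO_HAS_BUILTIN, DEMO_LIKELY, DEMO_UNLIKELY and demo_count_negative are hypothetical names, and this is not necessarily how #131019 solves it.

```c
/* __has_builtin cannot be used directly in an #if on compilers that lack
   it, so wrap it behind an #ifdef first. */
#ifdef __has_builtin
#  define DEMO_HAS_BUILTIN(x) __has_builtin(x)
#else
#  define DEMO_HAS_BUILTIN(x) 0
#endif

/* Use __builtin_expect when the compiler advertises it (clang, clang-cl,
   GCC >= 10) or is GCC newer than 2.x, and only when optimizing. */
#if (DEMO_HAS_BUILTIN(__builtin_expect) || (defined(__GNUC__) && (__GNUC__ > 2))) \
    && defined(__OPTIMIZE__)
#  define DEMO_UNLIKELY(value) __builtin_expect((value), 0)
#  define DEMO_LIKELY(value) __builtin_expect((value), 1)
#else
#  define DEMO_UNLIKELY(value) (value)
#  define DEMO_LIKELY(value) (value)
#endif

/* Hypothetical usage: negative values are assumed to be rare.
   Pass boolean conditions, since LIKELY expects the value 1. */
static int demo_count_negative(const int *values, int count)
{
    int negatives = 0;
    for (int i = 0; i < count; i++) {
        if (DEMO_UNLIKELY(values[i] < 0)) {
            negatives++;
        }
    }
    return negatives;
}
```

Feature detection asks the compiler what it supports instead of inferring it from a version number, so it also covers compilers that imitate an older GCC version, as clang does.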

Seen on Linux but not on Windows:

```c
#if defined(__GNUC__) \
    && (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 96))
# define XML_ATTR_MALLOC __attribute__((__malloc__))
```

Seen neither on Linux nor on Windows:

```c
#if defined(__GNUC__) \
    && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
# define XML_ATTR_ALLOC_SIZE(x) __attribute__((__alloc_size__(x)))
```

The last two are in vendored code; I temporarily modified it (01183d7) and then reverted the change (1c4a55d).
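For the two vendored Expat macros, feature detection with __has_attribute (available in clang, clang-cl and GCC 5 or newer) would avoid version checks entirely. This is only a sketch: the DEMO_* macros and demo_zalloc are hypothetical, and it is not necessarily how the temporary change in 01183d7 was done.

```c
#include <stdlib.h>
#include <string.h>

/* Wrap __has_attribute so the #if below also works on compilers
   (such as MSVC) that do not provide it. */
#ifdef __has_attribute
#  define DEMO_HAS_ATTRIBUTE(x) __has_attribute(x)
#else
#  define DEMO_HAS_ATTRIBUTE(x) 0
#endif

/* The returned pointer does not alias any other pointer valid at return. */
#if DEMO_HAS_ATTRIBUTE(__malloc__)
#  define DEMO_ATTR_MALLOC __attribute__((__malloc__))
#else
#  define DEMO_ATTR_MALLOC
#endif

/* Parameter number x (1-based) gives the size of the returned block. */
#if DEMO_HAS_ATTRIBUTE(__alloc_size__)
#  define DEMO_ATTR_ALLOC_SIZE(x) __attribute__((__alloc_size__(x)))
#else
#  define DEMO_ATTR_ALLOC_SIZE(x)
#endif

/* Hypothetical allocator: attributes go on the prototype. */
static void *demo_zalloc(size_t size) DEMO_ATTR_MALLOC DEMO_ATTR_ALLOC_SIZE(1);

/* Allocate size bytes and zero them; returns NULL on failure. */
static void *demo_zalloc(size_t size)
{
    void *block = malloc(size);
    if (block != NULL) {
        memset(block, 0, size);
    }
    return block;
}
```

On MSVC both macros expand to nothing, so the annotated prototype stays valid there as well.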

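Enabling all of them for clang-cl on Windows is performance-neutral on the pyperformance benchmark suite: the geometric mean changes by at most 1% for both release and PGO builds with clang 19.1.1 and 20.1.0-rc2.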

Geometric mean of 16a7f4607e.pyHot relative to the 92e5f826ac reference:

| Build | Geometric mean |
|---|---|
| clang 19.1.1, release | 1.01x faster |
| clang 19.1.1, PGO | 1.01x slower |
| clang 20.1.0-rc2, release | 1.01x faster |
| clang 20.1.0-rc2, PGO | 1.00x slower |
Per-benchmark details (columns: benchmark, 92e5f826ac time, 16a7f4607e.pyHot time, ratio):

Benchmark clang.release.19.1.1.92e5f826ac clang.release.19.1.1.16a7f4607e.pyHot
telco 11.9 ms 10.9 ms: 1.10x faster
xml_etree_parse 236 ms 217 ms: 1.09x faster
logging_format 15.8 us 15.0 us: 1.05x faster
async_tree_eager 145 ms 138 ms: 1.05x faster
async_tree_none_tg 375 ms 358 ms: 1.05x faster
unpickle_list 5.83 us 5.57 us: 1.05x faster
xml_etree_iterparse 157 ms 150 ms: 1.05x faster
unpickle 23.0 us 22.0 us: 1.05x faster
async_tree_memoization_tg 460 ms 442 ms: 1.04x faster
xml_etree_generate 142 ms 137 ms: 1.04x faster
nqueens 125 ms 120 ms: 1.04x faster
async_tree_memoization 490 ms 472 ms: 1.04x faster
async_tree_io 876 ms 844 ms: 1.04x faster
logging_simple 14.3 us 13.8 us: 1.04x faster
deepcopy_reduce 3.92 us 3.78 us: 1.04x faster
crypto_pyaes 104 ms 101 ms: 1.04x faster
pprint_pformat 2.18 sec 2.10 sec: 1.04x faster
async_tree_none 391 ms 378 ms: 1.03x faster
pprint_safe_repr 1.06 sec 1.02 sec: 1.03x faster
async_tree_eager_memoization 290 ms 281 ms: 1.03x faster
json_dumps 16.5 ms 16.0 ms: 1.03x faster
fannkuch 580 ms 562 ms: 1.03x faster
scimark_sparse_mat_mult 5.76 ms 5.59 ms: 1.03x faster
async_tree_eager_io 822 ms 798 ms: 1.03x faster
xml_etree_process 96.4 ms 93.7 ms: 1.03x faster
async_tree_eager_tg 307 ms 299 ms: 1.03x faster
scimark_fft 481 ms 470 ms: 1.03x faster
coroutines 31.9 ms 31.2 ms: 1.02x faster
async_tree_io_tg 853 ms 834 ms: 1.02x faster
pathlib 255 ms 250 ms: 1.02x faster
typing_runtime_protocols 224 us 220 us: 1.02x faster
django_template 53.7 ms 52.8 ms: 1.02x faster
sympy_expand 650 ms 640 ms: 1.02x faster
unpickle_pure_python 305 us 300 us: 1.02x faster
async_tree_eager_memoization_tg 411 ms 405 ms: 1.02x faster
async_tree_cpu_io_mixed_tg 752 ms 741 ms: 1.02x faster
chaos 88.2 ms 86.9 ms: 1.02x faster
sqlite_synth 3.57 us 3.52 us: 1.01x faster
tomli_loads 2.70 sec 2.66 sec: 1.01x faster
pickle_pure_python 444 us 438 us: 1.01x faster
sqlglot_normalize 150 ms 148 ms: 1.01x faster
regex_compile 171 ms 169 ms: 1.01x faster
mako 17.3 ms 17.1 ms: 1.01x faster
sqlglot_parse 1.67 ms 1.65 ms: 1.01x faster
sympy_sum 211 ms 208 ms: 1.01x faster
hexiom 8.26 ms 8.19 ms: 1.01x faster
sqlglot_transpile 2.06 ms 2.04 ms: 1.01x faster
sqlglot_optimize 74.0 ms 73.4 ms: 1.01x faster
python_startup 43.2 ms 42.9 ms: 1.01x faster
async_generators 540 ms 536 ms: 1.01x faster
gc_traversal 4.82 ms 4.79 ms: 1.01x faster
comprehensions 23.2 us 23.1 us: 1.01x faster
generators 38.1 ms 37.8 ms: 1.01x faster
richards_super 73.9 ms 73.4 ms: 1.01x faster
deepcopy 376 us 373 us: 1.01x faster
genshi_text 29.8 ms 29.6 ms: 1.01x faster
pickle_dict 32.3 us 32.2 us: 1.00x faster
scimark_sor 168 ms 169 ms: 1.01x slower
go 145 ms 146 ms: 1.01x slower
pyflate 596 ms 602 ms: 1.01x slower
logging_silent 133 ns 135 ns: 1.01x slower
dulwich_log 130 ms 132 ms: 1.01x slower
regex_v8 35.2 ms 35.6 ms: 1.01x slower
spectral_norm 128 ms 130 ms: 1.02x slower
docutils 3.60 sec 3.66 sec: 1.02x slower
sympy_integrate 26.4 ms 26.8 ms: 1.02x slower
scimark_monte_carlo 90.7 ms 92.6 ms: 1.02x slower
float 102 ms 105 ms: 1.02x slower
2to3 429 ms 439 ms: 1.02x slower
nbody 151 ms 155 ms: 1.03x slower
genshi_xml 71.7 ms 74.2 ms: 1.04x slower
Geometric mean (ref) 1.01x faster
Benchmark clang.pgo.19.1.1.92e5f826ac clang.pgo.19.1.1.16a7f4607e.pyHot
2to3 465 ms 380 ms: 1.22x faster
async_generators 506 ms 490 ms: 1.03x faster
coroutines 27.1 ms 26.4 ms: 1.03x faster
pidigits 233 ms 228 ms: 1.02x faster
pickle_dict 27.8 us 27.3 us: 1.02x faster
sympy_sum 187 ms 184 ms: 1.02x faster
typing_runtime_protocols 186 us 183 us: 1.02x faster
raytrace 309 ms 305 ms: 1.02x faster
unpickle 16.6 us 16.4 us: 1.01x faster
genshi_xml 60.4 ms 59.5 ms: 1.01x faster
regex_compile 151 ms 149 ms: 1.01x faster
scimark_sparse_mat_mult 4.82 ms 4.77 ms: 1.01x faster
unpack_sequence 55.7 ns 55.1 ns: 1.01x faster
sqlglot_parse 1.42 ms 1.41 ms: 1.01x faster
sqlglot_transpile 1.75 ms 1.73 ms: 1.01x faster
telco 9.01 ms 8.91 ms: 1.01x faster
logging_format 12.9 us 12.8 us: 1.01x faster
unpickle_list 5.04 us 4.99 us: 1.01x faster
nqueens 95.3 ms 94.4 ms: 1.01x faster
async_tree_eager_io 720 ms 714 ms: 1.01x faster
sympy_expand 556 ms 551 ms: 1.01x faster
scimark_lu 124 ms 123 ms: 1.01x faster
docutils 3.09 sec 3.07 sec: 1.01x faster
chaos 69.1 ms 68.7 ms: 1.01x faster
sqlglot_optimize 63.5 ms 63.1 ms: 1.01x faster
sympy_integrate 22.9 ms 22.7 ms: 1.01x faster
spectral_norm 106 ms 105 ms: 1.00x faster
scimark_fft 352 ms 351 ms: 1.00x faster
deepcopy 298 us 300 us: 1.00x slower
generators 34.0 ms 34.2 ms: 1.01x slower
meteor_contest 119 ms 119 ms: 1.01x slower
logging_silent 106 ns 106 ns: 1.01x slower
tomli_loads 2.21 sec 2.22 sec: 1.01x slower
pickle_pure_python 367 us 369 us: 1.01x slower
regex_effbot 3.21 ms 3.24 ms: 1.01x slower
pyflate 514 ms 518 ms: 1.01x slower
sqlite_synth 3.41 us 3.44 us: 1.01x slower
deltablue 3.66 ms 3.69 ms: 1.01x slower
unpickle_pure_python 247 us 249 us: 1.01x slower
nbody 126 ms 128 ms: 1.01x slower
scimark_sor 140 ms 141 ms: 1.01x slower
mdp 3.13 sec 3.16 sec: 1.01x slower
pprint_safe_repr 891 ms 899 ms: 1.01x slower
go 126 ms 127 ms: 1.01x slower
richards_super 52.0 ms 52.6 ms: 1.01x slower
async_tree_eager 116 ms 117 ms: 1.01x slower
regex_dna 204 ms 207 ms: 1.01x slower
create_gc_cycles 1.49 ms 1.51 ms: 1.01x slower
richards 45.4 ms 46.0 ms: 1.01x slower
deepcopy_memo 33.4 us 34.1 us: 1.02x slower
async_tree_eager_tg 267 ms 273 ms: 1.02x slower
json_loads 31.2 us 31.9 us: 1.02x slower
pprint_pformat 1.79 sec 1.85 sec: 1.03x slower
gc_traversal 5.03 ms 5.28 ms: 1.05x slower
xml_etree_parse 208 ms 220 ms: 1.06x slower
async_tree_io 759 ms 832 ms: 1.10x slower
asyncio_tcp 1.38 sec 1.52 sec: 1.10x slower
xml_etree_process 78.5 ms 87.4 ms: 1.11x slower
xml_etree_generate 114 ms 128 ms: 1.11x slower
async_tree_memoization_tg 392 ms 449 ms: 1.15x slower
async_tree_io_tg 746 ms 855 ms: 1.15x slower
async_tree_memoization 414 ms 477 ms: 1.15x slower
async_tree_none_tg 325 ms 382 ms: 1.17x slower
xml_etree_iterparse 141 ms 172 ms: 1.22x slower
Geometric mean (ref) 1.01x slower
Benchmark clang.release.20.1.0-rc2.92e5f826ac clang.release.20.1.0-rc2.16a7f4607e.pyHot
spectral_norm 139 ms 124 ms: 1.13x faster
pickle_list 5.89 us 5.46 us: 1.08x faster
sqlite_synth 3.71 us 3.51 us: 1.06x faster
pickle_dict 32.3 us 30.7 us: 1.05x faster
unpickle 20.8 us 20.0 us: 1.04x faster
json_loads 43.0 us 41.3 us: 1.04x faster
unpickle_list 5.35 us 5.15 us: 1.04x faster
mako 16.9 ms 16.3 ms: 1.04x faster
pprint_safe_repr 1.01 sec 976 ms: 1.03x faster
crypto_pyaes 102 ms 98.7 ms: 1.03x faster
coverage 111 ms 108 ms: 1.03x faster
coroutines 30.3 ms 29.5 ms: 1.03x faster
telco 10.4 ms 10.2 ms: 1.03x faster
json_dumps 15.6 ms 15.3 ms: 1.02x faster
asyncio_websockets 547 ms 534 ms: 1.02x faster
scimark_sparse_mat_mult 5.94 ms 5.81 ms: 1.02x faster
pprint_pformat 2.07 sec 2.02 sec: 1.02x faster
unpickle_pure_python 300 us 294 us: 1.02x faster
xml_etree_parse 218 ms 214 ms: 1.02x faster
xml_etree_generate 135 ms 133 ms: 1.02x faster
async_generators 510 ms 501 ms: 1.02x faster
typing_runtime_protocols 217 us 213 us: 1.02x faster
mdp 3.72 sec 3.67 sec: 1.02x faster
scimark_fft 437 ms 431 ms: 1.01x faster
bench_thread_pool 1.79 ms 1.77 ms: 1.01x faster
deepcopy_reduce 3.71 us 3.66 us: 1.01x faster
docutils 3.56 sec 3.52 sec: 1.01x faster
async_tree_memoization_tg 433 ms 428 ms: 1.01x faster
sqlglot_transpile 2.02 ms 2.00 ms: 1.01x faster
genshi_xml 69.7 ms 69.0 ms: 1.01x faster
xml_etree_process 92.2 ms 91.3 ms: 1.01x faster
fannkuch 539 ms 535 ms: 1.01x faster
sqlglot_normalize 144 ms 143 ms: 1.01x faster
float 102 ms 101 ms: 1.01x faster
raytrace 361 ms 358 ms: 1.01x faster
sqlglot_parse 1.64 ms 1.63 ms: 1.01x faster
gc_traversal 4.84 ms 4.80 ms: 1.01x faster
nqueens 117 ms 116 ms: 1.01x faster
meteor_contest 123 ms 123 ms: 1.01x faster
sqlglot_optimize 71.4 ms 71.1 ms: 1.00x faster
comprehensions 23.0 us 22.9 us: 1.00x faster
pidigits 240 ms 240 ms: 1.00x faster
unpack_sequence 55.0 ns 55.2 ns: 1.00x slower
chaos 84.3 ms 84.7 ms: 1.00x slower
dulwich_log 126 ms 126 ms: 1.00x slower
regex_compile 165 ms 166 ms: 1.00x slower
hexiom 8.01 ms 8.07 ms: 1.01x slower
async_tree_cpu_io_mixed_tg 708 ms 714 ms: 1.01x slower
async_tree_eager 134 ms 135 ms: 1.01x slower
richards_super 73.2 ms 73.9 ms: 1.01x slower
deltablue 4.31 ms 4.35 ms: 1.01x slower
asyncio_tcp_ssl 3.59 sec 3.64 sec: 1.01x slower
2to3 418 ms 423 ms: 1.01x slower
scimark_sor 159 ms 162 ms: 1.02x slower
python_startup 40.9 ms 41.7 ms: 1.02x slower
scimark_lu 143 ms 146 ms: 1.02x slower
async_tree_eager_cpu_io_mixed 551 ms 565 ms: 1.03x slower
go 146 ms 150 ms: 1.03x slower
generators 38.2 ms 39.7 ms: 1.04x slower
nbody 136 ms 142 ms: 1.05x slower
Geometric mean (ref) 1.01x faster
Benchmark clang.pgo.20.1.0-rc2.92e5f826ac clang.pgo.20.1.0-rc2.16a7f4607e.pyHot
pickle_pure_python 383 us 364 us: 1.05x faster
pprint_safe_repr 863 ms 840 ms: 1.03x faster
regex_effbot 3.20 ms 3.13 ms: 1.02x faster
pickle_list 4.77 us 4.66 us: 1.02x faster
typing_runtime_protocols 178 us 174 us: 1.02x faster
pprint_pformat 1.78 sec 1.74 sec: 1.02x faster
xml_etree_generate 110 ms 108 ms: 1.02x faster
richards 45.1 ms 44.3 ms: 1.02x faster
scimark_sor 138 ms 136 ms: 1.01x faster
gc_traversal 5.21 ms 5.15 ms: 1.01x faster
xml_etree_process 76.3 ms 75.5 ms: 1.01x faster
async_tree_eager 113 ms 111 ms: 1.01x faster
xml_etree_parse 202 ms 201 ms: 1.01x faster
nqueens 92.3 ms 91.4 ms: 1.01x faster
coroutines 24.9 ms 24.7 ms: 1.01x faster
mako 13.4 ms 13.3 ms: 1.01x faster
meteor_contest 118 ms 118 ms: 1.00x faster
unpickle_pure_python 247 us 246 us: 1.00x faster
sqlglot_normalize 120 ms 119 ms: 1.00x faster
deltablue 3.69 ms 3.71 ms: 1.00x slower
sympy_integrate 22.6 ms 22.7 ms: 1.00x slower
deepcopy 289 us 291 us: 1.01x slower
sympy_sum 181 ms 182 ms: 1.01x slower
unpack_sequence 55.1 ns 55.4 ns: 1.01x slower
2to3 370 ms 373 ms: 1.01x slower
asyncio_tcp_ssl 3.52 sec 3.55 sec: 1.01x slower
sqlite_synth 3.20 us 3.22 us: 1.01x slower
async_tree_eager_io 701 ms 707 ms: 1.01x slower
sqlglot_parse 1.38 ms 1.40 ms: 1.01x slower
pidigits 228 ms 230 ms: 1.01x slower
async_tree_io_tg 727 ms 735 ms: 1.01x slower
dulwich_log 115 ms 117 ms: 1.01x slower
python_startup 39.4 ms 39.9 ms: 1.01x slower
chaos 67.0 ms 67.9 ms: 1.01x slower
raytrace 299 ms 303 ms: 1.01x slower
nbody 119 ms 120 ms: 1.01x slower
async_tree_eager_tg 260 ms 264 ms: 1.01x slower
scimark_lu 122 ms 124 ms: 1.02x slower
python_startup_no_site 34.0 ms 34.5 ms: 1.02x slower
regex_dna 204 ms 208 ms: 1.02x slower
crypto_pyaes 81.1 ms 82.7 ms: 1.02x slower
scimark_fft 341 ms 349 ms: 1.02x slower
scimark_sparse_mat_mult 4.53 ms 4.65 ms: 1.03x slower
sympy_str 320 ms 329 ms: 1.03x slower
bench_thread_pool 1.63 ms 1.68 ms: 1.03x slower
deepcopy_reduce 2.96 us 3.06 us: 1.03x slower
tomli_loads 2.20 sec 2.28 sec: 1.03x slower
pathlib 232 ms 241 ms: 1.04x slower
telco 8.45 ms 8.77 ms: 1.04x slower
unpickle 15.6 us 16.2 us: 1.04x slower
pickle 13.5 us 14.3 us: 1.05x slower
async_tree_memoization 405 ms 428 ms: 1.06x slower
async_tree_io 740 ms 784 ms: 1.06x slower
Geometric mean (ref) 1.00x slower

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Labels: build, performance, type-feature
