Add a benchmark/example for numexpr usage under free-threading conditions #508

andfoy · 2025-03-27T17:05:32Z

Fixes #503


andfoy · 2025-03-27T17:07:07Z

numexpr/necompiler.py

@@ -1009,5 +1005,5 @@ def re_evaluate(local_dict: Optional[Dict] = None,
    argnames = _numexpr_last.l['argnames']
    args = getArguments(argnames, local_dict, global_dict, _frame_depth=_frame_depth)
    kwargs = _numexpr_last.l['kwargs']
-    with evaluate_lock:
-        return compiled_ex(*args, **kwargs)
+    # with evaluate_lock:


@FrancescAlted, this lock appears to be a performance bottleneck when multiple threads run in parallel. Is it necessary?
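To illustrate why a module-level lock around the compiled call can serialize threaded callers, here is a minimal stdlib-only sketch. It does not use numexpr: `run_chunks` and `kernel` are hypothetical names, and `time.sleep` stands in for `compiled_ex(*args, **kwargs)` on the assumption that the real call could otherwise run concurrently (for example on a free-threaded build).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_chunks(n_chunks, n_threads, lock=None, work_seconds=0.02):
    """Run n_chunks simulated kernels; return (peak concurrency, elapsed seconds)."""
    active = 0
    peak = 0
    counter_guard = threading.Lock()  # protects the bookkeeping counters only

    def kernel():
        nonlocal active, peak
        with counter_guard:
            active += 1
            peak = max(peak, active)
        time.sleep(work_seconds)  # stand-in for compiled_ex(*args, **kwargs)
        with counter_guard:
            active -= 1

    def task(_):
        if lock is None:
            kernel()  # analogous to the diff with the lock commented out
        else:
            with lock:  # analogous to `with evaluate_lock:`
                kernel()

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(task, range(n_chunks)))
    return peak, time.perf_counter() - start


if __name__ == "__main__":
    peak_locked, t_locked = run_chunks(8, 4, lock=threading.Lock())
    peak_free, t_free = run_chunks(8, 4)
    print(f"with lock:    peak concurrency {peak_locked}, {t_locked:.3f} s")
    print(f"without lock: peak concurrency {peak_free}, {t_free:.3f} s")
```

With the shared lock, at most one simulated kernel runs at a time regardless of the thread count, so the pool adds overhead without adding parallelism. Whether the real lock can be dropped safely depends on what state it protects, which this sketch does not model.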

With the lock:

Benchmarking Expression 1:
NumPy time (threaded over 32 chunks with 16 threads): 3.276298 seconds
numexpr time (threaded with re_evaluate over 32 chunks with 16 threads): 9.975059 seconds
numexpr speedup: 0.33x
----------------------------------------
Benchmarking Expression 2:
NumPy time (threaded over 32 chunks with 16 threads): 18.981946 seconds
numexpr time (threaded with re_evaluate over 32 chunks with 16 threads): 50.327974 seconds
numexpr speedup: 0.38x
----------------------------------------
Benchmarking Expression 3:
NumPy time (threaded over 32 chunks with 16 threads): 20.414158 seconds
numexpr time (threaded with re_evaluate over 32 chunks with 16 threads): 70.900648 seconds
numexpr speedup: 0.29x
----------------------------------------
Benchmarking Expression 4:
NumPy time (threaded over 32 chunks with 16 threads): 38.012808 seconds
numexpr time (threaded with re_evaluate over 32 chunks with 16 threads): 160.476216 seconds
numexpr speedup: 0.24x
----------------------------------------

Without locking:

Benchmarking Expression 1:
NumPy time (threaded over 32 chunks with 16 threads): 3.415349 seconds
numexpr time (threaded with re_evaluate over 32 chunks with 16 threads): 2.618876 seconds
numexpr speedup: 1.30x
----------------------------------------
Benchmarking Expression 2:
NumPy time (threaded over 32 chunks with 16 threads): 19.005238 seconds
numexpr time (threaded with re_evaluate over 32 chunks with 16 threads): 12.611407 seconds
numexpr speedup: 1.51x
----------------------------------------
Benchmarking Expression 3:
NumPy time (threaded over 32 chunks with 16 threads): 20.555149 seconds
numexpr time (threaded with re_evaluate over 32 chunks with 16 threads): 17.690749 seconds
numexpr speedup: 1.16x
----------------------------------------
Benchmarking Expression 4:
NumPy time (threaded over 32 chunks with 16 threads): 38.338372 seconds
numexpr time (threaded with re_evaluate over 32 chunks with 16 threads): 35.074684 seconds
numexpr speedup: 1.09x
----------------------------------------
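For reference, the reported speedups can be reproduced from the raw timings. The sketch below assumes the speedup is NumPy time divided by numexpr time, which matches the printed values; the helper name `speedup` and the list names are illustrative. It also shows how much faster the numexpr runs become once the lock is removed (roughly 3.8x to 4.6x across the four expressions).

```python
# Times in seconds copied from the two benchmark runs: (NumPy, numexpr).
with_lock = [
    (3.276298, 9.975059),
    (18.981946, 50.327974),
    (20.414158, 70.900648),
    (38.012808, 160.476216),
]
without_lock = [
    (3.415349, 2.618876),
    (19.005238, 12.611407),
    (20.555149, 17.690749),
    (38.338372, 35.074684),
]


def speedup(numpy_seconds, numexpr_seconds):
    """Assumed definition: values above 1 mean numexpr is faster than NumPy."""
    return numpy_seconds / numexpr_seconds


for i, (locked, free) in enumerate(zip(with_lock, without_lock), start=1):
    # Ratio of numexpr wall time with the lock to numexpr wall time without it.
    lock_cost = locked[1] / free[1]
    print(
        f"Expression {i}: {speedup(*locked):.2f}x with lock, "
        f"{speedup(*free):.2f}x without, "
        f"numexpr {lock_cost:.1f}x faster without the lock"
    )
```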

Add a benchmark/example for numexpr usage under free-threading conditions (b8634be)
