Description
Overhead-bound benchmarks like Fibonacci and Depth-First Search are significantly slower on Windows than Linux and Mac.
Config: i9-9980XE 18 cores, 36 threads, with 4.1GHz all core Turbo
On Fibonacci in particular, the default eager futures take 14s under Windows versus 370ms under Linux, a slowdown of more than 30x.
Lazy futures allocated via alloca take 800ms under Windows versus 180ms under Linux.
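This points to a memory allocator issue: the gap narrows from more than 30x with eager futures to roughly 4.4x when futures are allocated with alloca.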
Memory-bound benchmarks (transpose) and CPU-bound benchmarks (Black-Scholes) seem to behave somewhat similarly to Linux.
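To check the allocator hypothesis outside of Weave, a single-threaded micro-benchmark along these lines could be run on both OSes. This is only a sketch: `Slot`, `fib_heap`, `fib_stack`, `time_ms` and `STACK_ALLOC` are hypothetical names, the code uses the system `malloc`/`free` rather than Weave's memory pool, and it does no scheduling at all, so it only isolates the cost of one heap allocation per call versus one stack allocation per call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#ifdef _WIN32
#include <malloc.h>
#define STACK_ALLOC(n) _alloca(n)
#else
#include <alloca.h>
#define STACK_ALLOC(n) alloca(n)
#endif

/* Hypothetical stand-in for a future: just a result slot. */
typedef struct { uint64_t value; } Slot;

/* Naive Fibonacci with one malloc/free pair per non-leaf call.
   The volatile qualifier is meant to discourage the compiler from
   eliding the allocation. */
uint64_t fib_heap(unsigned n) {
    if (n < 2) return n;
    volatile Slot *s = malloc(sizeof *s);
    if (s == NULL) abort();
    s->value = fib_heap(n - 1);
    uint64_t r = s->value + fib_heap(n - 2);
    free((void *)s);
    return r;
}

/* Same recursion, but the slot lives on the stack via alloca. */
uint64_t fib_stack(unsigned n) {
    if (n < 2) return n;
    volatile Slot *s = STACK_ALLOC(sizeof *s);
    s->value = fib_stack(n - 1);
    return s->value + fib_stack(n - 2);
}

/* Runs f(n), stores the result in *out and returns the time in ms
   as reported by clock() (CPU time on POSIX, wall time on Windows). */
double time_ms(uint64_t (*f)(unsigned), unsigned n, uint64_t *out) {
    assert(f != NULL && out != NULL);
    clock_t t0 = clock();
    *out = f(n);
    return 1000.0 * (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling `time_ms(fib_heap, 35, &r)` and `time_ms(fib_stack, 35, &r)` on each OS gives a heap/stack ratio per platform. If the ratio is much larger on Windows, that would support the allocator explanation. If it is similar, the slowdown more likely comes from elsewhere in the runtime.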
Similar issues:
- https://www.reddit.com/r/cpp/comments/blp3sf/performance_difference_on_osx_and_windows_10/
- https://www.perlmonks.org/?node_id=810276
Low priority, as we probably can't do anything more than what we have now in our memory subsystem. It's doubtful that even using Mimalloc on Windows (just for Weave) would help, as our memory pool is based on the same techniques. Lastly, Fibonacci is an extreme case with a computation load of 1 cycle per task, while Weave targets being efficient at 2000 cycles.
TODO: benchmark Cilk and TBB to make sure we are not missing something.