Windows TLS Emulation is extremely slow #61

Overhead-bound benchmarks like Fibonacci and Depth-First Search are significantly slower on Windows than on Linux and macOS.

Config: i9-9980XE, 18 cores, 36 threads, 4.1GHz all-core Turbo

On Fibonacci in particular, the default eager futures take **14s** under Windows versus 370ms under Linux, a slowdown of roughly 38x.
Lazy futures allocated via `alloca` take 800ms under Windows versus 180ms under Linux, a slowdown of about 4.4x.

The gap narrows considerably once futures are allocated via `alloca`, which points to a memory allocator issue.

Memory-bound benchmarks (transpose) and CPU-bound benchmarks (Black-Scholes) seem to perform roughly on par with Linux.

Similar issues:
- https://www.reddit.com/r/cpp/comments/blp3sf/performance_difference_on_osx_and_windows_10/
- https://www.perlmonks.org/?node_id=810276

Low priority, as we probably can't do anything more than what we have now in our memory subsystem. It's doubtful that even using Mimalloc on Windows (just for Weave) would help, as our memory pool is based on the same techniques. Lastly, Fibonacci is an extreme case with a computation load of 1 cycle, while Weave targets being efficient at 2000 cycles.
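To illustrate why Fibonacci is an extreme, overhead-bound case, here is a minimal serial sketch in C. It is not Weave's benchmark code; `fib` and `fib_calls` are names chosen for illustration. Each invocation is a candidate task in a task-parallel version, and the only useful work it performs is a single addition.

```c
#include <stdint.h>

/* Number of fib() invocations so far. In a task-parallel runtime each
   invocation is a candidate task (illustrative assumption, not Weave code). */
static uint64_t fib_calls = 0;

/* Naive recursive Fibonacci: one addition of useful work per call. */
static uint64_t fib(unsigned n) {
    fib_calls++;
    if (n < 2) {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}
```

Calling `fib(20)` returns 6765 after 21891 invocations, so any per-task cost (scheduling, future allocation) is multiplied tens of thousands of times while the arithmetic itself stays negligible.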

TODO: benchmark Cilk and TBB to make sure we are not missing something.
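The title attributes the slowdown to emulated thread-local storage. One way to check that in isolation is a micro-benchmark that compares incrementing a thread-local counter with incrementing a plain global. The sketch below assumes a C11 compiler that supports `_Thread_local`; the function and variable names are hypothetical, and whether a given Windows toolchain emulates TLS through a runtime call is something to verify, not something this code proves.

```c
#include <stdint.h>
#include <time.h>

/* Thread-local counter. Assumption to verify per toolchain: with emulated
   TLS, each access may go through a runtime call instead of a direct load. */
static _Thread_local volatile uint64_t tls_counter = 0;

/* Plain global counter used as the baseline. */
static volatile uint64_t global_counter = 0;

/* Increment the thread-local counter `iters` times.
   Returns the elapsed CPU time in seconds. */
static double time_tls_increments(uint64_t iters) {
    clock_t start = clock();
    for (uint64_t i = 0; i < iters; i++) {
        tls_counter++;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Same loop on the plain global counter. */
static double time_global_increments(uint64_t iters) {
    clock_t start = clock();
    for (uint64_t i = 0; i < iters; i++) {
        global_counter++;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Running both functions with the same large iteration count on Windows and on Linux, and comparing the ratio of the two timings on each platform, would indicate whether thread-local access itself is unusually expensive there. The counters are `volatile` so the compiler cannot fold the loops away.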
