Windows TLS Emulation is extremely slow #61

Open
@mratsim

Overhead-bound benchmarks like Fibonacci and Depth-First Search are significantly slower on Windows than Linux and Mac.

Config: i9-9980XE, 18 cores, 36 threads, 4.1 GHz all-core Turbo

On Fibonacci in particular, the default eager futures take 14 s under Windows versus 370 ms under Linux, a whopping slowdown of roughly 38x.
Lazy futures allocated via alloca take 800 ms under Windows versus 180 ms under Linux.

This points to a memory allocator issue.
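To put numbers on that, here is a quick sketch that works out the Windows/Linux ratios from the timings quoted above. The helper name `slowdown` is just for illustration; the only inputs are the four measurements from this issue.

```python
# Timings quoted in this issue, in seconds.
timings = {
    "eager futures": {"windows": 14.0, "linux": 0.370},
    "lazy futures (alloca)": {"windows": 0.800, "linux": 0.180},
}


def slowdown(windows_s: float, linux_s: float) -> float:
    """Return how many times slower the Windows run is than the Linux run."""
    return windows_s / linux_s


# Ratio per futures flavour, computed from the table above.
ratios = {name: slowdown(t["windows"], t["linux"]) for name, t in timings.items()}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.1f}x slower on Windows")
```

The gap narrows considerably once the future lives on the stack instead of going through the allocator, which is what makes the allocation path the prime suspect.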

Memory-bound benchmarks (transpose) and CPU-bound benchmarks (Black-Scholes) seem to perform roughly on par with Linux.

Similar issues:

Low priority, as we probably can't do anything more than what we have now in our memory subsystem. It's doubtful that even using Mimalloc on Windows (just for Weave) would help, as our memory pool is based on the same techniques. Lastly, Fibonacci is an extreme case with a computation load of about 1 cycle per task, while Weave targets being efficient at 2000 cycles.
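As a back-of-the-envelope check on how little work each Fibonacci task carries, the sketch below counts the tasks a naive recursive fib(n) creates and spreads a measured wall-clock time across them. The input size is not stated in this issue, so `ASSUMED_N = 40` is a placeholder. `fib_task_count` and `cost_per_task` are illustrative names, and the core-cycle figure assumes all 18 cores stay busy for the whole run, which is optimistic.

```python
def fib_task_count(n: int) -> int:
    """Count the calls made by naive recursive fib(n), treating fib(0) and fib(1) as leaves.

    Uses calls(n) = 1 + calls(n-1) + calls(n-2), evaluated iteratively.
    """
    if n < 2:
        return 1
    prev, curr = 1, 1  # calls(0), calls(1)
    for _ in range(2, n + 1):
        prev, curr = curr, 1 + curr + prev
    return curr


def cost_per_task(elapsed_s: float, tasks: int, clock_hz: float, cores: int = 1):
    """Return (wall-clock ns per task, core-cycles per task).

    Core-cycles assume every one of `cores` is busy for the whole run.
    """
    wall_ns = elapsed_s * 1e9 / tasks
    core_cycles = elapsed_s * clock_hz * cores / tasks
    return wall_ns, core_cycles


ASSUMED_N = 40      # assumption: the benchmark input is not given in this issue
CLOCK_HZ = 4.1e9    # all-core Turbo from the config line
CORES = 18          # physical cores from the config line

if __name__ == "__main__":
    tasks = fib_task_count(ASSUMED_N)
    for label, elapsed_s in [("Windows, eager", 14.0), ("Linux, eager", 0.370)]:
        wall_ns, core_cycles = cost_per_task(elapsed_s, tasks, CLOCK_HZ, CORES)
        print(f"{label}: {wall_ns:.2f} ns/task wall-clock, "
              f"about {core_cycles:.0f} core-cycles/task")
```

Since each task does a single addition, nearly all of the per-task cost computed this way is scheduler and allocation overhead rather than useful work.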

TODO: benchmark Cilk and TBB to make sure we are not missing something.
