runtime-benchmarks

Benchmarks to compare the performance of async runtimes / executors.

An interactive view of the full results dataset is available at: https://fleetcode.com/runtime-benchmarks/

Summary of results for a single configuration:

Runtime	libfork	TooManyCooks	tbb	cppcoro	taskflow	coros	concurrencpp
Mean Ratio to Best (lower is better)	1.00x	1.21x	2.78x	3.69x	4.91x	5.26x	170.29x
skynet(8)	39243 us	48183 us	142476 us	277238 us	310068 us	150896 us	11879067 us
nqueens(14)	85499 us	97645 us	158381 us	216374 us	323129 us	1024948 us	8247812 us
fib(39)	66483 us	94078 us	271683 us	243936 us	428538 us	266954 us	18636205 us
matmul(2048)	41195 us	42898 us	64231 us	61755 us	62715 us	49846 us	68094 us
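The "Mean Ratio to Best" row appears consistent with taking, for each benchmark, every runtime's time divided by the fastest time in that benchmark, and then averaging those ratios arithmetically across the four benchmarks. The sketch below is not code from this repository; the names `RUNTIMES`, `TIMES_US`, and `mean_ratio_to_best` are hypothetical, and the timings are copied from the table above.

```python
# Timings in microseconds, copied from the summary table above.
RUNTIMES = ["libfork", "TooManyCooks", "tbb", "cppcoro",
            "taskflow", "coros", "concurrencpp"]
TIMES_US = {
    "skynet(8)":    [39243, 48183, 142476, 277238, 310068, 150896, 11879067],
    "nqueens(14)":  [85499, 97645, 158381, 216374, 323129, 1024948, 8247812],
    "fib(39)":      [66483, 94078, 271683, 243936, 428538, 266954, 18636205],
    "matmul(2048)": [41195, 42898, 64231, 61755, 62715, 49846, 68094],
}


def mean_ratio_to_best(times_by_benchmark):
    """Arithmetic mean, over benchmarks, of (time / fastest time in that benchmark)."""
    n_runtimes = len(next(iter(times_by_benchmark.values())))
    totals = [0.0] * n_runtimes
    for times in times_by_benchmark.values():
        best = min(times)  # fastest runtime for this benchmark
        for i, t in enumerate(times):
            totals[i] += t / best
    return [total / len(times_by_benchmark) for total in totals]


for name, ratio in zip(RUNTIMES, mean_ratio_to_best(TIMES_US)):
    print(f"{name}: {ratio:.2f}x")
```

Because the ratios are averaged arithmetically, one very slow benchmark can dominate a runtime's score, so the per-benchmark rows are worth reading alongside the summary row.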

Machine configuration used in the summary table:

Processor: EPYC 7742 64-core processor
Worker Thread Count: 64 (no SMT)
OS: Debian 13 Server
Compiler: Clang 19.1.7 Release (-O3 -march=native)
CPU boost enabled / schedutil governor
Linked against libtcmalloc_minimal.so.4

What's covered?

Currently, only C++ frameworks are included, exercised by several recursive fork-join benchmarks:

recursive fibonacci (forks x2)
skynet (based on the original skynet benchmark, but increased to 100M tasks; forks x10)
nqueens (forks up to x14)
matmul (forks x4)
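The fork factors listed above can be read off a serial sketch of each recursion. This is illustrative Python, not the repository's C++ code: the function names and signatures are hypothetical, and where a real runtime would fork each child as a task and then join the results, the sketch simply recurses. Assuming the 8 in skynet(8) is the recursion depth, ten children per level gives 10^8 = 100M leaves.

```python
def fib(n):
    """Recursive Fibonacci: each non-leaf call has 2 children (forks x2)."""
    if n < 2:
        return n
    # A fork-join runtime would spawn these two calls as child tasks and join them.
    return fib(n - 1) + fib(n - 2)


def skynet(base, depth, fanout=10):
    """Skynet: each non-leaf task has `fanout` children; leaves return their index."""
    if depth == 0:
        return base
    stride = fanout ** (depth - 1)  # number of leaves under each child
    return sum(skynet(base + i * stride, depth - 1, fanout) for i in range(fanout))


def nqueens(n, row=0, cols=0, diag1=0, diag2=0):
    """Count N-queens solutions; each row tries up to n columns (forks up to xN)."""
    if row == n:
        return 1
    count = 0
    for c in range(n):
        d1 = 1 << (row + c)          # anti-diagonal occupied by (row, c)
        d2 = 1 << (row - c + n - 1)  # main diagonal occupied by (row, c)
        if cols & (1 << c) or diag1 & d1 or diag2 & d2:
            continue  # square is attacked, so no child task for this column
        count += nqueens(n, row + 1, cols | (1 << c), diag1 | d1, diag2 | d2)
    return count


print(fib(10))        # 55
print(skynet(0, 3))   # 499500, the sum of leaf indices 0..999
print(nqueens(8))     # 92
```

The small arguments keep the serial sketch fast; the benchmark sizes in the table (fib(39), skynet(8), nqueens(14)) are far larger and are meant to be run in parallel.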

Benchmark problem sizes were chosen to balance two goals: keeping the total runtime of a full sweep tolerable (especially on weaker hardware with slower runtimes), and being large enough to show meaningful differences between the faster runtimes.

How to build and run the benchmarks yourself

Install Dependencies:

The build+bench script uses python3
CMake + Clang 18 or newer
libfork and TooManyCooks depend on the hwloc library.
The TBB benchmarks depend on a system-installed TBB. See the TBB installation guide for the newest version, or you may be able to find an older version as 'libtbb-dev' in your system package manager.
A high-performance allocator (tcmalloc, jemalloc, or mimalloc) is also recommended. The build script will dynamically link to any of these if they are available.

apt-get install cmake libhwloc-dev intel-oneapi-tbb-devel libtcmalloc-minimal4

Get Quick Results (uses threads = #CPUs):

python3 ./build_and_bench_all.py

Results will appear in RESULTS.md and RESULTS.csv files.
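To inspect the CSV output programmatically, a minimal sketch like the following may help. The column layout of RESULTS.csv is not documented here, so the hypothetical helper `load_results` makes no assumptions about column names and simply returns the header and the data rows.

```python
import csv
from pathlib import Path


def load_results(path="RESULTS.csv"):
    """Return (header, rows) from a results CSV without assuming any column names."""
    with open(path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]  # skip blank lines
    if not rows:
        return [], []
    return rows[0], rows[1:]


# Only runs if the benchmark script has already produced the file.
if Path("RESULTS.csv").exists():
    header, rows = load_results("RESULTS.csv")
    print("columns:", header)
    print("data rows:", len(rows))
```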

Get Full Results (sweeps threads from 1 to #CPUs):

python3 ./build_and_bench_all.py full

Results will also appear in a RESULTS.json file, which can be parsed by the interactive benchmarks site.

Future Plans

Frameworks to come:

(C#) .NET thread pool
(Rust) tokio
(Golang) goroutines
Facebook Folly
PhotonLibOS https://github.com/alibaba/PhotonLibOS

Benchmarks to come:

Lots of good inspiration here

