Description
Important benchmarks
- Framework overhead via fibonacci
- Unbalanced Tree Search
- GEMM / Matrix Multiply vs OpenBLAS and MKL
- Binary size overhead when runtime is not compiled in
- Space overhead at runtime vs serial code
- Returning memory to the OS on long-running processes
- PARSEC benchmark suite: https://parsec.cs.princeton.edu/
- NAS Parallel Benchmarks from the NASA Advanced Supercomputing Division: https://www.nas.nasa.gov/publications/npb.html
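For the fibonacci item above, one possible way to turn timings into a per-task overhead figure is sketched below. It is a rough illustration, not the project's benchmark: the function names (`fib`, `task_count`, `per_task_overhead`) are hypothetical, and it assumes a runtime that spawns one task per recursive call, so the extra time over the serial version is attributed entirely to scheduling.

```python
import time


def fib(n: int) -> int:
    """Naive recursive Fibonacci: each call does almost no useful work."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def task_count(n: int) -> int:
    """Number of calls fib(n) makes, i.e. the number of tasks created
    if every call were spawned as a task (assumption of this sketch)."""
    prev, cur = 1, 1  # calls for fib(0) and fib(1)
    for _ in range(2, n + 1):
        prev, cur = cur, 1 + prev + cur
    return cur


def time_serial(n: int) -> tuple[int, float]:
    """Run the serial baseline and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fib(n)
    return result, time.perf_counter() - start


def per_task_overhead(serial_s: float, runtime_s: float, n: int) -> float:
    """Extra seconds per spawned task, given the serial time and the time
    measured with the parallel runtime restricted to a single thread."""
    return (runtime_s - serial_s) / task_count(n)
```

The serial time comes from `time_serial(n)`; the runtime figure would be measured separately with the framework under test pinned to one worker, then both are passed to `per_task_overhead`.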
Instrumentation, tutorials, examples
- topology: hyperthreading siblings, NUMA
- measuring performance, core usage, latencies, cache misses, view assembly: perf
- Intel VTune
- Apple Instruments
- bloaty for binary size
- perf c2c for measuring cache contention / false sharing
- helgrind for locking
Requires changing the internals:
Stretch goals
- Other common benchmarks (nqueens, nbodies, LU, heat, qsort, bouncing producer-consumer, ...)
- Porting michi (a 550-line Go bot with parallel Monte-Carlo Tree Search, written in Python: https://github.com/pasky/michi) to Nim and benchmarking it against the C and Go implementations.