mlps — ML Performance Studies

A collection of performance studies for Python ML workloads, techniques, and tools.

Studies

python-optimization-flags — Impact of python -O / -OO on ML-style code
einsum-perf — einsum vs native vs opt_einsum across JAX and PyTorch, CPU and GPU
cuda-mps — aggregate GPU throughput with and without CUDA MPS for N concurrent processes
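
For context on the first study, here is a minimal sketch of what the -O and -OO flags change at the language level. It is not taken from the study itself. The names PROBE and probe are hypothetical helpers introduced only for this illustration. The sketch runs a small snippet in fresh interpreters and records whether __debug__ is set, whether assert statements execute, and whether docstrings are kept.

```python
import os
import subprocess
import sys

# Hypothetical probe snippet (an assumption for illustration, not part of the repo).
# It prints three fields: __debug__, whether an assert fired, whether a docstring survived.
PROBE = '''
def f():
    """docstring"""
    try:
        assert False
    except AssertionError:
        return True
    return False

print(__debug__, f(), f.__doc__ is not None)
'''


def probe(*flags):
    """Run PROBE in a fresh interpreter with the given flags and parse its output."""
    # Drop PYTHONOPTIMIZE so the baseline run is not affected by the caller's environment.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    out = subprocess.run(
        [sys.executable, *flags, "-c", PROBE],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    ).stdout.split()
    return tuple(field == "True" for field in out)


results = {
    "default": probe(),
    "-O": probe("-O"),
    "-OO": probe("-OO"),
}

for name, (debug, asserts_run, has_doc) in results.items():
    print(f"{name:8s} __debug__={debug} asserts_run={asserts_run} docstring_kept={has_doc}")
```

The flags only remove asserts, `if __debug__:` blocks and (for -OO) docstrings, so any speedup depends on how much of a workload's time is spent in that kind of code.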