A collection of performance studies for Python ML workloads, techniques, and tools.
- python-optimization-flags: impact of `python -O`/`-OO` on ML-style code
- einsum-perf: `einsum` vs native vs `opt_einsum` across JAX and PyTorch, CPU and GPU
- cuda-mps: aggregate GPU throughput with and without CUDA MPS for N concurrent processes
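
For orientation on the first study, here is a minimal sketch of what the `-O` and `-OO` flags change at the language level. It is not taken from the study itself: the `PROBE` snippet and the `probe` helper are hypothetical names used only for illustration. It launches child interpreters with each flag and reports the value of `__debug__`, whether an `assert` still fires, and whether a docstring is still present.

```python
import subprocess
import sys

# Hypothetical probe script, run in a child interpreter so each flag takes effect.
PROBE = '''
def f():
    """docstring"""
    try:
        assert False          # stripped at compile time under -O and -OO
    except AssertionError:
        return True
    return False

# Prints: __debug__, whether the assert fired, whether the docstring survived
print(__debug__, f(), f.__doc__ is not None)
'''

def probe(flag=None):
    """Run PROBE under the given interpreter flag and return its three fields."""
    cmd = [sys.executable]
    if flag:
        cmd.append(flag)
    cmd += ["-c", PROBE]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.split()

if __name__ == "__main__":
    for flag in (None, "-O", "-OO"):
        print(flag or "(none)", probe(flag))
```

Under `-O` the interpreter sets `__debug__` to `False` and drops `assert` statements; `-OO` additionally discards docstrings. Any effect on ML-style code would come from code paths that depend on those features, which is presumably what the study measures.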