Mysterious 2x perf regression on GEMM #40

Open
@mratsim

With no code or hardware change at all, after a month there is a 2x perf regression. OpenBLAS is also a bit slower (with no package update):

A matrix shape: (M: 1920, N: 1920)
B matrix shape: (M: 1920, N: 1920)
Output shape: (M: 1920, N: 1920)
Required number of operations: 14155.776 millions
Required bytes:                   29.491 MB
Arithmetic intensity:            480.000 FLOP/byte
Theoretical peak single-core:    224.000 GFLOP/s
Theoretical peak multi:         4032.000 GFLOP/s
Make sure to not bench Apple Accelerate or the default Linux BLAS.

OpenBLAS benchmark
Collected 10 samples in 0.101 seconds
Average time: 9.440 ms
Stddev  time: 0.141 ms
Min     time: 9.315 ms
Max     time: 9.733 ms
Perf:         1499.508 GFLOP/s

Laser production implementation
Collected 10 samples in 0.146 seconds
Average time: 14.000 ms
Stddev  time: 25.706 ms
Min     time: 5.839 ms
Max     time: 87.161 ms
Perf:         1011.102 GFLOP/s

PyTorch Glow: libjit matmul implementation (with AVX+FMA)
Collected 10 samples in 2.041 seconds
Average time: 204.123 ms
Stddev  time: 0.763 ms
Min     time: 203.362 ms
Max     time: 205.862 ms
Perf:         69.349 GFLOP/s

MKL-DNN reference GEMM benchmark
Collected 10 samples in 0.351 seconds
Average time: 34.305 ms
Stddev  time: 5.588 ms
Min     time: 30.013 ms
Max     time: 49.684 ms
Perf:         412.645 GFLOP/s

MKL-DNN JIT AVX benchmark
Collected 10 samples in 0.130 seconds
Average time: 11.230 ms
Stddev  time: 8.353 ms
Min     time: 7.725 ms
Max     time: 34.426 ms
Perf:         1260.573 GFLOP/s

MKL-DNN JIT AVX512 benchmark
Collected 10 samples in 0.083 seconds
Average time: 7.716 ms
Stddev  time: 7.932 ms
Min     time: 4.601 ms
Max     time: 30.078 ms
Perf:         1834.643 GFLOP/s
Mean Relative Error compared to vendor BLAS: 3.045843413929106e-06
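
For reference, the derived figures in the header can be reproduced from the matrix shapes alone. The sketch below is not the benchmark's own code. It assumes float32 inputs and that only A and B are counted in "Required bytes", which is what makes the numbers match the log. It also computes GFLOP/s from the minimum time, since the reported "Perf" line appears to be based on the average time and is therefore sensitive to slow runs.

```python
# Reproduce the derived quantities printed in the benchmark header.
M = N = K = 1920
BYTES_PER_ELEMENT = 4  # assumption: float32 inputs (matches the 29.491 MB in the log)

flops = 2 * M * N * K                              # one multiply and one add per term
bytes_read = (M * K + K * N) * BYTES_PER_ELEMENT   # assumption: only A and B are counted
intensity = flops / bytes_read                     # FLOP per byte


def gflops(time_ms):
    """Throughput in GFLOP/s for one GEMM that took time_ms milliseconds."""
    return flops / (time_ms * 1e-3) / 1e9


# (average ms, minimum ms) copied from the log above
timings_ms = {
    "OpenBLAS": (9.440, 9.315),
    "Laser": (14.000, 5.839),
    "MKL-DNN JIT AVX512": (7.716, 4.601),
}

print(f"operations: {flops / 1e6:.3f} millions")
print(f"bytes:      {bytes_read / 1e6:.3f} MB")
print(f"intensity:  {intensity:.3f} FLOP/byte")
for name, (avg_ms, min_ms) in timings_ms.items():
    print(f"{name}: {gflops(avg_ms):.1f} GFLOP/s (avg), {gflops(min_ms):.1f} GFLOP/s (best run)")
```

On the best run Laser is still faster than OpenBLAS, so the regression shows up in the average rather than in the kernel's peak speed.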

I suspect an issue with the GNU OpenMP runtime (libgomp). (MKL-DNN is linked to Intel OpenMP.)
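
A quick sanity check on the Laser numbers, using only the statistics already in the log: they are consistent with nine fast runs plus a single slow outlier, rather than a uniformly slower kernel. The sketch below assumes the benchmark reports the sample standard deviation (dividing by n - 1); that is a guess that happens to fit the reported value.

```python
import math

# Laser statistics copied from the log above (milliseconds)
n = 10
avg_ms, min_ms, max_ms = 14.000, 5.839, 87.161
reported_std_ms = 25.706

# If the max is the only slow run, the other nine must average this:
rest_mean_ms = (n * avg_ms - max_ms) / (n - 1)

# Hypothetical sample set: one outlier and nine identical fast runs.
samples = [max_ms] + [rest_mean_ms] * (n - 1)
mean_ms = sum(samples) / n
# Assumption: sample standard deviation (n - 1 in the denominator).
implied_std_ms = math.sqrt(sum((x - mean_ms) ** 2 for x in samples) / (n - 1))

print(f"mean of the other nine runs: {rest_mean_ms:.3f} ms (min was {min_ms:.3f} ms)")
print(f"implied stddev: {implied_std_ms:.3f} ms (reported {reported_std_ms:.3f} ms)")
```

The implied standard deviation matches the reported one closely, which points at an occasional stall (thread startup, scheduling, or similar) rather than a slower steady state. It does not by itself say which runtime is responsible.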
