A few notes on DGEMM vs. OpenBLAS #159

@tylerjereddy

We wanted to get an idea of what differs between the DGEMM from gh-146, which uses brute-force "for loops," and the faster version in OpenBLAS.
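
For reference, a brute-force DGEMM computes C = alpha*A*B + beta*C with three nested loops. The following is a minimal pure-Python sketch of that idea, not the code from gh-146; the function name `dgemm_naive` and the list-of-rows layout are assumptions made for illustration.

```python
def dgemm_naive(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c using three nested loops.

    a is m x k, b is k x n, c is m x n, each given as a list of rows.
    Hypothetical helper for illustration only.
    """
    m, k, n = len(a), len(b), len(b[0])
    # Start from the scaled copy of c so the input is not modified.
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            # Dot product of row i of a with column j of b.
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] += alpha * acc
    return out
```

This performs m*n*k multiply-adds with no attention to memory access order, which is the baseline the notes below compare against.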

From a quick scan of the OpenBLAS code base at the time of writing, there are a few relevant things to note:

  • OpenBLAS uses hand-crafted assembly for many architectures and algorithms. For example, kernel/x86_64/dgemm_kernel_4x8_haswell.S has 5000 lines of assembly for DGEMM alone, so it is clearly tuned at a different level than our code at the moment. I'm not sure how easy it will be for me to read the assembly and check for specific optimizations such as pipelining.
  • they also seem to be able to leverage, e.g., cuda_dgemm_kernel, so it may make sense to compare against a GPU-specific build of OpenBLAS. That is less convenient for the current benchmarks, though, because SciPy is not GPU-swappable off the shelf; CuPy with cuBLAS, or some other Python interface, might work for the comparison.
  • git grep -E -i "strassen" returns no results in OpenBLAS, so the use of an algorithm with fundamentally different asymptotic behavior seems less likely; a related discussion agrees with this: https://stackoverflow.com/a/11421344/2942522
    • in short, Strassen seems to have large constant factors, cache-behavior issues, and possibly numerical stability issues that keep it from being the primary choice despite its asymptotic advantage (this may also be why I don't think it is even mentioned in the IEEE paper we were looking at)
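
The point about Strassen's coefficients can be made concrete by counting scalar operations. The sketch below assumes square matrices whose size n is a power of two, recursion all the way down to 1x1 blocks, and the classic Strassen scheme with 7 sub-products and 18 half-size matrix additions per level. The function names are hypothetical, and the counts ignore memory traffic entirely.

```python
def naive_ops(n):
    """Scalar (multiplications, additions) for the triple-loop product."""
    return n**3, n**2 * (n - 1)


def strassen_ops(n):
    """Scalar (multiplications, additions) for classic Strassen.

    Assumes n is a power of two and recursion down to 1x1 blocks.
    """
    if n == 1:
        return 1, 0
    half = n // 2
    mults, adds = strassen_ops(half)
    # 7 recursive products, plus 18 additions of half x half matrices.
    return 7 * mults, 7 * adds + 18 * half * half


if __name__ == "__main__":
    for n in (2, 4, 64, 1024):
        print(n, naive_ops(n), strassen_ops(n))
```

Under these assumptions Strassen saves multiplications from the first level on, but pays for it with many more additions at small sizes, which is one reason practical implementations that do use it switch to a conventional kernel below some block size.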

Not sure how helpful all of this is, but my initial impression is that low-level optimizations in assembly for specific architectures drive a lot of the improvements, rather than fancier/asymptotically-superior algorithms that are far more complex.
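
As one illustration of what a low-level restructuring can look like without changing the O(n^3) algorithm, the sketch below tiles the three loops so that each small block of A, B, and C is reused while it is likely still in cache. This is a generic cache-blocking pattern and an assumption on my part, not a transcription of any OpenBLAS kernel; the name `matmul_tiled` and the default tile size are hypothetical. In pure Python it only shows the loop structure and will not run faster.

```python
def matmul_tiled(a, b, tile=2):
    """Return a @ b using loop tiling (cache blocking).

    a is m x k and b is k x n, each given as a list of rows.
    Performs the same arithmetic as the triple loop, in a different order.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Outer loops walk over tiles; inner loops work inside one tile.
    for i0 in range(0, m, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, m)):
                    row_c = c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = a[i][p]
                        row_b = b[p]
                        # Innermost loop runs along contiguous rows of b and c.
                        for j in range(j0, min(j0 + tile, n)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

A tuned kernel would typically go further, for example by packing tiles into contiguous buffers and using vector instructions, which is the kind of work the hand-written assembly is doing.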
