We wanted to get an idea of what might be different between DGEMM from gh-146, which is brute-force "for loops," and the faster-performing version in OpenBLAS.
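For concreteness, that baseline is essentially the textbook triple loop. Here is a minimal Python sketch of that style of implementation (my approximation, not the exact gh-146 code), checked against NumPy's matmul, which typically dispatches to an OpenBLAS DGEMM:

```python
import numpy as np

def dgemm_naive(A, B):
    """Textbook O(m*n*k) triple loop: no blocking, no vectorization."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i, p] * B[p, j]
            C[i, j] = acc
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((128, 128))
B = rng.standard_normal((128, 128))
# A @ B goes through NumPy's BLAS (often OpenBLAS)--the fast path we compare against.
assert np.allclose(dgemm_naive(A, B), A @ B)
```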
From a quick scan of the OpenBLAS code base at the time of writing, there are a few relevant things to note:
- OpenBLAS uses hand-crafted assembly for many architectures and algorithms; for example, `kernel/x86_64/dgemm_kernel_4x8_haswell.S` has 5000 lines of assembly just for DGEMM--so that's clearly optimization at a different level of "tuning" than where we are at the moment; I'm not sure how easy it will be for me to read the assembly and check for specific optimizations like pipelining and so on (a rough sketch of the cache-blocking idea those kernels build on is included after this list)
- they also appear to ship GPU kernels, e.g., `cuda_dgemm_kernel`, so it may make sense to compare with them on the GPU using a build of OpenBLAS compiled for that scenario (though this is less convenient for the current benchmarks, because SciPy is not GPU-swappable off the shelf; CuPy's cuBLAS-backed matmul or some other Python interface could perhaps serve for comparison--see the sketch after this list)
- `git grep -E -i "strassen"` returns no results in OpenBLAS, so the use of an algorithm with fundamentally different asymptotic behavior seems unlikely; see also related discussion agreeing with this: https://stackoverflow.com/a/11421344/2942522
- in short, it seems like Strassen carries substantial constant-factor and caching overheads, and maybe even numerical stability issues, that prevent it from being the primary choice despite the asymptotic advantage (this may also be why I don't think it is even mentioned in the IEEE paper we were looking at); the minimal Strassen sketch after this list shows where the extra overhead comes from
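To make the first bullet concrete: the hand-written kernels are, at their core, aggressively tuned versions of cache/register blocking. A rough Python sketch of just the blocking structure (the tile size here is arbitrary, not a tuned value; the real kernels do this with SIMD registers, prefetching, and pipelining in assembly):

```python
import numpy as np

def dgemm_blocked(A, B, bs=64):
    """Tile C into bs x bs blocks so each tile's operands stay cache-resident
    across the inner update. bs=64 is illustrative, not a tuned value."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for i0 in range(0, m, bs):
        for p0 in range(0, k, bs):
            for j0 in range(0, n, bs):
                # C_tile += A_tile @ B_tile; NumPy slicing handles ragged edges.
                C[i0:i0+bs, j0:j0+bs] += A[i0:i0+bs, p0:p0+bs] @ B[p0:p0+bs, j0:j0+bs]
    return C
```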
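For the GPU comparison idea, a hypothetical CuPy timing sketch--CuPy's matmul on float64 arrays dispatches to cuBLAS DGEMM. This assumes a CUDA-capable GPU and a CuPy install matching the local CUDA toolkit, and I have not run it against the current benchmarks:

```python
import numpy as np
import cupy as cp

rng = np.random.default_rng(0)
A_gpu = cp.asarray(rng.standard_normal((4096, 4096)))
B_gpu = cp.asarray(rng.standard_normal((4096, 4096)))

start, stop = cp.cuda.Event(), cp.cuda.Event()
start.record()
C_gpu = A_gpu @ B_gpu  # dispatches to cuBLAS DGEMM for float64
stop.record()
stop.synchronize()
# Kernel time only; host<->device transfer (cp.asarray above) is excluded.
print(cp.cuda.get_elapsed_time(start, stop), "ms")
```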
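And to illustrate the Strassen point: the recursion trades one of the eight block multiplications for many block additions, and each addition below materializes a quarter-size temporary--that extra memory traffic (plus Strassen's weaker error bounds) is the overhead in question. A minimal sketch, assuming square power-of-two sizes purely for simplicity:

```python
import numpy as np

def strassen(A, B, leaf=128):
    """One Strassen recursion per level; falls back to BLAS matmul at small
    sizes. Assumes square matrices with power-of-two dimensions (demo only)."""
    n = A.shape[0]
    if n <= leaf:
        return A @ B  # BLAS handles the base case
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # 7 multiplications instead of 8, but every +/- allocates a temporary:
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```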
Not sure how helpful all of this is, but my initial impression is that low-level, architecture-specific assembly optimizations drive most of the improvement, rather than fancier, asymptotically superior algorithms that are far more complex.