GEMM performance: OpenBLAS vs Accelerate #132

@xwuupb

The measured GEMM results presented here show a large performance gap between OpenBLAS and Accelerate. I believe the main reason is that OpenBLAS performs GEMM on the general-purpose CPU cores, whereas Accelerate may offload GEMM to the SME (Scalable Matrix Extension) engine. This can be checked with a simple experiment: increase the number of threads in the benchmark code to 2, 3, and 4. For OpenBLAS, I observed a proportional increase in performance. For Accelerate, in contrast, performance stays the same as the thread count increases. I therefore think that OpenBLAS and Accelerate use different execution units on Apple silicon.
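A minimal sketch of the thread-scaling experiment, assuming the benchmark runs one independent GEMM per thread (the original benchmark code is not shown here). It uses NumPy's `matmul` as a stand-in for the BLAS call, so which library is actually measured depends on the BLAS that NumPy was built against, which `np.show_config()` reports. Setting `OPENBLAS_NUM_THREADS` and `VECLIB_MAXIMUM_THREADS` to 1 before NumPy loads is meant to keep each BLAS call single-threaded, so any speed-up comes only from the benchmark's own threads. The helper names, matrix size, and repetition count are illustrative choices. Throughput counts 2·m·n·k floating-point operations per GEMM.

```python
import os

# Keep each BLAS call single-threaded (assumption: the benchmark scales by
# running more independent GEMMs, not by threading inside one GEMM).
# These must be set before NumPy loads its BLAS library.
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
os.environ.setdefault("VECLIB_MAXIMUM_THREADS", "1")

import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations for C = A @ B with A (m x k) and B (k x n)."""
    return 2 * m * n * k


def run_gemm(a: np.ndarray, b: np.ndarray, reps: int) -> None:
    """Repeat one GEMM; NumPy releases the GIL inside the BLAS call."""
    for _ in range(reps):
        np.matmul(a, b)


def aggregate_gflops(threads: int, n: int = 512, reps: int = 10) -> float:
    """Total GFLOP/s when `threads` workers each run `reps` n x n GEMMs."""
    rng = np.random.default_rng(0)
    # Private operands per worker, allocated outside the timed region.
    operands = [
        (rng.standard_normal((n, n)), rng.standard_normal((n, n)))
        for _ in range(threads)
    ]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        start = time.perf_counter()
        futures = [pool.submit(run_gemm, a, b, reps) for a, b in operands]
        for f in futures:
            f.result()  # re-raises any exception from a worker
        elapsed = time.perf_counter() - start
    return threads * reps * gemm_flops(n, n, n) / elapsed / 1e9


if __name__ == "__main__":
    np.show_config()  # shows whether NumPy is linked to OpenBLAS or Accelerate
    for t in (1, 2, 3, 4):
        print(f"{t} thread(s): {aggregate_gflops(t):.1f} GFLOP/s")
```

If GEMM runs on the CPU cores, aggregate GFLOP/s should grow with the thread count; if the work is sent to a shared matrix unit, it would stay roughly flat, which matches the pattern described above for Accelerate.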
