Conversation

@jondea
Contributor

@jondea jondea commented Jan 9, 2026

Remove branch target alignment from the brgemm functions. This reduces code size by ~10% and improves performance on Graviton 3/4 by ~0.5% (an effect small enough that many repeated measurements were needed to be confident of the result). Branch alignment can affect performance, but it does not need to be the default; where it is used, it should be machine-specific, kernel-specific, and evidence-based.

As another point of reference, arm_gemm in ComputeLibrary uses nops in only a single kernel, `a64_sgemm_asimd_8x12_a53`, and even there only sparingly. That kernel was written for a core which we do not specifically target.
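
For context on where the code size goes: aligning a branch target means padding the instruction stream with nops until the target lands on the chosen boundary. A minimal sketch of that arithmetic follows. The 16-byte boundary, the offsets, and the helper names are hypothetical and chosen only for illustration; they are not taken from the brgemm kernels or from this change.

```cpp
#include <cstddef>

// AArch64 instructions, including nop, have a fixed size of 4 bytes.
constexpr std::size_t kInsnBytes = 4;

// Bytes of padding needed so that `offset` becomes a multiple of `alignment`.
// `alignment` must be a non-zero power of two.
constexpr std::size_t padding_bytes(std::size_t offset, std::size_t alignment) {
    return (alignment - (offset & (alignment - 1))) & (alignment - 1);
}

// Number of nops needed to align a branch target that would otherwise be
// emitted at `offset`. Assumes `offset` is already a multiple of 4.
constexpr std::size_t nops_for_alignment(std::size_t offset, std::size_t alignment) {
    return padding_bytes(offset, alignment) / kInsnBytes;
}

// Hypothetical offsets with an assumed 16-byte alignment.
static_assert(padding_bytes(0x20, 16) == 0, "already aligned, no padding");
static_assert(padding_bytes(0x24, 16) == 12, "12 bytes of padding");
static_assert(nops_for_alignment(0x24, 16) == 3, "12 bytes is three nops");
static_assert(nops_for_alignment(0x2c, 16) == 1, "4 bytes is one nop");
```

Under these assumptions each aligned target costs between zero and three nops, so a kernel with many branch targets pays for the padding in code size whether or not the alignment helps on a given core.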

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

Performance improvements

  • Have you submitted performance data that demonstrates performance improvements?

@jondea jondea requested a review from a team as a code owner January 9, 2026 11:14
@github-actions github-actions bot added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label Jan 9, 2026
Labels

platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64

1 participant