@danwhittaker-arm
Contributor

Description

This PR enables int8 brgconv for SVE128. Many int8 convolutions that were previously routed to gemm_s8s8s32:ref or gemm_s8u8s32:ref are now handled by brgconv:sve_128 or brgconv_1x1:sve_128. This results in a significant speedup, as shown in the Performance improvements section below.

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?

All NIGHTLY tests pass when running the command below:
ctest -j$(nproc) -E $(../.github/automation/aarch64/skipped-tests.sh)

  • Have you formatted the code using clang-format?

Performance improvements

  • Have you submitted performance data that demonstrates performance improvements?
OMP_NUM_THREADS=16 ./tests/benchdnn/benchdnn --conv --mode=p --batch=harness_conv_int8

On a Graviton4, without this PR:

============================================================
= Implementation statistics (--summary=no-impl to disable) =
============================================================
|    gemm_s8s8s32:ref : 377 (38%)                          |
|    gemm_s8u8s32:ref : 374 (38%)                          |
|     brgconv:sve_128 : 127 (13%)                          |
| brgconv_1x1:sve_128 : 117 (12%)                          |
============================================================
tests:995 passed:995 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total perf: min(ms):15950.3 avg(ms):16204.6
total: 3054.10s; create_pd: 0.04s (0%); create_prim: 0.10s (0%); fill: 0.81s (0%); execute: 16.30s (1%);

On a Graviton4, with this PR:

============================================================
= Implementation statistics (--summary=no-impl to disable) =
============================================================
|    gemm_s8u8s32:ref : 369 (37%)                          |
|     brgconv:sve_128 : 269 (27%)                          |
|    gemm_s8s8s32:ref : 188 (19%)                          |
| brgconv_1x1:sve_128 : 169 (17%)                          |
============================================================
tests:995 passed:995 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total perf: min(ms):8340.52 avg(ms):8423.01
total: 3023.44s; create_pd: 0.07s (0%); create_prim: 0.12s (0%); fill: 0.77s (0%); execute: 8.42s (0%);
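Both runs cover the same 995 test cases, so the change in dispatch can be read directly off the two implementation-statistics tables. The arithmetic below uses only the counts reported above (Δ is the count with this PR minus the count without it):

```math
\begin{aligned}
\Delta_{\text{gemm\_s8s8s32:ref}} &= 188 - 377 = -189 \\
\Delta_{\text{gemm\_s8u8s32:ref}} &= 369 - 374 = -5 \\
\Delta_{\text{brgconv:sve\_128}} &= 269 - 127 = +142 \\
\Delta_{\text{brgconv\_1x1:sve\_128}} &= 169 - 117 = +52 \\
189 + 5 &= 142 + 52 = 194
\end{aligned}
```

That is, 194 of the 995 cases (about 19.5%) moved from the reference GEMM implementations to brgconv, almost all of them from gemm_s8s8s32:ref.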

On average, the benchmark runs roughly 1.9x faster with this PR (total avg time drops from 16204.6 ms to 8423.01 ms).
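The speedup follows from the `total perf` lines of the two runs, where S is the ratio of the time without this PR to the time with it:

```math
\begin{aligned}
S_{\text{avg}} &= \frac{16204.6\ \text{ms}}{8423.01\ \text{ms}} \approx 1.92 \\
S_{\text{min}} &= \frac{15950.3\ \text{ms}}{8340.52\ \text{ms}} \approx 1.91 \\
1 - \frac{8423.01}{16204.6} &\approx 0.48
\end{aligned}
```

Equivalently, the total average execution time is reduced by about 48%.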

@github-actions github-actions bot added platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 component:common labels Jan 9, 2026