[Aarch64] Paged Attention FP16 precision enablement. #29219
Conversation
@dmitry-gorokhov, kindly review the changes.
@ashwins990 @abhijain1204fujitsu The failed CI job is not related to the PR changes.
Force-pushed 4f28ab0 to 03b4c51
Hi, @dmitry-gorokhov.
Review comment on src/plugins/intel_cpu/src/nodes/kernels/scaled_attn/executor_pa.cpp (outdated, resolved)
Are there any plans to extend the specific tests?
Force-pushed 03b4c51 to 4885881
Please find the line coverage [with the lcov tool] for f16 inference. Please let me know if any further tests are required. The generated output is good (manually tested). @maxnick Is this fine? Can we extend the tests [integrated within OpenVINO] in the next PR, if required?
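A rough sketch of how such an lcov report can be produced on a coverage-instrumented build (the build directory, file filter, and output paths below are assumptions for illustration, not details from this PR):

```python
# Hypothetical helper wrapping the standard lcov/genhtml flow after running
# the f16 Paged Attention tests on a build compiled with --coverage.
# Assumptions: build tree at "build/", lcov and genhtml available on PATH.
import subprocess

def collect_coverage(build_dir: str = "build", out: str = "coverage.info") -> None:
    # Capture counters produced by the instrumented test run.
    subprocess.run(["lcov", "--capture", "--directory", build_dir,
                    "--output-file", out], check=True)
    # Keep only the files of interest (e.g. the scaled_attn kernels).
    subprocess.run(["lcov", "--extract", out, "*scaled_attn*",
                    "--output-file", out], check=True)
    # Render an HTML report for review.
    subprocess.run(["genhtml", out, "--output-directory", "coverage_html"],
                   check=True)

if __name__ == "__main__":
    collect_coverage()
```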
build_jenkins
Review comments on src/plugins/intel_cpu/src/nodes/kernels/scaled_attn/executor_pa.cpp (3 threads; outdated, resolved)
Thank you for the code coverage report. It looks like the newly added code is well covered with tests.

@allnes, Yes, it works as expected. Thanks!
Hi @allnes, |
Review comments on ...common/transformations/src/transformations/common_optimizations/convert_pagedattn_inputs.cpp (2 threads; outdated, resolved)
Force-pushed c2d65fe to 78d52cb
build_jenkins
Force-pushed 78d52cb to 0a1bc22
@maxnick I have resolved the merge conflicts. Thanks!
build_jenkins
This development is related to Feature Request #26422. This PR enables f16 inference precision for the Paged Attention operator, with the key-value cache precision as u8.

Updated: Using Kleidi instead of ACL. Attaching the server benchmarking result on Graviton 3E (64 cores), which compares f16 performance with the f32 (reference) precision.

[Benchmark result image: f16 vs. f32 on Graviton 3E, 64 cores]

Co-authored-by: Nesterov Alexander <[email protected]>
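A minimal sketch of how these precisions could be requested through the OpenVINO Python API (the hint properties `inference_precision` and `kv_cache_precision` and the model path are assumptions about the surrounding API, not code from this PR):

```python
# Minimal sketch, assuming an OpenVINO release whose Python API exposes
# openvino.properties.hint.inference_precision / kv_cache_precision;
# "model.xml" is a hypothetical path to an LLM using Paged Attention.
import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()
compiled = core.compile_model(
    "model.xml",  # hypothetical model path
    "CPU",        # Aarch64 CPU plugin
    {
        hints.inference_precision: ov.Type.f16,  # run Paged Attention in f16
        hints.kv_cache_precision: ov.Type.u8,    # store the KV cache as u8
    },
)
```

The equivalent string keys `INFERENCE_PRECISION_HINT` and `KV_CACHE_PRECISION` should behave the same where the property objects are not available in a given release.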