
@ashwins990
Contributor

@ashwins990 ashwins990 commented Feb 28, 2025

This work is related to Feature Request #26422.

This PR enables f16 inference precision for the Paged Attention operator, with u8 as the key-value cache precision.

Update: the implementation now uses KleidiAI instead of ACL. Attached is the server benchmarking result on Graviton 3E (64 cores), comparing f16 performance with the f32 (reference) precision.

![Kleidi-oss-result](https://github.com/user-attachments/assets/4080ad7c-8896-46ce-85b1-80b94664fc25)
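
For readers unfamiliar with the setup, the following sketch illustrates in general terms what "f16 inference with a u8 key-value cache" means: each cached key/value vector is stored as 8-bit codes plus a scale and an offset, and is dequantized back to f16 before the attention math. This is a generic min/max quantization scheme chosen only for illustration; the actual grouping, scale/zero-point layout and kernels used by the OpenVINO CPU plugin and KleidiAI are not described in this PR and may differ. The helper names (`to_f16`, `quantize_u8`, `dequantize_to_f16`) are hypothetical.

```python
import struct

def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (f16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def quantize_u8(values):
    """Hypothetical asymmetric min/max u8 quantization of one cached key or value vector.

    Returns (codes, scale, offset) so that v ~= code * scale + offset.
    """
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    # Map [lo, hi] onto the integer range [0, 255]; clamp guards against float rounding.
    codes = [min(255, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo

def dequantize_to_f16(codes, scale, offset):
    """Reconstruct approximate values and round them to f16 for the attention math."""
    return [to_f16(c * scale + offset) for c in codes]

key = [-1.0, -0.5, 0.0, 0.25, 1.0]  # hypothetical cached key vector
codes, scale, offset = quantize_u8(key)
restored = dequantize_to_f16(codes, scale, offset)
print(codes)
print(max(abs(a - b) for a, b in zip(key, restored)))
```

A u8 code takes one byte per element versus two for f16 and four for f32, which is why a u8 cache reduces memory traffic; the trade-off is a reconstruction error of roughly half a quantization step (`scale / 2`) plus f16 rounding.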

@ashwins990 ashwins990 requested review from a team as code owners February 28, 2025 17:32
@ashwins990 ashwins990 requested review from itikhono and removed request for a team February 28, 2025 17:32
@github-actions github-actions bot added category: CPU OpenVINO CPU plugin category: transformations OpenVINO Runtime library - Transformations labels Feb 28, 2025
@sys-openvino-ci sys-openvino-ci added the ExternalPR External contributor label Feb 28, 2025
@abhijain1204fujitsu

@dmitry-gorokhov, could you please help review the changes?

@dmitry-gorokhov dmitry-gorokhov self-assigned this Mar 3, 2025
@dmitry-gorokhov

@ashwins990 @abhijain1204fujitsu The failed CI job is not related to the PR changes.
As we discussed, let's proceed with the KleidiAI-based kernel for productization.

@p-wysocki p-wysocki linked an issue Mar 5, 2025 that may be closed by this pull request
@ilya-lavrenov ilya-lavrenov added the platform: arm OpenVINO on ARM / ARM64 label Mar 13, 2025
@ashwins990 ashwins990 requested a review from a team as a code owner March 25, 2025 11:52
@github-actions github-actions bot added the category: build OpenVINO cmake script / infra label Mar 25, 2025
@ashwins990
Contributor Author

Hi, @dmitry-gorokhov.
As discussed, I have pushed the KleidiAI-based implementation. Please take a look. Thanks!

@ilya-lavrenov ilya-lavrenov added this to the 2025.2 milestone Mar 25, 2025
@maxnick
Contributor

maxnick commented Mar 31, 2025

Are there any plans to extend specific tests?

@ashwins990
Contributor Author

Please find attached the line coverage report (generated with lcov) for f16 inference. Let me know if any further tests are required. The generated output is correct (tested manually).
executor_pa.pdf.pdf
kai.pdf
pagedAttn_cpp.pdf
transformation.pdf
transpose.pdf

@maxnick Is this fine? If required, can we extend the tests integrated within OpenVINO in a follow-up PR?

@ashwins990 ashwins990 requested a review from maxnick April 10, 2025 16:31
@mlukasze
Contributor

build_jenkins

@maxnick
Contributor

maxnick commented Apr 15, 2025

Thank you for the code coverage report. It looks like the newly added code is well covered with tests.

@ashwins990
Contributor Author

@ashwins990 Hi, I’ve updated the macOS part in this PR. Could you please recheck the SVE section on your devices? Thanks!

@allnes, yes, it works as expected. Thanks!

@ashwins990
Contributor Author

Hi @allnes,
One of the tests fails here: TensorFlow Layer Tests.
Is it because the computation happens in f16 (where overflow occurs) and the result is compared against f32?
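
To make the overflow hypothesis concrete: the largest finite f16 value is 65504, so an intermediate result (for example, a dot product in the attention scores) that fits comfortably in f32 can become infinity in f16. The minimal sketch below emulates f16 rounding with the standard `struct` "e" format; the operand values are illustrative, and this thread does not confirm that overflow is the actual cause of the failing test.

```python
import math
import struct

F16_MAX = 65504.0  # largest finite IEEE 754 binary16 (f16) value

def to_f16(x: float) -> float:
    """Round x to the nearest f16 value, returning +/-inf on overflow like IEEE 754 hardware."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; IEEE 754 round-to-nearest yields +/-inf instead
        return math.copysign(math.inf, x)

q, k = 300.0, 300.0                   # illustrative operands, each exactly representable in f16
reference = q * k                     # computed in f64 here; also exact in f32
half = to_f16(to_f16(q) * to_f16(k))  # same product rounded to f16
print(reference, half)                # 90000.0 inf
print(to_f16(F16_MAX))                # 65504.0
```

If overflow is the cause, the usual remedies are to keep the overflow-prone accumulation in f32 or to relax or adapt the test's reference comparison for f16; which approach fits here would be a decision for the reviewers.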

@github-actions github-actions bot removed the category: transformations OpenVINO Runtime library - Transformations label Apr 27, 2025
@ashwins990 ashwins990 requested a review from maxnick April 27, 2025 17:35
@maxnick
Contributor

maxnick commented Apr 28, 2025

build_jenkins

@ashwins990 ashwins990 force-pushed the aarch64-pa-f16-ACL branch from 78d52cb to 0a1bc22 Compare May 3, 2025 07:24
@ashwins990
Contributor Author

@maxnick I have resolved the merge conflicts. Thanks!

@maxnick
Contributor

maxnick commented May 5, 2025

build_jenkins

@maxnick maxnick added this pull request to the merge queue May 5, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 5, 2025
@maxnick maxnick added this pull request to the merge queue May 6, 2025
github-merge-queue bot pushed a commit that referenced this pull request May 6, 2025
Co-authored-by: Nesterov Alexander <[email protected]>
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 6, 2025
@maxnick maxnick added this pull request to the merge queue May 6, 2025
Merged via the queue into openvinotoolkit:master with commit 1cc0683 May 6, 2025
190 checks passed
ilya-lavrenov added a commit to openvinotoolkit/openvino.genai that referenced this pull request May 6, 2025

Labels

category: build OpenVINO cmake script / infra category: CPU OpenVINO CPU plugin ExternalPR External contributor platform: arm OpenVINO on ARM / ARM64

Development

Successfully merging this pull request may close these issues.

[Feature Request]: Optimize PagedAttention operation on aarch64 HW

9 participants