[Feature Request]: Optimize PagedAttention operation on aarch64 HW #26422

Description

@dmitry-gorokhov

Request Description

The PagedAttention operation is already implemented within the CPU plugin in C++ and optimized for x64 using AVX2/AVX-512 intrinsics.
The request is to optimize the PA operation for aarch64 using the NEON/SVE extensions.

Please refer to the SDPA optimization using NEON as a reference.
How to build OV on ARM: https://github.com/openvinotoolkit/openvino/blob/master/docs/dev/build.md
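
As a rough illustration of the kind of inner loop such an optimization targets, the sketch below shows a NEON float32 dot product of the sort a q·k kernel might use on aarch64. The function name and signature are hypothetical and are not part of the plugin's code.

```cpp
#include <arm_neon.h>
#include <cstddef>

// Illustrative only: float32 dot product over a head_size-long vector,
// i.e. the q·k inner loop that a NEON-optimized PagedAttention kernel
// would use in place of the scalar reference implementation.
inline float dot_product_neon(const float* a, const float* b, size_t n) {
    float32x4_t acc = vdupq_n_f32(0.0f);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // Multiply-accumulate four lanes at a time.
        acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    }
    float sum = vaddvq_f32(acc);  // horizontal add of the four lanes (aarch64)
    for (; i < n; ++i)            // scalar tail for the remaining elements
        sum += a[i] * b[i];
    return sum;
}
```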

Feature Use Case

The PagedAttention operation implements the attention algorithm required for workloads such as continuous batching and speculative decoding. PagedAttention is used as the basic attention block in the vLLM OpenVINO backend and under the OpenVINO GenAI API (for some use cases). The PA operation can consume significant execution resources (especially for long contexts), so its optimization is crucial for the overall performance of LLM-based workloads.
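
For context, the sketch below illustrates the block-table indirection that distinguishes PagedAttention from regular SDPA: the keys/values of a sequence are scattered across fixed-size physical blocks of the KV cache and must be gathered per token position. The structure and field names are assumptions for illustration only, not the plugin's actual data layout.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the core indirection in PagedAttention:
// logical token positions are mapped through a per-sequence block table
// to physical blocks of the KV cache, so the attention kernel gathers
// keys/values block by block instead of reading one contiguous buffer.
struct PagedKVCache {
    const float* key_cache;   // layout: [num_blocks, block_size, head_size]
    size_t block_size;
    size_t head_size;

    // Returns a pointer to the key vector for the given logical token position.
    const float* key_at(const std::vector<int32_t>& block_table,
                        size_t token_pos) const {
        const size_t block_id        = block_table[token_pos / block_size];
        const size_t offset_in_block = token_pos % block_size;
        return key_cache + (block_id * block_size + offset_in_block) * head_size;
    }
};
```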

Issue submission checklist

  • The feature request or improvement must be related to OpenVINO
