* Improved `int8` matmul performance with zero-point support for source and weight tensors (see the sketch after this list).
* Improved matmul and reorder performance for 4-bit floating-point data types `f4_e2m1` and `f4_e3m0`. Compute primitives support these types through internal conversion to `f16`, since current Intel GPUs lack native support.
* Improved performance of the following subgraphs with Graph API:
* Scaled Dot Product Attention (SDPA) with `int4` and `int8` KV cache.
* SDPA with bottom-right implicit causal mask.
* SDPA with head sizes 512 and 576.
* Grouped Query Attention (GQA) with 5D input tensors.
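
Zero points for `int8` matmul are supplied through the standard runtime quantization attributes. Below is a minimal sketch with per-tensor zero points on source and weights; the shapes, engine index, and data-type choices are illustrative assumptions, not part of the release notes.

```cpp
// Sketch: int8 matmul with runtime zero points on source and weights.
// Shapes and the GPU engine index are illustrative assumptions.
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    using namespace dnnl;
    engine eng(engine::kind::gpu, 0);
    stream strm(eng);

    const memory::dim M = 64, K = 256, N = 128;
    memory::desc src_md({M, K}, memory::data_type::s8, memory::format_tag::ab);
    memory::desc wei_md({K, N}, memory::data_type::s8, memory::format_tag::ab);
    memory::desc dst_md({M, N}, memory::data_type::s32, memory::format_tag::ab);

    // Request per-tensor (mask = 0) zero points, supplied at execution time.
    primitive_attr attr;
    attr.set_zero_points_mask(DNNL_ARG_SRC, /*mask=*/0);
    attr.set_zero_points_mask(DNNL_ARG_WEIGHTS, /*mask=*/0);

    matmul::primitive_desc pd(eng, src_md, wei_md, dst_md, attr);
    matmul mm(pd);

    memory src(src_md, eng), wei(wei_md, eng), dst(dst_md, eng);
    memory::desc zp_md({1}, memory::data_type::s32, memory::format_tag::x);
    memory src_zp(zp_md, eng), wei_zp(zp_md, eng);

    // Zero-point values would normally be written into src_zp / wei_zp here.
    mm.execute(strm, {{DNNL_ARG_SRC, src},
                      {DNNL_ARG_WEIGHTS, wei},
                      {DNNL_ARG_DST, dst},
                      {DNNL_ARG_ATTR_ZERO_POINTS | DNNL_ARG_SRC, src_zp},
                      {DNNL_ARG_ATTR_ZERO_POINTS | DNNL_ARG_WEIGHTS, wei_zp}});
    strm.wait();
    return 0;
}
```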
## AArch64-based Processors
* Enabled BF16 forward-mode inner product via ACL and improved performance for BERT and AlexNet in `torch.compile` mode.
* Enabled preferential use of the `jit_sve` convolution implementation where it is faster.
# Functionality
## Intel Graphics Products
* Introduced support for the [GenIndex](https://oneapi-src.github.io/oneDNN/v3.8/dev_guide_op_genindex.html) operation in Graph API (a sketch follows this list).
* Introduced select algorithm support in the [binary primitive](https://uxlfoundation.github.io/oneDNN/v3.8/dev_guide_binary.html). The functionality is optimized for Intel GPUs (a sketch follows this list).
* Introduced optimized support for 4-bit floating-point data types `f4_e2m1` and `f4_e3m0` in convolution on Intel(R) Data Center GPU Max Series or newer Intel GPUs.
* Extended support for 4-bit floating-point data types in matmul and reorder.
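
As a rough illustration of the GenIndex operation mentioned above, the sketch below builds a single-op Graph API graph that produces element indices along one axis. The op kind, the `axis` attribute, and the `s32` output type follow the linked dev guide; treat the exact names and shapes as assumptions to verify against your oneDNN version.

```cpp
// Sketch: single-op Graph API graph with GenIndex (indices along `axis`).
// Op kind, attribute name, and output data type are assumptions based on
// the v3.8 dev guide; verify against your oneDNN version.
#include <oneapi/dnnl/dnnl_graph.hpp>
#include <vector>

int main() {
    using namespace dnnl::graph;
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    std::vector<int64_t> dims = {2, 16};
    logical_tensor src(0, dt::f32, dims, lt::strided);
    logical_tensor dst(1, dt::s32, dims, lt::strided);

    op genindex(2, op::kind::GenIndex, {src}, {dst}, "genindex");
    genindex.set_attr<int64_t>(op::attr::axis, 1);  // indices along the last axis

    graph g(dnnl::engine::kind::gpu);
    g.add_op(genindex);
    g.finalize();

    // Each partition is then compiled and executed through the usual
    // partition::compile() / compiled_partition::execute() flow (omitted).
    auto partitions = g.get_partitions();
    return partitions.empty() ? 1 : 0;
}
```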
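The select algorithm adds a third, condition input to the binary primitive, computing `dst = cond ? src0 : src1` element-wise. A minimal sketch follows; the `algorithm::binary_select` enum value, the three-source `primitive_desc` constructor, and the `s8` condition type are taken from the linked dev guide and should be treated as assumptions to verify against your oneDNN version.

```cpp
// Sketch: element-wise select via the binary primitive, dst = cond ? src0 : src1.
// The binary_select algorithm, the three-source primitive_desc constructor, and
// the s8 condition tensor are assumptions based on the v3.8 binary dev guide.
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    using namespace dnnl;
    engine eng(engine::kind::gpu, 0);
    stream strm(eng);

    memory::desc src0_md({8, 128}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc src1_md({8, 128}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc cond_md({8, 128}, memory::data_type::s8, memory::format_tag::ab);
    memory::desc dst_md({8, 128}, memory::data_type::f32, memory::format_tag::ab);

    binary::primitive_desc pd(eng, algorithm::binary_select,
                              src0_md, src1_md, cond_md, dst_md);
    binary sel(pd);

    memory src0(src0_md, eng), src1(src1_md, eng), cond(cond_md, eng), dst(dst_md, eng);

    sel.execute(strm, {{DNNL_ARG_SRC_0, src0},
                       {DNNL_ARG_SRC_1, src1},
                       {DNNL_ARG_SRC_2, cond},
                       {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```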
## Intel Architecture Processors
* Introduced support for `f32` convolution with `fp16` compressed weights.