
Commit 7cbff54

doc: more updates for Intel GPUs in RELEASE_NOTES.md
1 parent 9140d64 commit 7cbff54

File tree

1 file changed: +5 -4 lines changed


RELEASE_NOTES.md (+5 -4)
```diff
@@ -14,22 +14,23 @@
 * Intel Arc Graphics for Intel Core Ultra (Series 2, formerly Lunar Lake).
 * Intel Arc B-series discrete graphics (formerly Battlemage).
 * Improved `int8` matmul performance with zero-points support for source and weight tensors.
+* Improved matmul and reorder performance for 4-bit floating-point data types `f4_e2m1` and `f4_e3m0`. Compute primitives provide support through internal conversion into `f16`, as current Intel GPUs lack native support.
 * Improved performance of the following subgraphs with Graph API:
   * Scaled Dot Product Attention (SDPA) with `int4` and `int8` KV cache.
   * SDPA with bottom-right implicit causal mask.
-  * SDPA with head size between 257 and 512.
+  * SDPA with head sizes 512 and 576.
   * Grouped Query Attention (GQA) with 5D input tensors.
 
 ## AArch64-based Processors
 * Enabled BF16 forward-mode inner product via ACL and improved performance for BERT and AlexNet in torch compile mode.
 * Preferential use of jit_sve conv where faster.
 
 # Functionality
-## Common
-* Introduced select algorithm support in [binary primitive](https://uxlfoundation.github.io/oneDNN/v3.8/dev_guide_binary.html). The functionality is implemented on CPUs and Intel GPUs.
-
 ## Intel Graphics Products
 * Introduced support for the [GenIndex](https://oneapi-src.github.io/oneDNN/v3.8/dev_guide_op_genindex.html) operation in Graph API.
+* Introduced select algorithm support in [binary primitive](https://uxlfoundation.github.io/oneDNN/v3.8/dev_guide_binary.html). The functionality is optimized for Intel GPUs.
+* Introduced optimized support for 4-bit floating-point data types `f4_e2m1` and `f4_e3m0` in convolution on Intel(R) Data Center GPU Max Series or newer Intel GPUs.
+* Extended support for 4-bit floating-point data types in matmul and reorder.
 
 ## Intel Architecture Processors
 * Introduced support for `f32` convolution with `fp16` compressed weights.
```
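For context on the 4-bit floating-point types the notes reference: `f4_e2m1` packs a sign bit, 2 exponent bits, and 1 mantissa bit into a nibble. The decode below is a minimal plain-Python sketch assuming the OCP MX-style `e2m1` encoding with exponent bias 1; the function name is illustrative, and oneDNN's actual internal f4-to-`f16` conversion is not public API.

```python
def f4_e2m1_to_float(code: int) -> float:
    """Decode a 4-bit f4_e2m1 value (sign:1, exp:2, mantissa:1).

    Assumes the OCP MX-style e2m1 encoding with exponent bias 1;
    illustrative sketch only, not the oneDNN implementation.
    """
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:
        # Subnormal: no implicit leading 1, scale 2^(1 - bias) = 1.
        value = man * 0.5
    else:
        value = (1.0 + man * 0.5) * 2.0 ** (exp - 1)
    return sign * value

# The 8 non-negative representable values: 0, 0.5, 1, 1.5, 2, 3, 4, 6.
print([f4_e2m1_to_float(c) for c in range(8)])
```

With only 16 codes per value, converting to `f16` on the fly (as the notes describe for current Intel GPUs) is lossless, since every `e2m1` value is exactly representable in `f16`.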

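The zero-point support for `int8` matmul mentioned above follows standard asymmetric-quantization semantics: a quantized tensor represents `scale * (q - zero_point)`. The pure-Python reference below sketches the math for per-tensor scales and zero-points; the function and argument names are hypothetical, not the oneDNN API.

```python
def int8_matmul_ref(src, wei, src_scale, src_zp, wei_scale, wei_zp):
    """Reference f32 result of an int8 x int8 matmul with per-tensor
    scales and zero-points: dst = (s_a*(A - zp_a)) @ (s_w*(W - zp_w)).

    src: MxK rows of int8 values, wei: KxN rows of int8 values.
    Illustrative sketch only -- not the oneDNN API.
    """
    M, K, N = len(src), len(wei), len(wei[0])
    dst = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = 0  # integer accumulation, as a real kernel would do
            for k in range(K):
                acc += (src[m][k] - src_zp) * (wei[k][n] - wei_zp)
            dst[m][n] = src_scale * wei_scale * acc
    return dst

A = [[10, -3], [7, 0]]   # quantized activations, zero-point 2
W = [[1, 4], [-2, 5]]    # quantized weights, zero-point 0
out = int8_matmul_ref(A, W, 0.1, 2, 0.05, 0)
```

Because the zero-points are subtracted inside the integer accumulation and the scales are applied once per output element, the whole inner loop stays in cheap integer arithmetic, which is what makes zero-point support in an optimized `int8` kernel worthwhile.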