Commit db1413f

Update README.md
1 parent d55c23b commit db1413f

1 file changed (+6, -2 lines)

README.md (+6 -2)

@@ -4,7 +4,9 @@ OpenBLAS
 This is a fork of [OpenBLAS](https://github.com/OpenMathLib/OpenBLAS) with improved performance on the FUJITSU A64FX processor.

 The following routines are tuned for A64FX.
+
 * SGEMM
+* DGEMM

 # Prerequisites

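SGEMM and DGEMM are the single- and double-precision BLAS general matrix-multiply routines, which compute C ← αAB + βC (A or B may optionally be transposed; the benchmarks in this README use no transpose). Purely to illustrate what the tuned routines compute, here is a naive, untuned C sketch of the double-precision no-transpose case. The function name `naive_dgemm_nn` is hypothetical, column-major storage with leading dimensions is assumed as in the reference BLAS, and this is not the A64FX kernel from this fork.

```c
#include <stddef.h>

/* Hypothetical naive DGEMM, "No Transpose" case (NOT the A64FX-tuned kernel).
 * Computes C = alpha * A * B + beta * C using column-major storage:
 *   A is m x k with leading dimension lda (lda >= m),
 *   B is k x n with leading dimension ldb (ldb >= k),
 *   C is m x n with leading dimension ldc (ldc >= m). */
void naive_dgemm_nn(size_t m, size_t n, size_t k, double alpha,
                    const double *a, size_t lda,
                    const double *b, size_t ldb,
                    double beta, double *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {          /* column of C */
        for (size_t i = 0; i < m; ++i) {      /* row of C */
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)    /* inner product of row i of A and column j of B */
                acc += a[i + p * lda] * b[p + j * ldb];
            /* As in the reference BLAS, C is not read when beta == 0. */
            if (beta == 0.0)
                c[i + j * ldc] = alpha * acc;
            else
                c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
    }
}
```

The triple loop performs 2·M·N·K floating-point operations. Tuned BLAS libraries typically perform the same arithmetic but reorganize these loops into cache- and register-sized blocks and vectorize the innermost kernel.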
@@ -72,17 +74,19 @@ It is recommended to use large pages for performance.
 For Technical Computing Suite environments, the HPC extension large page library can be used
 by adding `-L/opt/FJSVxos/mmm/lib64 -lmpg -Wl,-T/opt/FJSVxos/mmm/util/bss-2mb.lds` to the link options. Specify these options before any other libraries.
 ```
-gcc a.c -L/opt/FJSVxos/mmm/lib64 -lmpg -Wl,-T/opt/FJSVxos/mmm/util/bss-2mb.lds -I$INSTALL_PATH/include -L$INSTALL_PATH/lib -lopenblas
+$ gcc a.c -L/opt/FJSVxos/mmm/lib64 -lmpg -Wl,-T/opt/FJSVxos/mmm/util/bss-2mb.lds -I$INSTALL_PATH/include -L$INSTALL_PATH/lib -lopenblas
 ```

 # Performance

-The OpenBLAS library in this product improves the performance of SGEMM as follows:
+The OpenBLAS library in this product improves the performance of SGEMM and DGEMM as follows:

 | Library | Routine | Parameters | # of cores | Original OpenBLAS | OpenBLAS tuned for A64FX |
 |------------|---------|---------------------------|------------|-------------------|--------------------------|
 | Sequential | SGEMM | No Transpose, M=N=K=5000 | 1 core | 78 GFlops | 108 GFlops |
+| | DGEMM | No Transpose, M=N=K=5000 | 1 core | 36 GFlops | 51 GFlops |
 | OpenMP | SGEMM | No Transpose, M=N=K=10000 | 12 cores | 827 GFlops | 1267 GFlops |
+| | DGEMM | No Transpose, M=N=K=10000 | 12 cores | 373 GFlops | 572 GFlops |

 # Restrictions

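The GFlops figures in the table can be related to run time through the usual operation count for GEMM, 2·M·N·K (one multiply and one add per term of each inner product). A small C sketch, assuming the table uses that convention; the helper names `gemm_flops`, `gemm_seconds` and `speedup` are hypothetical:

```c
/* Hypothetical helpers for reading the performance table.
 * Assumes the common convention that one GEMM of size M x N x K
 * performs 2*M*N*K floating-point operations. */
double gemm_flops(double m, double n, double k)
{
    return 2.0 * m * n * k;
}

/* Estimated seconds for one GEMM call at a sustained rate given in GFLOPS. */
double gemm_seconds(double m, double n, double k, double gflops)
{
    return gemm_flops(m, n, k) / (gflops * 1e9);
}

/* Speedup of the tuned library over the original one (ratio of rates). */
double speedup(double original_gflops, double tuned_gflops)
{
    return tuned_gflops / original_gflops;
}
```

For example, the sequential SGEMM case (M=N=K=5000) is 2·5000³ = 2.5×10¹¹ operations, or roughly 2.3 s per call at 108 GFlops. The table's figures work out to speedups of about 1.38× (SGEMM) and 1.42× (DGEMM) on 1 core, and about 1.53× for both routines on 12 cores.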