1 file changed: +6 -2 lines changed
This is a fork of [OpenBLAS](https://github.com/OpenMathLib/OpenBLAS) with improved performance on the FUJITSU A64FX processor.
The following routines are tuned for A64FX.
+
* SGEMM
+ * DGEMM
# Prerequisites
@@ -72,17 +74,19 @@ It is recommended to use large pages for performance.
For Technical Computing Suite environments, the HPC extension large page library can be used
by adding `-L/opt/FJSVxos/mmm/lib64 -lmpg -Wl,-T/opt/FJSVxos/mmm/util/bss-2mb.lds` to the compiler options. Specify these options before any other libraries.
```
- gcc a.c -L/opt/FJSVxos/mmm/lib64 -lmpg -Wl,-T/opt/FJSVxos/mmm/util/bss-2mb.lds -I$INSTALL_PATH/include -L$INSTALL_PATH/lib -lopenblas
+ $ gcc a.c -L/opt/FJSVxos/mmm/lib64 -lmpg -Wl,-T/opt/FJSVxos/mmm/util/bss-2mb.lds -I$INSTALL_PATH/include -L$INSTALL_PATH/lib -lopenblas
```
# Performance
- The OpenBLAS library in this product improves the performance of SGEMM as follows:
+ The OpenBLAS library in this product improves the performance of SGEMM and DGEMM as follows:
| Library | Routine | Parameters | # of cores | Original OpenBLAS | OpenBLAS tuned for A64FX |
| ------------| ---------| ---------------------------| ------------| -------------------| --------------------------|
| Sequential | SGEMM | No Transpose, M=N=K=5000 | 1 core | 78 GFlops | 108 GFlops |
+ | | DGEMM | No Transpose, M=N=K=5000 | 1 core | 36 GFlops | 51 GFlops |
| OpenMP | SGEMM | No Transpose, M=N=K=10000 | 12 cores | 827 GFlops | 1267 GFlops |
+ | | DGEMM | No Transpose, M=N=K=10000 | 12 cores | 373 GFlops | 572 GFlops |
# Restrictions