Fast vectorized GEMM that makes full use of the YMM registers through SIMD instructions.

XiaomingXu1995/gemm

Overview

The SIMD vectorization approach for GEMM used here comes from a Zhihu article (https://zhuanlan.zhihu.com/p/383115932) and from the sgemm_hsw repository (https://github.com/pigirons/sgemm_hsw).

The Zhihu article describes the method in detail, with clear diagrams.

Build

make -j8

Initialize the input

./init.sh

This script initializes the input elements (integer and float values). The input matrices A[m][n] and B[n][k] are read from the *.random files. Make sure that both m*n and n*k are less than the number of elements in the .random files.
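The size constraint above can be checked before launching a run. The following is a minimal sketch, not code from this repository: the function names and the `available` parameter (the number of values stored in a .random file) are hypothetical, and the strict "less than" comparison simply mirrors the wording above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if A (m x n) and B (n x k) both need
 * fewer elements than `available`, and 0 otherwise. */
int inputs_fit(size_t m, size_t n, size_t k, size_t available)
{
    /* Reject sizes whose products m*n or n*k would overflow size_t. */
    if (n != 0 && (m > SIZE_MAX / n || k > SIZE_MAX / n))
        return 0;
    return m * n < available && n * k < available;
}

/* Hypothetical wrapper: aborts in debug builds when the inputs do not fit. */
void require_inputs_fit(size_t m, size_t n, size_t k, size_t available)
{
    assert(inputs_fit(m, n, k, available));
}
```

For instance, with m = 24, n = 24, k = 64 the two matrices need 576 and 1536 elements, so any file holding more than 1536 values would pass this check.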

Run the gemm

./exe_gemm_float m n k res

This computes C[m][k] = A[m][n] x B[n][k]. As shown in the Zhihu article, n should be a multiple of 24 to make full use of the 16 logical YMM registers.
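To make the index convention concrete, here is a scalar reference sketch of the same product, followed by the register arithmetic behind the number 24. Several things are assumptions rather than facts about this repository: the function names, the row-major contiguous layout, and the 4 x 24 register tile (12 accumulators, 3 vectors for loaded values, 1 for a broadcast value), which is one common way to fill 16 YMM registers with 8 floats each.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: C[m][k] = A[m][n] x B[n][k].
 * Assumption: all three matrices are row-major and contiguous. */
void gemm_ref(size_t m, size_t n, size_t k,
              const float *A, const float *B, float *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < k; ++j)
            C[i * k + j] = 0.0f;
        for (size_t p = 0; p < n; ++p) {
            float a = A[i * n + p];              /* one element of A */
            for (size_t j = 0; j < k; ++j)
                C[i * k + j] += a * B[p * k + j];
        }
    }
}

/* Assumed tile shape: 4 rows by 24 floats, with 8 floats per YMM register. */
enum { YMM_FLOATS = 8, TILE_ROWS = 4, TILE_COLS = 24 };

/* Counts the YMM registers such a tile would occupy. */
int ymm_registers_used(void)
{
    int col_vectors  = TILE_COLS / YMM_FLOATS;   /* 24 / 8 = 3  */
    int accumulators = TILE_ROWS * col_vectors;  /* 4 * 3 = 12  */
    int loads        = col_vectors;              /* 3           */
    int broadcast    = 1;                        /* 1           */
    return accumulators + loads + broadcast;     /* 16 in total */
}
```

A scalar version like this is also handy as a correctness baseline: its output can be compared element by element against the vectorized result for small sizes.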

For example:
./exe_gemm_float 2400 2400 2400 res

./exe_gemm_float_multiple 24 24 64 res

./run.sh
