Improve f16 gemm gfx1250-gluon performance. Improves
gemm_tdm_pipelined_single_warp_per_simd_schedule_kernel by issuing
tdm.load earlier: it moves from the top of the loop (which hides 3/4 of
a loop iteration's worth of cycles) to right after the wait (which hides
a full loop iteration's worth of cycles).
This fixes only the kernel named above; other kernels need to be
benchmarked and improved independently.
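
The effect of the move can be illustrated with a toy latency model. This is a rough sketch, not the kernel's code: the function name, the cycle counts, and the assumption that the exposed stall is simply the load latency minus the cycles of compute it overlaps are all hypothetical and chosen only for illustration.

```python
def exposed_stall_cycles(load_latency, iteration_cycles, hidden_fraction):
    """Cycles the wait would still stall, assuming the load overlaps with
    hidden_fraction of one loop iteration's compute (toy model)."""
    hidden = hidden_fraction * iteration_cycles
    return max(0.0, load_latency - hidden)


# Hypothetical numbers, not measured on gfx1250.
ITERATION_CYCLES = 400
LOAD_LATENCY = 380

# Load issued at the top of the loop: 3/4 of an iteration is overlapped.
old = exposed_stall_cycles(LOAD_LATENCY, ITERATION_CYCLES, 0.75)

# Load issued right after the wait: a full iteration is overlapped.
new = exposed_stall_cycles(LOAD_LATENCY, ITERATION_CYCLES, 1.0)

print(old, new)
```

Under these assumed numbers the earlier issue point removes the remaining stall entirely; real gains depend on the actual load latency and loop length, which is why each kernel needs its own benchmark.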