What is your question?
Hi.
I am using CUTLASS to compute FP16 GEMM and FP8 GEMM with per-tensor scaling on sm1xx.
With older CUTLASS versions, it seems that A/B/C/D must each be aligned to 128 bits.
I would like to know what the alignment requirements are in the latest release, 4.3.2.
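
To make the question concrete, here is a minimal sketch of how I currently understand the 128-bit requirement. It assumes the requirement applies both to the base pointer and to the leading dimension of each operand. The helper names are hypothetical and not part of CUTLASS, so please correct me if the actual rule is different.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a CUTLASS API): how many elements of a given
// bit width fit into one 128-bit access.
constexpr int elements_per_128bit(int element_bits) {
    return 128 / element_bits;
}

// Hypothetical helper: true if an extent (in elements) covers a whole
// number of 128-bit chunks for the given element width.
constexpr bool extent_is_128bit_aligned(std::int64_t extent, int element_bits) {
    return (extent * element_bits) % 128 == 0;
}

// Hypothetical helper: true if the address is a multiple of 16 bytes (128 bits).
inline bool pointer_is_128bit_aligned(const void* ptr) {
    return reinterpret_cast<std::uintptr_t>(ptr) % 16 == 0;
}

// Hypothetical debug check for one operand (A, B, C, or D), under the
// assumption stated above that both pointer and leading dimension matter.
inline void assert_operand_aligned(const void* ptr,
                                   std::int64_t leading_dim,
                                   int element_bits) {
    assert(pointer_is_128bit_aligned(ptr));
    assert(extent_is_128bit_aligned(leading_dim, element_bits));
}

// 128 bits hold 8 FP16 elements or 16 FP8 elements.
static_assert(elements_per_128bit(16) == 8, "fp16: 8 elements per 128 bits");
static_assert(elements_per_128bit(8) == 16, "fp8: 16 elements per 128 bits");
```

Under this reading, an FP16 operand would need a leading dimension that is a multiple of 8 elements, and an FP8 operand a multiple of 16 elements.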
Thanks for your reply.