[QST] On sm1xx, must fp16 and fp8 per-tensor-scale GEMM operands be aligned to 128 bits?

**What is your question?**
Hi.
I am using CUTLASS to compute fp16 GEMM and fp8 per-tensor-scale GEMM on sm1xx.
With older CUTLASS versions, it seems that A/B/C/D all had to be aligned to 128 bits.
I would like to know what the alignment requirements are in the latest release, 4.3.2.
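
To be concrete about what I mean by 128-bit alignment, here is a small sketch in plain Python (not CUTLASS code; the helper names are my own). It assumes the constraint applies to the extent of the contiguous dimension of each operand, which is how I understood the older behavior, and it only shows the arithmetic, not what 4.3.2 actually requires.

```python
# Hypothetical helpers illustrating the 128-bit alignment arithmetic.
# Assumption: the constraint applies to the contiguous (leading) extent.

ALIGN_BITS = 128


def elements_per_alignment(element_bits: int, align_bits: int = ALIGN_BITS) -> int:
    """Number of elements that fit in one aligned chunk."""
    return align_bits // element_bits


def is_aligned(extent: int, element_bits: int, align_bits: int = ALIGN_BITS) -> bool:
    """True if `extent` elements span a whole number of aligned chunks."""
    return (extent * element_bits) % align_bits == 0


print(elements_per_alignment(16))  # fp16: 8 elements per 128 bits
print(elements_per_alignment(8))   # fp8: 16 elements per 128 bits
print(is_aligned(4096, 16))        # fp16 extent 4096: a multiple of 8
print(is_aligned(4100, 16))        # fp16 extent 4100: not a multiple of 8
print(is_aligned(4104, 8))         # fp8 extent 4104: not a multiple of 16
```

So under that reading, fp16 extents would need to be multiples of 8 and fp8 extents multiples of 16. My question is whether this still holds for all of A/B/C/D.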

Thanks for your reply.