Thank you for your impressive work.
Do you have any plans to optimize performance on the B200 (sm100)? For example, extending support beyond FP4 would make better use of Tensor Memory and the 5th‑generation Tensor Core (UMMA) instructions.