🚀 The feature, motivation and pitch
Currently, the FP8 KV cache feature (in the FlashMLA interface) only supports per-tensor (scalar) scaling factors. Is support for finer-grained scaling factors (e.g., per-channel) under development? If so, when is FP8 KV cache with finer-grained scaling expected to be available?
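To illustrate why the granularity matters, here is a rough pure-Python sketch comparing one scalar scale against one scale per channel. It is not vLLM or FlashMLA code: `round_e4m3`, `fake_quant`, and the toy `kv` values are made up for illustration, the scale is assumed to be `amax / 448` (448 being the largest finite FP8 E4M3 value), and the rounding is a simplified simulation rather than a real FP8 cast.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value of the FP8 E4M3 (fn) format


def round_e4m3(x: float) -> float:
    """Simplified simulation of rounding to FP8 E4M3 (3 mantissa bits)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), FP8_E4M3_MAX)        # saturate instead of overflowing
    exp = max(math.frexp(a)[1] - 1, -6)  # floor(log2(a)), clamped at the subnormal range
    step = 2.0 ** (exp - 3)              # spacing between representable values
    return sign * min(round(a / step) * step, FP8_E4M3_MAX)


def fake_quant(x: float, scale: float) -> float:
    """Quantize with the given scale, then dequantize back to float."""
    return round_e4m3(x / scale) * scale


# Hypothetical toy KV slice: rows are tokens, columns are channels.
# Channel 0 is tiny, channel 1 is large, so a single scale fits only one of them.
kv = [[0.001, 100.0], [0.002, 200.0], [0.003, 300.0]]
num_channels = len(kv[0])

# Per-tensor: one scalar scale derived from the global absolute maximum.
tensor_scale = max(abs(v) for row in kv for v in row) / FP8_E4M3_MAX

# Per-channel: one scale per column, derived from that column's absolute maximum.
channel_scales = [
    max(abs(row[c]) for row in kv) / FP8_E4M3_MAX for c in range(num_channels)
]


def max_abs_error(channel: int, scale: float) -> float:
    """Worst-case round-trip error within one channel for a given scale."""
    return max(abs(row[channel] - fake_quant(row[channel], scale)) for row in kv)


err_per_tensor = [max_abs_error(c, tensor_scale) for c in range(num_channels)]
err_per_channel = [max_abs_error(c, channel_scales[c]) for c in range(num_channels)]

for c in range(num_channels):
    print(f"channel {c}: per-tensor err={err_per_tensor[c]:.3e}, "
          f"per-channel err={err_per_channel[c]:.3e}")
```

In this toy case the large channel dictates the per-tensor scale, so the small channel is pushed into the FP8 subnormal range and loses most of its precision, while per-channel scales let each channel use the full FP8 range.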
Alternatives
No response
Additional context
No response