Thanks a lot for your work on Qserve.
We now want to deploy Qserve on H100 GPUs (which support FP8 tensor cores), taking both accuracy and throughput into account.
Do you have any suggestions for changes to Qserve for this setup? Or are you considering any optimizations for H100?