First of all, thanks for the great work on this project! The voice quality is impressive, but the inference latency is quite high, even when running on an H100 GPU. This makes real-time applications difficult.
I would appreciate any guidance on reducing the latency. Thanks!