Thanks for your great work! Could you share the inference time for generating 10 s of audio on CPU and on GPU?