I'm surprised that the "transcribe" method doesn't accept a list of audio files or tensors, and I haven't found a ready-made implementation that does. Did I simply miss it, or does it really not exist? How can I keep the GPU fully utilized while processing a list of diverse audio files, ranging from short to long, in different languages that aren't known in advance? Is this possible, and has anyone already tried implementing it? I can see how to work with segments of a single audio file, but I don't see an efficient way to handle a list of files without hacks.
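To make the question concrete, here is a rough sketch of the kind of workaround I mean: cut every file into fixed-length windows and pack windows from different files into the same batch. It only plans the batches and does not call the model. The name `plan_batches`, the 30-second window, the batch size, and the file names are all illustrative assumptions, and decoding windows independently like this would give up any context carried over from the previous segment of the same file.

```python
from math import ceil
from typing import Dict, List, Tuple

# (file name, window start in seconds); a hypothetical type for this sketch
Window = Tuple[str, float]


def plan_batches(
    durations: Dict[str, float],
    window_s: float = 30.0,  # assumed fixed window length
    batch_size: int = 4,     # assumed number of windows per GPU batch
) -> List[List[Window]]:
    """Split each file into fixed-length windows and group them into batches."""
    windows: List[Window] = []
    for name, duration in durations.items():
        # Even a very short file occupies one (padded) window.
        count = max(1, ceil(duration / window_s))
        windows.extend((name, i * window_s) for i in range(count))
    # Windows from different files may share a batch.
    return [windows[i:i + batch_size] for i in range(0, len(windows), batch_size)]


if __name__ == "__main__":
    for batch in plan_batches({"short.wav": 5.0, "long.wav": 65.0}, batch_size=3):
        print(batch)
```

Each batch would still need per-window language detection and a step that stitches the results back together per file, which is exactly the part I was hoping already exists somewhere.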