Batch inference from a diverse list of audio sources for better GPU utilization

I'm surprised that the "transcribe" method doesn't accept a list of audio files or tensors, and I haven't found a ready-made implementation of this. Did I simply miss it, or does it really not exist?

How can I keep the GPU fully utilized when processing a list of diverse audio files, ranging from short to long, in different languages that aren't known in advance? Is this possible, and has anyone already tried implementing it?

I can see how to batch the segments of a single audio file, but I don't see an efficient way to batch across a list of files without hacks.
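To make the question concrete, here is a rough sketch of the kind of pre-grouping I have in mind. It is not part of any library's API: `bucket_by_duration` and `max_batch_seconds` are hypothetical names, and the clip durations are assumed to be known up front (for example, read from file headers). The idea is to sort clips by length and pack neighbours into a batch until the padded length would exceed a budget, so that short clips are not padded out to the longest one in the list. Language detection would still have to happen per clip, which this sketch does not address.

```python
from typing import Dict, List, Tuple


def bucket_by_duration(
    durations: Dict[str, float],
    max_batch_seconds: float,
) -> List[List[str]]:
    """Group clips of similar length under a padded-seconds budget.

    Hypothetical helper: `durations` maps a clip name to its length in seconds.
    """
    # Sort shortest first so that neighbours in a batch have similar lengths.
    ordered: List[Tuple[str, float]] = sorted(
        durations.items(), key=lambda item: item[1]
    )

    batches: List[List[str]] = []
    current: List[str] = []
    for name, seconds in ordered:
        # Every clip in a batch is padded to the longest one. Because the
        # list is sorted ascending, the longest is always the clip being added.
        padded_cost = seconds * (len(current) + 1)
        if current and padded_cost > max_batch_seconds:
            batches.append(current)
            current = []
        # A clip that exceeds the budget on its own still gets its own batch.
        current.append(name)

    if current:
        batches.append(current)
    return batches


example = {"a.wav": 3.0, "b.wav": 28.0, "c.wav": 4.0, "d.wav": 30.0, "e.wav": 5.0}
print(bucket_by_duration(example, max_batch_seconds=60.0))
```

Each resulting group could then be padded and sent to the model as one batch, but that part is exactly what I cannot find a clean way to do with a list of files.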

Batch inference from a diverse list of audio sources for better GPU utilization #1445

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Batch inference from a diverse list of audio sources for better GPU utilization #1445

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions