Batch inference from a diverse list of audio sources for better GPU utilization #1445

@MajaSoure

I'm surprised that the "transcribe" method doesn't accept a list of audio files or tensors, and I haven't found a ready-made implementation of this. Did I miss it, or does it really not exist? How can I maximize GPU utilization when processing a list of diverse audio files, from short to long, in different languages that are not known in advance? Is this possible, and has anyone already tried implementing it? I can see how to batch segments of a single audio file, but I don't see an efficient way to handle a list of files without hacks.
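To make the question concrete, here is a rough sketch of the kind of grouping step I have in mind. It is not part of any library's API: `bucket_by_duration` and `max_batch_seconds` are hypothetical names, and the sketch assumes clip durations are already known. It only sorts clips by length and packs them into batches so that padding to the longest clip stays within a budget. Running the model on each batch, and detecting the language per clip, is left out.

```python
from typing import List, Sequence


def bucket_by_duration(
    durations: Sequence[float], max_batch_seconds: float
) -> List[List[int]]:
    """Group clip indices into batches of similar length.

    Hypothetical helper, not a library function. The cost of a batch is
    estimated as (longest clip in batch) * (number of clips), i.e. the
    total audio length after padding every clip to the longest one.
    """
    # Visit clips from shortest to longest so neighbours have similar length.
    order = sorted(range(len(durations)), key=lambda i: durations[i])

    batches: List[List[int]] = []
    current: List[int] = []
    for i in order:
        # Because of the ascending sort, clip i is the longest in the batch.
        padded_cost = durations[i] * (len(current) + 1)
        if current and padded_cost > max_batch_seconds:
            batches.append(current)
            current = []
        # A clip longer than the budget still gets a batch of its own.
        current.append(i)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    clip_seconds = [30.0, 2.0, 3.0, 28.0, 2.5]
    print(bucket_by_duration(clip_seconds, max_batch_seconds=60.0))
```

With a grouping like this, short clips share a batch and long clips share another, so less GPU time is spent on padding. Whether the model call can actually consume such a batch is exactly what I am asking about.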
