Support for real time audio streaming using chunk transfer encoding for Whisper #1025

Open
@Shulyaka

Description

Confirm this is a feature request for the Python library and not the underlying OpenAI API.

  • This is a feature request for the Python library

Describe the feature or improvement you're requesting

For real-time voice recognition, it would be nice to start uploading the audio as soon as it becomes available, instead of waiting until the whole recording is complete.
We already have a similar feature for TTS: https://platform.openai.com/docs/guides/text-to-speech/streaming-real-time-audio
Please note, I am not asking for the transcript to be available before the speech has ended. I would only like the data transfer to start earlier.

Additional context

HTTP supports sending a request body in chunks without knowing its length in advance (chunked transfer encoding).
A WAV header does require the length; however, 0xFFFFFFFF (i.e. the maximum length) works fine with Whisper (I checked).
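
To illustrate the idea, here is a minimal sketch (standard library only) of a WAV stream whose header carries 0xFFFFFFFF as a length placeholder, followed by PCM chunks as they arrive. The names `streaming_wav_header` and `wav_stream` are hypothetical and not part of the openai library. The 16 kHz mono 16-bit defaults, and the choice to put the placeholder in both the RIFF size and the data size fields, are assumptions made for the example.

```python
import struct
from typing import Iterable, Iterator

# Placeholder used when the total length is not known in advance.
UNKNOWN_LENGTH = 0xFFFFFFFF


def streaming_wav_header(sample_rate: int = 16000,
                         channels: int = 1,
                         bits_per_sample: int = 16) -> bytes:
    """Build a 44-byte PCM WAV header with placeholder length fields."""
    block_align = channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", UNKNOWN_LENGTH, b"WAVE",   # RIFF chunk, size unknown
        b"fmt ", 16, 1,                     # fmt chunk, 16 bytes, PCM format
        channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", UNKNOWN_LENGTH,            # data chunk, size unknown
    )


def wav_stream(pcm_chunks: Iterable[bytes], **fmt: int) -> Iterator[bytes]:
    """Yield the header first, then each non-empty PCM chunk as it arrives."""
    yield streaming_wav_header(**fmt)
    for chunk in pcm_chunks:
        if chunk:
            yield chunk
```

An iterator like this is the kind of body an HTTP client can send with chunked transfer encoding, since no Content-Length is needed up front. Wrapping it in the multipart form that the transcription endpoint expects is not shown here.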

Labels: enhancement
