Bidi streaming proposal end of utterance detection #13

@seyuf

Hi,

Many thanks for this awesome work!
I have a use case that comes from my own use of the project. I thought it was worth raising here, as I believe it can be implemented directly on the main branch.

I've already implemented a kind of PoC or v1 here.

The idea is to add silence / end-of-utterance detection to the server.
Today, what I observe is that in bidi streaming the server transcribes the stream of messages sent by the client indefinitely, appending to the result at each iteration.
So if one wants to reset the result, one is forced to kill the connection from the client.

What I made in the above link is somewhat similar: I just send an end_of_utterance value from the client side in the audio config message, which tells the server "I'm done, send me the last result and close the connection". I also set an is_final value in the last result message, signalling to the client that this is the last result from the server and that the connection has been closed.
Although this works, it is not very satisfying. To me, the right thing would be to keep the connection alive and just reset the results when an utterance has ended. I also believe the server could do the end-of-utterance detection itself, using silence detection.
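
To make the PoC behaviour described above concrete, here is a minimal Python sketch. It is not the project's code: `ClientMessage`, `Result`, `poc_stream` and the `audio_text` field (a stand-in for an already-decoded audio chunk) are hypothetical names; only `end_of_utterance` and `is_final` come from the description above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class ClientMessage:
    # Hypothetical message shape; the real message fields may differ.
    audio_text: str = ""            # stand-in for a decoded audio chunk
    end_of_utterance: bool = False  # client-side "I'm done" flag


@dataclass
class Result:
    transcript: str
    is_final: bool = False


def poc_stream(messages: Iterable[ClientMessage]) -> Iterator[Result]:
    """Yield a growing transcript and stop once the client flags the end."""
    words: List[str] = []
    for msg in messages:
        if msg.end_of_utterance:
            # Last result: mark it final, then return, which ends the stream.
            yield Result(" ".join(words), is_final=True)
            return
        if msg.audio_text:
            words.append(msg.audio_text)
        yield Result(" ".join(words))
```

Anything the client sends after the `end_of_utterance` message is never read, which is the limitation mentioned above: a new utterance needs a new connection.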

The idea would be to consider that we are at the end of an utterance if we receive silent audio for some amount of time or number of iterations (the code for this seems to be already in place here).
So:

  1. The client specifies in the audio config message whether it would like the server to detect the end of utterances (if not, we keep the current behaviour).
  2. The client sends a stream of messages.
  3. After multiple consecutive empty audio decodings, the server decides that we are at the end of an utterance.
  4. The server sends back the result with is_final set to true in the response message.
  5. The server resets its data but keeps the connection alive (or maybe kills it? This could be optional), waiting for new input from the client.
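
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: `UtteranceTracker`, `detect_end`, `max_empty` and `feed` are hypothetical names, each call to `feed` stands for one decoding iteration, and an empty string stands for an empty (silent) decoding result.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    transcript: str
    is_final: bool = False


class UtteranceTracker:
    """Accumulate decoded text; optionally end the utterance on silence."""

    def __init__(self, detect_end: bool, max_empty: int = 3) -> None:
        self.detect_end = detect_end  # step 1: opt-in from the audio config
        self.max_empty = max_empty    # assumed threshold of empty decodings
        self.words: List[str] = []
        self.empty_count = 0

    def feed(self, decoded: str) -> Result:
        if decoded:
            # Speech was decoded: append it and restart the silence counter.
            self.words.append(decoded)
            self.empty_count = 0
            return Result(" ".join(self.words))

        if self.detect_end:
            self.empty_count += 1
            # Step 3: enough consecutive empty decodings after some speech.
            if self.empty_count >= self.max_empty and self.words:
                final = Result(" ".join(self.words), is_final=True)  # step 4
                self.words = []       # step 5: reset, connection stays open
                self.empty_count = 0
                return final

        # Current behaviour: keep returning the accumulated transcript.
        return Result(" ".join(self.words))
```

A short usage example with a threshold of two empty decodings:

```python
tracker = UtteranceTracker(detect_end=True, max_empty=2)
for chunk in ["hi", "there", "", "", "next"]:
    print(tracker.feed(chunk))
```

The fourth call returns the final result for "hi there", and the fifth starts a fresh transcript with "next" on the same tracker, which is the "reset but keep the connection alive" behaviour.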

I hope this is understandable enough. If so, I would appreciate some feedback, if possible.

Regards
