What does Speech-To-Text do, and can I trust it? #101

Open
@BirgitPohl

I am comparing the RealtimeAPI with the browser's SpeechRecognition capabilities.

Has anyone else had the experience that the RealtimeAPI STT outputs babelfish?
In addition, although I set the language for the RealtimeAPI to "de", it does not stick to recognizing German. That is why we get interesting transcriptions in other languages.
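
For reference, here is a minimal sketch of how such a language hint might be passed. It assumes the RealtimeAPI in question is OpenAI's Realtime API and that the transcription language is set through a `session.update` event with an `input_audio_transcription` block. The helper name `buildSessionUpdate` and the default model `whisper-1` are illustrative assumptions, not taken from this issue.

```typescript
// Assumed event shape, modeled on OpenAI's Realtime API.
// Adjust it to whatever your client actually sends.
interface SessionUpdateEvent {
  type: "session.update";
  session: {
    input_audio_transcription: {
      model: string;
      language: string; // ISO-639-1 code, e.g. "de"
    };
  };
}

// Hypothetical helper: builds an event that pins the transcription language.
function buildSessionUpdate(
  language: string,
  model: string = "whisper-1", // assumption: any supported transcription model
): SessionUpdateEvent {
  return {
    type: "session.update",
    session: {
      input_audio_transcription: { model, language },
    },
  };
}

// The payload you would JSON.stringify and send over the open connection.
const germanSession = buildSessionUpdate("de");
console.log(JSON.stringify(germanSession));
```

Logging the outgoing payload like this is one way to confirm that the "de" setting actually reaches the server before blaming the recognizer.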

Here are examples:

Actual, asking for the weather in German: "Wie ist das Wetter in Berlin?" ("What is the weather like in Berlin?")

What the RealtimeAPI transcribed:

It takes quite a fight.

That's not how we do it.

Pa pa, super spada

수고하셨습니다.

I don't know what I'm doing with my life.

I'm a bit worried about GPT.

For comparison, here is the output of the browser's SpeechRecognition:

denkst du, du bist im moment ne?

aber thomas anrufen?

hallo.

Correct wake word

auch ernsthaft.

Actual: "Ach, ernsthaft!"
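
For anyone who wants to reproduce the browser side, here is a minimal sketch of a German SpeechRecognition setup. The helper `listenInGerman` and the small structural types are assumptions made for illustration. They mirror only the parts of the Web Speech API used here, so the sketch compiles without DOM typings.

```typescript
// Minimal structural types covering only what this sketch uses.
interface RecognitionResultEvent {
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}
interface RecognitionLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: RecognitionResultEvent) => void) | null;
  start(): void;
}

// Hypothetical helper: configures a recognizer for German and forwards the
// top alternative of the most recent result to `onTranscript`.
function listenInGerman(
  Recognition: new () => RecognitionLike,
  onTranscript: (text: string) => void,
): RecognitionLike {
  const recognizer = new Recognition();
  recognizer.lang = "de-DE";
  recognizer.continuous = true;
  recognizer.interimResults = false; // only report final results
  recognizer.onresult = (event) => {
    const latest = event.results[event.results.length - 1];
    const best = latest?.[0];
    if (best) onTranscript(best.transcript);
  };
  recognizer.start();
  return recognizer;
}

// In a browser (some browsers expose only the prefixed constructor):
// const Ctor = (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
// listenInGerman(Ctor, (text) => console.log(text));
```

Passing the constructor in as a parameter keeps the helper usable with either the prefixed or the unprefixed browser constructor.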

The browser's speech recognition is much closer to what is actually said.
I think the reason the LLM doesn't always understand the speaker is that the STT produces this babelfish.
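
To put a number on "much closer", one option is the word error rate (WER): the word-level edit distance between what was said and what was transcribed, divided by the number of words actually said. The sketch below is an illustration, not part of either API. The function names are made up, and the normalization (lowercasing, stripping common ASCII punctuation) is a simplifying assumption.

```typescript
// Lowercase, drop common punctuation, split into words.
function normalize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"']/g, "")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Word error rate: Levenshtein distance over words / reference word count.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = normalize(reference);
  const hyp = normalize(hypothesis);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;

  // prev[j]: distance between the reference words seen so far and hyp[0..j)
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);

  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Browser example from this issue: one of two words is wrong.
console.log(wordErrorRate("Ach, ernsthaft!", "auch ernsthaft.")); // 0.5

// Assumption: this pairs the weather question with the first RealtimeAPI line.
console.log(wordErrorRate("Wie ist das Wetter in Berlin?", "It takes quite a fight.")); // 1
```

A WER of 0 means a perfect transcript. Values at or above 1 mean that essentially nothing of the original sentence survived.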

Has anyone else had this experience?
