Description
I am comparing the Realtime API's speech-to-text (STT) with the browser's SpeechRecognition capabilities.
Has anyone else had the experience that the Realtime API STT outputs babelfish?
In addition, although I set the language for the Realtime API to "de",
it does not stick to recognizing German. This is why we get odd transcriptions in other languages.
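For readers who want to reproduce this, here is a minimal sketch of how the transcription language is usually passed to the Realtime API in a `session.update` event. The field names (`input_audio_transcription`, `model`, `language`) assume the beta session schema; the helper name `buildSessionUpdate` and the `whisper-1` default are illustrative assumptions, not necessarily the setup described in this post.

```typescript
// Hypothetical helper: builds a `session.update` event that requests input
// transcription in one ISO-639-1 language (e.g. "de").
// Assumption: the beta session schema with `input_audio_transcription`.
interface SessionUpdateEvent {
  type: "session.update";
  session: {
    input_audio_transcription: {
      model: string;
      language: string;
    };
  };
}

function buildSessionUpdate(language: string, model = "whisper-1"): SessionUpdateEvent {
  return {
    type: "session.update",
    session: {
      input_audio_transcription: { model, language },
    },
  };
}

// Usage: send over an already-open WebSocket or data channel (not shown).
// ws.send(JSON.stringify(buildSessionUpdate("de")));
```

If the language hint is set like this and other languages still appear, that would suggest the hint is treated as a preference rather than a hard constraint, but that is a guess and not something confirmed here.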
Here are some examples:
Actual (asking for the weather in German): "Wie ist das Wetter in Berlin?"
It takes quite a fight.
That's not how we do it.
Pa pa, super spada
수고하셨습니다.
I don't know what I'm doing with my life.
I'm a bit worried about GPT.
For comparison, the browser's SpeechRecognition:
denkst du, du bist im moment ne?
aber thomas anrufen?
hallo.
Correct wake word
auch ernsthaft.
Actual: "Ach, ernsthaft!"
The browser's speech recognition is much closer to what is actually said.
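For the browser side of the comparison, a minimal sketch of pinning the Web Speech API recognizer to German could look like the following. The helper name `configureRecognition`, the `de-DE` tag, and the `continuous`/`interimResults` choices are assumptions for illustration; the post does not say which settings were actually used.

```typescript
// Minimal structural type for the properties used here, so the sketch
// compiles even without DOM typings for SpeechRecognition.
interface RecognitionLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
}

// Hypothetical helper: pins a recognizer to a single BCP 47 language tag.
function configureRecognition<T extends RecognitionLike>(recognition: T, lang = "de-DE"): T {
  recognition.lang = lang;            // language the recognizer should expect
  recognition.continuous = true;      // keep listening across utterances
  recognition.interimResults = false; // report only final transcripts
  return recognition;
}

// Browser usage (assumes SpeechRecognition or its webkit-prefixed variant exists):
// const Ctor = (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
// const rec = configureRecognition(new Ctor());
// rec.onresult = (e: any) => console.log(e.results[e.results.length - 1][0].transcript);
// rec.start();
```

Running both recognizers on the same utterances with an equivalent language setting makes the comparison fairer, since differences then come from the STT itself rather than from configuration.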
I think the reason the LLM doesn't always understand the speaker is that the STT produces babelfish.
Has anyone else experienced this?