i am looking for the stt provider with the best diarization for omi. so far, we have: Deepgram, Speechmatics (the list will keep growing).
some rules:
no large dataset here; i'll keep things grounded and won't reference any of the providers' own benchmarks.
to be fair, i'll use each provider's best model available at the time of testing.
legend: * marks a line with an error, input: the audio that hits the provider's server, der: diarization error rate
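a minimal sketch of how the der figures in this post can be read as percentages. it assumes der here is simply the count of misattributed items divided by the 122 items in the reference; `der_percent` is a hypothetical helper for illustration, not part of either provider's sdk.

```python
def der_percent(score: str) -> float:
    """Convert an 'errors/total' score such as '51/122' into a percentage."""
    errors, total = (int(part) for part in score.split("/"))
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * errors / total, 1)


# scores copied from the tldr lines in the results
scores = {
    "20251222 deepgram": "51/122",
    "20251222 speechmatics": "1/122",
    "20251221 deepgram": "61/122",
    "20251221 speechmatics": "3/122",
}

for run, score in scores.items():
    print(f"{run}: {score} = {der_percent(score)}%")
```

read this way, deepgram lands at roughly 41.8% (clean) and 50.0% (mic), and speechmatics at roughly 0.8% and 2.5%.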
why?
i don't aim to build a new business or lab to outdo the stt providers out there. diarization is hard, costly, and time-consuming. i'd rather focus on vertical integration, leveraging the strengths of each provider to bring value to omi users.
results:
20251222:
with the clean version of the audio - good audio quality scenarios.
tldr: deepgram der 51/122, speechmatics der 1/122
input: do_llms_understand_ai_pioneer_yann_lecun_spars_with_deepminds_adam_brown_cut.wav.zip (source: https://www.youtube.com/watch?v=ykfQD1_WPBQ&t=1115s)
code: speechmatics_rt_sdk.py deepgram_rt_sdk.py
speechmatics: en, monolingual, enhanced der 1/122
--
[S1]: So
[S1]: obviously
[S1]: we're
[S1]: missing
[S1]: something
[S1]: very
[S1]: big
[S1]: to get
[S1]: machines
[S1]: to the
[S1]: level
[S1]: of
[S1]: human or
[S1]: even
[S1]: animal
[S1]: intelligence. Well,
[S1]: let's not talk
[S1]: about
[S1]: language. Let's
[S1]: talk about
[S1]: how a
[S1]: cat is
[S1]: intelligent
[S1]: or a
[S1]: dog.
[S1]: Um, we're not even
[S1]: at that
[S1]: level
[S1]: with
[S1]: AI
[S1]: systems.
[S2]: Adam,
[S2]: you
[S2]: you
[S2]: impart
[S2]: more
[S2]: comprehension
[S2]: on
[S2]: the part
[S2]: of
[S2]: the
[S2]: llms
[S2]: at this
[S2]: point
[S2]: already.
[S3]: I
[S3]: think that's
[S3]: right. So,
[S3]: I mean,
[S3]: Yann
[S3]: is
[S3]: making some
[S3]: excellent
[S3]: points
[S3]: that
[S3]: the
[S3]: Llms are
[S3]: much
[S3]: less,
[S3]: for
[S3]: example,
[S3]: sample
[S3]: efficient
[S3]: than
[S3]: humans,
[S3]: humans,
[S3]: or
[S3]: indeed
[S3]: your
[S3]: cat or
[S3]: just
[S3]: a cat.
[S3]: I don't know if
[S3]: it was your
[S3]: cat or
[S3]: a.
[S2]: Very
[S2]: smart
*[S2]: cat. In
[S3]: your
[S3]: example,
[S3]: um
[S3]: is
[S3]: able
[S3]: to
[S3]: learn
[S3]: from
[S3]: many
[S3]: fewer
[S3]: examples
[S3]: than
deepgram: en, monolingual, nova-3 der 51/122
--
[S0]: So, obviously, we're missing something
[S0]: very big,
[S0]: to get machines to the level of
[S0]: human or even animal intelligence. Let's not talk about language. Let's talk about how a cat is intelligent.
[S0]: Or a dog.
[S0]: We're not even at that level with AI systems.
*[S0]: Mhmm.
[S1]: Adam, you you you
[S1]: impart more comprehension on
[S1]: the part of the LLMs at this point.
[S1]: Already?
*[S0]: I think that's right. So, I mean, Jan
*[S0]: is making some excellent points that the LLMs
*[S0]: are much less
*[S0]: example, sample efficient
[S2]: than humans. Humans or indeed
*[S2]: your cat or just a a cat. I don't know if it was your cat or It's very smart cat.
[S2]: In your example, is able to
[S2]: learn from
[S2]: many fewer examples
[S2]: than
[S2]: a
20251221:
audio input goes through the mic before reaching the provider's server - low audio quality scenarios.
tldr: deepgram der 61/122, speechmatics der 3/122
input: recording_20251221_151217.wav.zip (source: https://www.youtube.com/watch?v=ykfQD1_WPBQ&t=1115s)
code: [speechmatics_rt_sdk.py](https://github.com/user-attachments/files/24277548/speechmatics_rt_sdk.py) deepgram_rt_sdk.py
speechmatics: en, monolingual, enhanced der 3/122
--
[S1]: Obviously we're
[S1]: missing
[S1]: something
[S1]: very
[S1]: big
[S1]: to
[S1]: get
[S1]: machines to
[S1]: the
[S1]: level
[S1]: of
[S1]: human or
[S1]: even
[S1]: animal
[S1]: intelligence.
[S1]: Well, let's
[S1]: not talk
[S1]: about
[S1]: language. Let's
[S1]: talk about
[S1]: how a
[S1]: cat is
[S1]: intelligent
[S1]: or a
[S1]: dog.
[S1]: Um,
[S1]: we're not
[S1]: even at
[S1]: that
[S1]: level
[S1]: with
[S1]: AI
[S1]: systems.
*[S1]: Mhm.
[S2]: Adam,
[S2]: you
[S2]: you
[S2]: impart
[S2]: more
[S2]: comprehension
[S2]: on
[S2]: the part
[S2]: of
[S2]: the
[S2]: llms
[S2]: at this
[S2]: point
[S2]: already.
[S3]: I
[S3]: think that's
[S3]: right. So
[S3]: I mean
[S3]: , Yann
[S3]: is
[S3]: making some
[S3]: excellent
[S3]: points
[S3]: that
[S3]: the
[S3]: Llms are
[S3]: much
[S3]: less,
[S3]: for
[S3]: example,
[S3]: sample
[S3]: efficient
[S3]: than
[S3]: humans,
[S3]: humans
[S3]: or
[S3]: indeed
[S3]: your
[S3]: cat or
[S3]: just
[S3]: a cat. I
[S3]: don't know if
[S3]: there's your
[S3]: cat or
*[S3]: it's a.
[S2]: Very
[S2]: smart
[S2]: cat.
[S3]: In your
[S3]: example,
[S3]: um
[S3]: is
[S3]: able
[S3]: to
[S3]: learn
[S3]: from
[S3]: many
[S3]: fewer examples that.
deepgram: en, monolingual, nova-3 der 61/122
--
[S0]: Very big.
[S0]: To get machines to the level of
[S0]: human or even animal intelligence. Right? Let's not talk about language. Let's talk about how a cat is intelligent.
[S0]: Or a dog. We're we're not even at that level with AI systems.
[S1]: Adam, you you you
*[S0]: impart more comprehension on
[S1]: the part of the LLMs at this point.
[S1]: Already.
*[S0]: I think that's right. So I mean in Jan,
*[S0]: is making some excellent points that the LLMs
*[S0]: are much less
*[S0]: for example, sample efficient
*[S0]: than humans.
*[S0]: Humans or indeed
*[S0]: your cat or just a a cat? I don't know if it's your cat or anything. It's a very smart cat.
*[S0]: Is able to learn from many fewer examples.
*[S0]: Than
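the two providers emit fragments at different granularity (speechmatics nearly word by word, deepgram phrase by phrase), which makes the dumps above hard to compare by eye. a rough sketch, assuming every line follows the `[S1]: text` / `*[S1]: text` layout used in this post, that merges consecutive fragments from the same speaker into turns and keeps track of which turns contain a starred line. `merge_turns` is a hypothetical helper, not something from the attached scripts.

```python
import re
from typing import List, Tuple

# optional leading "*", then a speaker tag like [S1], then the fragment text
LINE_RE = re.compile(r"^(\*?)\[(S\d+)\]:\s*(.*)$")


def merge_turns(lines: List[str]) -> List[Tuple[str, str, bool]]:
    """Merge consecutive same-speaker fragments into (speaker, text, flagged) turns."""
    turns: List[list] = []
    for raw in lines:
        line = raw.strip()
        match = LINE_RE.match(line)
        if not match:
            # no speaker tag: treat as a continuation of the previous fragment
            if turns and line:
                turns[-1][1] += " " + line
            continue
        star, speaker, text = match.groups()
        flagged = bool(star)
        if turns and turns[-1][0] == speaker:
            turns[-1][1] = (turns[-1][1] + " " + text).strip()
            turns[-1][2] = turns[-1][2] or flagged
        else:
            turns.append([speaker, text, flagged])
    return [(speaker, text, flagged) for speaker, text, flagged in turns]


sample = ["[S2]: Very", "[S2]: smart", "*[S2]: cat. In", "[S3]: your", "[S3]: example,"]
for speaker, text, flagged in merge_turns(sample):
    print(("*" if flagged else " ") + f"[{speaker}]: {text}")
```

this only regroups the text for reading; it does not recompute der, since the stars mark lines while the scores are counted out of 122.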