-
The demo video uses this file: realtimestt_test.py. To get near real-time performance:
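The basic usage pattern in that test file can be sketched roughly as below. This is a hedged sketch, not the file itself: the parameter names (`enable_realtime_transcription`, `on_realtime_transcription_update`) and the `recorder.text()` call reflect my recollection of RealtimeSTT's `AudioToTextRecorder` API and should be checked against realtimestt_test.py in the repo.

```python
import sys

def show_live(text: str) -> str:
    """Render a partial (still-changing) transcription on a single line."""
    sys.stdout.write("\r" + text)   # carriage return overwrites the old preview
    sys.stdout.flush()
    return text

def main():
    # Assumed RealtimeSTT API; verify names against realtimestt_test.py.
    from RealtimeSTT import AudioToTextRecorder
    recorder = AudioToTextRecorder(
        model="tiny.en",
        enable_realtime_transcription=True,
        on_realtime_transcription_update=show_live,  # fires on partial updates
    )
    while True:
        # Blocks until a final transcription is ready, then prints it.
        print("\n" + recorder.text())

if __name__ == "__main__":
    main()
```

The key point is the two channels: the callback gives frequent partial updates while you speak, and the blocking `text()` call delivers the higher-quality final result.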
Regarding your specific questions:
In RealtimeSTT, the client and server communicate via WebSockets.
When a final transcription request is detected, it is handled in a separate process so the main process stays responsive. The client doesn't manage the updates directly; instead, the server determines when to perform the final transcription and sends both real-time and final transcription updates back to the client.
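That division of labor can be sketched as a small JSON message protocol on the client side. The message-type names (`"realtime"`, `"fullSentence"`) and fields here are assumptions for illustration, not necessarily the exact ones RealtimeSTT's server emits:

```python
import json

def make_update(text: str, final: bool) -> str:
    """Server side: tag each update as a live partial or a final sentence."""
    return json.dumps(
        {"type": "fullSentence" if final else "realtime", "text": text}
    )

def handle_update(raw: str, state: dict) -> dict:
    """Client side: apply an incoming update to the display state."""
    msg = json.loads(raw)
    if msg["type"] == "realtime":
        state["partial"] = msg["text"]            # overwrite the live preview
    elif msg["type"] == "fullSentence":
        state["final"] = state.get("final", "") + msg["text"] + " "
        state["partial"] = ""                     # clear preview once finalized
    return state
```

Because the server decides when a sentence is final, the client stays a thin renderer: it only ever overwrites a preview line or appends a finished sentence.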
-
Yes, KoljaB's is the absolute best among transcribers... I have tested several, and his implementation is faster than all the others. I think Moshi has implemented a completely different pipeline, with an LLM activated by voice tokens instead of the text-token activation that Whisper uses... But I have to read their paper.
-
Hi everyone, I am very interested in how to achieve something like the demo video. Right now, I have written client-side code with pyaudio to collect audio. The way I did it is to send the entire audio chunk after no sound is detected. So if I keep speaking, it just takes a long time to transcribe, rather than the live updating shown in the demo video. I also went through all the code under the "/tests" folder and found nothing.
Much appreciated if you can walk me through it or share some demo code on how to do this.
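One likely fix for the behavior described above is to stream small fixed-size chunks continuously instead of buffering until silence, so the server can transcribe while you are still speaking. The sketch below shows the chunking logic; the names (`CHUNK_FRAMES`, the commented `ws.send` loop) are illustrative assumptions, not taken from the RealtimeSTT codebase:

```python
CHUNK_FRAMES = 512    # frames per packet; small packets keep latency low
SAMPLE_WIDTH = 2      # bytes per frame for 16-bit mono PCM

def chunk_pcm(pcm: bytes, frames: int = CHUNK_FRAMES,
              width: int = SAMPLE_WIDTH):
    """Split a PCM buffer into fixed-size chunks for streaming."""
    step = frames * width
    for i in range(0, len(pcm), step):
        yield pcm[i:i + step]

# Streaming loop sketch (requires pyaudio and a WebSocket client):
#
#   stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
#                    input=True, frames_per_buffer=CHUNK_FRAMES)
#   while True:
#       ws.send(stream.read(CHUNK_FRAMES))   # send immediately, don't buffer
```

With this approach the "wait for silence" step disappears entirely: silence detection moves to the server, which can emit partial updates per chunk and a final transcription when the utterance ends.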