-
The demo video uses this file: realtimestt_test.py. To get near real-time performance:
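The basic usage pattern in that test file can be sketched roughly as below. This is a hedged sketch, not the file itself: the parameter names (`enable_realtime_transcription`, `on_realtime_transcription_update`) and the `recorder.text()` call reflect my recollection of RealtimeSTT's `AudioToTextRecorder` API and should be checked against realtimestt_test.py in the repo.

```python
import sys

def show_live(text: str) -> str:
    """Render a partial (still-changing) transcription on a single line."""
    sys.stdout.write("\r" + text)   # carriage return overwrites the old preview
    sys.stdout.flush()
    return text

def main():
    # Assumed RealtimeSTT API; verify names against realtimestt_test.py.
    from RealtimeSTT import AudioToTextRecorder
    recorder = AudioToTextRecorder(
        model="tiny.en",
        enable_realtime_transcription=True,
        on_realtime_transcription_update=show_live,  # fires on partial updates
    )
    while True:
        # Blocks until a final transcription is ready, then prints it.
        print("\n" + recorder.text())

if __name__ == "__main__":
    main()
```

The key point is the two channels: the callback gives frequent partial updates while you speak, and the blocking `text()` call delivers the higher-quality final result.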
Regarding your specific questions:
In RealtimeSTT, the client and server communicate via WebSockets.
When a final transcription request is detected, it is handled in a separate process so the main process stays responsive. The client doesn't manage the updates directly; instead, the server determines when to perform the final transcription and sends both real-time and final transcription updates back to the client.
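That division of labor can be sketched as a small JSON message protocol on the client side. The message-type names (`"realtime"`, `"fullSentence"`) and fields here are assumptions for illustration, not necessarily the exact ones RealtimeSTT's server emits:

```python
import json

def make_update(text: str, final: bool) -> str:
    """Server side: tag each update as a live partial or a final sentence."""
    return json.dumps(
        {"type": "fullSentence" if final else "realtime", "text": text}
    )

def handle_update(raw: str, state: dict) -> dict:
    """Client side: apply an incoming update to the display state."""
    msg = json.loads(raw)
    if msg["type"] == "realtime":
        state["partial"] = msg["text"]            # overwrite the live preview
    elif msg["type"] == "fullSentence":
        state["final"] = state.get("final", "") + msg["text"] + " "
        state["partial"] = ""                     # clear preview once finalized
    return state
```

Because the server decides when a sentence is final, the client stays a thin renderer: it only ever overwrites a preview line or appends a finished sentence.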
-
Yes, KoljaB's is the absolute best among transcribers... I have tested several, and his implementation is faster than all the others. I think Moshi has implemented a completely different pipeline, with an LLM activated by voice tokens instead of the text-token activation that Whisper uses... But I have to read their paper.
-
Hi everyone, I am very interested in how to achieve something like the demo video. Right now, I have written client-side code with pyaudio to collect audio. The way I did it is to send the entire audio chunk after no sound is detected. So if I keep speaking, it just takes a long time to transcribe, rather than the live updating shown in the demo video. I also went through all the code under the "/tests" folder and found nothing.
Much appreciated if you can walk me through it or share some demo code on how to do this.
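One likely fix for the behavior described above is to stream small fixed-size chunks continuously instead of buffering until silence, so the server can transcribe while you are still speaking. The sketch below shows the chunking logic; the names (`CHUNK_FRAMES`, the commented `ws.send` loop) are illustrative assumptions, not taken from the RealtimeSTT codebase:

```python
CHUNK_FRAMES = 512    # frames per packet; small packets keep latency low
SAMPLE_WIDTH = 2      # bytes per frame for 16-bit mono PCM

def chunk_pcm(pcm: bytes, frames: int = CHUNK_FRAMES,
              width: int = SAMPLE_WIDTH):
    """Split a PCM buffer into fixed-size chunks for streaming."""
    step = frames * width
    for i in range(0, len(pcm), step):
        yield pcm[i:i + step]

# Streaming loop sketch (requires pyaudio and a WebSocket client):
#
#   stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
#                    input=True, frames_per_buffer=CHUNK_FRAMES)
#   while True:
#       ws.send(stream.read(CHUNK_FRAMES))   # send immediately, don't buffer
```

With this approach the "wait for silence" step disappears entirely: silence detection moves to the server, which can emit partial updates per chunk and a final transcription when the utterance ends.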