Question to review my understanding of the operation process for RealTimeSTT text function #214
Audio enters in two ways:
Once in the queue, the Then, the
The [Text Function] code below is based on an earlier version of audio_recorder.py. It creates a callback function, starts a thread, and passes input and output parameters. Please refer to the attached audio_recorder.py for my code. (It may differ slightly from the latest version because I based it on your older version, but the voice streaming and STT transcription code is probably similar.) The text function is called from the main code in Thomas_audio_control_src.py.
realtimestt.zip
text, thomas_event_state = recorder.text(utils.main_process, start_time, communicator, similarity_cal, params.similarity_config)
In other words, I implemented a structure that uses the text function to perform voice streaming and STT transcription.
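To make the structure concrete, here is a minimal sketch of the pattern I mean: a text function that obtains a transcript, runs a callback in a thread with extra input parameters, and returns both the transcript and the callback's result. This is not the attached code or the library's implementation. `SketchRecorder`, its hard-coded transcript, and the simplified `main_process` callback are hypothetical stand-ins used only for illustration.

```python
import threading


class SketchRecorder:
    """Hypothetical stand-in for a recorder; not the RealtimeSTT class."""

    def transcribe(self):
        # A real recorder would block here until speech ends and STT finishes.
        return "hello world"

    def text(self, callback, *args):
        transcript = self.transcribe()
        result = {}

        def worker():
            # The callback receives the transcript plus the extra input parameters.
            result["state"] = callback(transcript, *args)

        thread = threading.Thread(target=worker)
        thread.start()
        thread.join()  # wait so the callback's result can be returned to the caller
        return transcript, result["state"]


def main_process(transcript, start_time):
    # Hypothetical callback: returns some state derived from the transcript.
    return {"length": len(transcript), "start_time": start_time}


text, state = SketchRecorder().text(main_process, 0.0)
```

Joining the thread before returning is what lets the function hand back an output value. If the callback were left running in the background, the caller would only get the transcript.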
However, I have no professional background in voice streaming, so customizing the process is very difficult for me. I am writing to check whether I understand the voice streaming and STT transcription process correctly. Please read my analysis, and if any part of it is wrong, or if you can suggest a way to understand it better, I would appreciate it. I want to gain more professional knowledge about voice streaming.
It would help most if you could explain the sequence: how audio comes in through the microphone, which thread processes the data, and how voice streaming and STT transcription take place.
[Analyzed content]
-Open pyaudio’s interface stream
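As a way of checking my understanding of the threading part, here is a minimal sketch of how I picture audio chunks moving from a capture thread into a queue and then to a consumer. It does not use PyAudio or the library's code. The fake byte chunks stand in for what a loop reading from an open PyAudio stream would produce, and `capture_worker`, `collect_audio`, and `CHUNK_BYTES` are hypothetical names.

```python
import queue
import threading

CHUNK_BYTES = 1024  # assumed chunk size, for illustration only
audio_queue = queue.Queue()


def capture_worker(chunks):
    # Stand-in for a loop that reads chunks from an open microphone stream.
    for chunk in chunks:
        audio_queue.put(chunk)
    audio_queue.put(None)  # sentinel: no more audio


def collect_audio():
    # Consumer side: drain the queue until the sentinel arrives.
    frames = []
    while True:
        chunk = audio_queue.get()
        if chunk is None:
            break
        frames.append(chunk)
    return b"".join(frames)


# Three fake chunks instead of real microphone data.
fake_chunks = [bytes([i]) * CHUNK_BYTES for i in range(3)]
producer = threading.Thread(target=capture_worker, args=(fake_chunks,))
producer.start()
recording = collect_audio()
producer.join()
print(len(recording))  # 3072
```

In this picture the capture thread only puts data on the queue, and whatever consumes the queue decides when a recording is complete and ready for transcription.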