I have deployed the smart-turn ONNX model in Rust and use it to detect "SpeechEND" in my VAD service. By default, the VAD I am using (TEN VAD) falls back to SpeechEND after 800 ms of silence. For every input audio sample I have tried, the fallback fires while the smart_turn probability for end_of_turn is still around 0.4.
I wanted to know whether something is wrong in my implementation, or whether the end_of_turn annotated samples all end with roughly 1-1.2 s of silence on average.
PS: Is there any drawback to using a hard-coded 800 ms SpeechEND threshold? Please let me know if you have any suggestions or feedback on that.
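
To make the question concrete, here is a minimal sketch of the decision logic described above. It is not my actual code and not the TEN VAD or smart-turn API: `decide_speech_end`, `EndDecision`, and the 0.5 probability threshold are hypothetical names and values chosen for illustration. Only the 800 ms fallback comes from my setup.

```rust
/// Possible outcomes when evaluating the current silence window.
#[derive(Debug, PartialEq)]
enum EndDecision {
    /// Keep listening: neither condition has been met yet.
    Continue,
    /// The end_of_turn probability crossed the threshold.
    ModelEnd,
    /// The silence timeout expired before the model was confident.
    FallbackEnd,
}

// 800 ms is the fallback from my setup; 0.5 is an assumed threshold.
const FALLBACK_SILENCE_MS: u32 = 800;
const END_OF_TURN_THRESHOLD: f32 = 0.5;

/// Decide whether to emit SpeechEND, given the trailing silence in
/// milliseconds and the model's end_of_turn probability.
fn decide_speech_end(silence_ms: u32, end_of_turn_prob: f32) -> EndDecision {
    if end_of_turn_prob >= END_OF_TURN_THRESHOLD {
        EndDecision::ModelEnd
    } else if silence_ms >= FALLBACK_SILENCE_MS {
        EndDecision::FallbackEnd
    } else {
        EndDecision::Continue
    }
}
```

With these assumed values, `decide_speech_end(800, 0.4)` returns `EndDecision::FallbackEnd`, which is the case I keep hitting: the timeout wins because the probability has not yet reached the threshold.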