Higher confidence scores for turn_completion only after 1200ms of silence at end of audio sample.

I have deployed the smart-turn ONNX model in Rust and use it to detect "SpeechEND" in my VAD service. By default, the VAD I am using (TEN VAD) falls back to emitting SpeechEND after 800ms of silence. For all of my input audio samples, this fallback is triggered while the smart_turn probability score for end_of_turn is still around 0.4.
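To make the interaction concrete, here is a minimal sketch of the decision logic as I understand it, not my actual service code. The names `EndOfTurnPolicy`, `EndDecision`, and `decide` are hypothetical, and the 0.5 probability threshold used in the usage note is an assumption; only the 800ms fallback and the ~0.4 score come from what I observe.

```rust
/// Outcome of evaluating the current stretch of trailing silence.
#[derive(Debug, PartialEq)]
enum EndDecision {
    /// Keep listening: neither condition has been met yet.
    Continue,
    /// The model's end_of_turn probability crossed the threshold.
    ModelEnd,
    /// The silence timeout fired before the model became confident.
    FallbackEnd,
}

/// Hypothetical policy combining the model score with a silence timeout.
struct EndOfTurnPolicy {
    /// Minimum end_of_turn probability to accept (assumed value, tune as needed).
    prob_threshold: f32,
    /// Silence duration after which SpeechEND is forced (800ms in my setup).
    fallback_silence_ms: u32,
}

impl EndOfTurnPolicy {
    /// The model score is checked first, so the fallback only fires
    /// when the model is still below the threshold.
    fn decide(&self, silence_ms: u32, end_of_turn_prob: f32) -> EndDecision {
        if end_of_turn_prob >= self.prob_threshold {
            EndDecision::ModelEnd
        } else if silence_ms >= self.fallback_silence_ms {
            EndDecision::FallbackEnd
        } else {
            EndDecision::Continue
        }
    }
}
```

With `prob_threshold: 0.5` and `fallback_silence_ms: 800`, calling `decide(800, 0.4)` yields `EndDecision::FallbackEnd`, which matches the behaviour I am seeing on every sample.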

I would like to know whether something is wrong in my implementation, or whether the end_of_turn annotated samples all have, on average, around 1-1.2s of silence at the end.
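One way to check how much trailing silence a given buffer actually contains before it reaches the model is sketched below. This is a rough illustration under stated assumptions: the function name `trailing_silence_ms` is hypothetical, samples are assumed to be mono `f32` in the range [-1.0, 1.0], and the amplitude threshold is an arbitrary choice. A per-sample amplitude check is crude and will under-count silence if there is background noise above the threshold.

```rust
/// Returns the length, in milliseconds, of the trailing run of samples whose
/// absolute amplitude is below `threshold`.
///
/// Assumes `sample_rate_hz` is non-zero and `samples` is mono audio.
fn trailing_silence_ms(samples: &[f32], sample_rate_hz: u32, threshold: f32) -> u32 {
    // Walk backwards from the end and count samples until the first loud one.
    let silent_samples = samples
        .iter()
        .rev()
        .take_while(|s| s.abs() < threshold)
        .count();

    // Convert the sample count to milliseconds using integer arithmetic.
    (silent_samples as u64 * 1000 / sample_rate_hz as u64) as u32
}
```

For example, a buffer at an assumed 16 kHz sample rate that ends in 12,800 near-zero samples reports 800ms, which is exactly the point where the fallback fires in my setup.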

PS: Is there any drawback to using a hard-coded 800ms SpeechEND threshold? Please let me know if you have any suggestions or feedback on that.


Higher confidence scores for turn_completion only after 1200ms of silence at end of audio sample. #32
