
Higher confidence scores for turn_completion only after 1200ms of silence at end of audio sample. #32

@atharvabagde

I have deployed the smart-turn ONNX model in Rust and am using it to detect "SpeechEND" in my VAD service. By default, the VAD I am using (TEN VAD) falls back to emitting SpeechEND after 800ms of silence. For all of my input audio samples, this fallback is triggered while the smart_turn probability score for end_of_turn is still around 0.4.
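To make the behaviour concrete, the interaction between the two signals looks roughly like the sketch below. This is a simplified illustration, not my actual service code: `EndpointConfig`, `EndDecision`, and `decide` are hypothetical names, they are not part of the TEN VAD or smart-turn APIs, and the 0.5 probability threshold is an assumed value used only for the example.

```rust
/// Outcome of evaluating the current stretch of trailing silence.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum EndDecision {
    /// Neither the model nor the silence timeout has fired yet.
    Continue,
    /// The turn model's end_of_turn probability crossed the threshold.
    ModelEndOfTurn,
    /// The silence timeout expired before the model became confident.
    FallbackTimeout,
}

/// Hypothetical tuning knobs; the values used with them are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct EndpointConfig {
    /// Minimum end_of_turn probability that counts as a model decision (assumed).
    pub prob_threshold: f32,
    /// Silence duration after which SpeechEND is forced (800 in my setup).
    pub fallback_silence_ms: u32,
}

/// Combine the model score with the silence timer.
/// The model is checked first, so the fallback only wins when the
/// probability is still below the threshold at the timeout.
pub fn decide(cfg: &EndpointConfig, silence_ms: u32, end_of_turn_prob: f32) -> EndDecision {
    if end_of_turn_prob >= cfg.prob_threshold {
        EndDecision::ModelEndOfTurn
    } else if silence_ms >= cfg.fallback_silence_ms {
        EndDecision::FallbackTimeout
    } else {
        EndDecision::Continue
    }
}
```

With `prob_threshold` at 0.5 and `fallback_silence_ms` at 800, a score of 0.4 at 800ms of silence yields `FallbackTimeout`, which is the case I keep hitting.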

I would like to know whether something is wrong in my implementation, or whether the end_of_turn annotated samples have, on average, around 1-1.2s of silence at the end.

PS: Is there any drawback to using a hard-coded 800ms SpeechEND threshold? Please let me know if you have any suggestions or feedback on that.
