Contributor

@Purfview Purfview commented Nov 5, 2025

Adds new VAD parameters:

min_silence_at_max_speech: Minimum silence duration (in ms) used to avoid abrupt cuts when max_speech_duration_s is reached.

use_max_poss_sil_at_max_speech: Whether to use the maximum possible silence when max_speech_duration_s is reached. If not, the last silence is used.

"min_silence_at_max_speech" was previously hardcoded to 98 ms.
"use_max_poss_sil_at_max_speech" helps to avoid abrupt cuts.

Relevant links:
snakers4/silero-vad#664
snakers4/silero-vad#710
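
To make the interaction of the two parameters concrete, here is a minimal sketch of how a split point might be chosen once max_speech_duration_s is reached. It is an illustration only, not the code from this PR or from silero-vad: the helper pick_split_silence, the (start_ms, end_ms) tuple layout, and the reading of "maximum possible silence" as "longest qualifying gap" are all assumptions. The default of 98 simply mirrors the previously hardcoded value mentioned above.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (start_ms, end_ms) of a silence gap seen
# inside the speech segment that has just hit max_speech_duration_s.
Silence = Tuple[int, int]


def pick_split_silence(
    silences: List[Silence],
    use_max_poss_sil_at_max_speech: bool,
    min_silence_at_max_speech: int = 98,
) -> Optional[Silence]:
    """Hypothetical helper: pick the silence at which to cut an over-long segment."""
    # Keep only gaps long enough to count as a usable cut point.
    candidates = [s for s in silences if s[1] - s[0] >= min_silence_at_max_speech]
    if not candidates:
        # No usable silence: a caller would have to cut abruptly instead.
        return None
    if use_max_poss_sil_at_max_speech:
        # Assumed meaning of "maximum possible silence": the longest gap.
        return max(candidates, key=lambda s: s[1] - s[0])
    # Otherwise use the most recent qualifying gap ("the last silence").
    return candidates[-1]


gaps = [(1000, 1300), (5000, 5120), (9000, 9050)]
longest = pick_split_silence(gaps, use_max_poss_sil_at_max_speech=True)
latest = pick_split_silence(gaps, use_max_poss_sil_at_max_speech=False)
```

In this sketch the 50 ms gap at 9000 ms is ignored because it is shorter than the threshold, so the choice is between the longest gap and the last gap that still qualifies.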

@Purfview Purfview changed the title Adds new VAD parameters Adds new VAD parameters + some VAD tweaks Nov 5, 2025
Adds new VAD parameters: 

min_silence_at_max_speech: Minimum silence duration in ms which is used to avoid abrupt cuts when max_speech_duration_s is reached.

use_max_poss_sil_at_max_speech: Whether to use the maximum possible silence at max_speech_duration_s or not. If not, the last silence is used.
Set minimum speech duration to zero for flexibility.
@MahmoudAshraf97 MahmoudAshraf97 merged commit ed9a06c into SYSTRAN:master Nov 19, 2025
3 checks passed
Contributor Author

Purfview commented Nov 19, 2025

@MahmoudAshraf97 Why did you set min_speech_duration_ms to 0? What do you mean by "flexibility"?

@MahmoudAshraf97
Collaborator

Because the batched pipeline is tuned for min_speech_duration_ms set to 0. I wouldn't change it without testing WER on a long-form dataset.
The commit message and description are autogenerated and I didn't review them, sorry for that.
