First of all, thank you for sharing your models!
I was wondering whether you trim silence from the audio during preprocessing. Your model works pretty well when speech starts right away, but if the speaker pauses for a while at the beginning of the recording, the model predicts noise. How do you think one should approach this issue?
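For context, below is a minimal sketch of the kind of energy-based trimming that could be applied before inference. It assumes mono audio given as a sequence of float samples. `trim_silence`, `frame_len` and `top_db` are hypothetical names chosen for illustration. It drops leading and trailing frames whose RMS level is more than `top_db` decibels below the loudest frame. The 60 dB default mirrors the `top_db` default of `librosa.effects.trim`, which is a ready-made alternative if librosa is available.

```python
import math
from typing import List, Sequence


def trim_silence(samples: Sequence[float], frame_len: int = 512,
                 top_db: float = 60.0) -> List[float]:
    """Hypothetical helper: remove leading/trailing frames whose RMS energy
    is more than `top_db` dB below the loudest frame (energy-based trimming)."""
    if len(samples) == 0:
        return []
    # Split the signal into non-overlapping frames and compute each frame's RMS.
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    rms = [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames]
    peak = max(rms)
    if peak == 0.0:
        return []  # all-zero input: nothing worth keeping
    # A frame counts as non-silent if its level is within top_db of the peak frame.
    threshold = peak * 10 ** (-top_db / 20.0)
    voiced = [i for i, r in enumerate(rms) if r > threshold]
    # Keep everything from the first to the last non-silent frame (inclusive).
    start = voiced[0] * frame_len
    end = min((voiced[-1] + 1) * frame_len, len(samples))
    return list(samples[start:end])


if __name__ == "__main__":
    sr = 16000  # assumed sample rate
    silence = [0.0] * sr  # 1 s of leading silence
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 2)]
    trimmed = trim_silence(silence + tone)
    print(len(silence + tone), "->", len(trimmed))
```

Because trimming works at frame granularity, a few silent samples (less than one frame) can remain before the speech onset. A small `frame_len` keeps that margin short.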