Could someone share more info on:
- model sizes / quality
- real-time factor
- does it use VAD? i.e., is audio with a lot of silence processed faster?
- does it run only on the GPU, or on the APU?