Build a real-time Vietnamese-English translation demo
Current demo state: v0.2
Backlog
ASR quality and speed (hack on an existing checkpoint like @new5558 did, or fine-tune a new model with better data?)
Small voice benchmark for ASR in Vietnamese and Thai (low-resource languages), as suggested by @Yip-Jia-Qi, recorded in real-life environments (coffee shop, home; laptop mic, webcam mic, phone mic, ...)
VAD (@thinhlpg) or streaming (@new5558). I'm following the VAD path; the problem is that VAD is sometimes unreliable and misses speech at the start of an utterance.
Model serving for optimal latency: it currently takes 6 s from speech start to TTS (the demo isn't optimized at all, so there is a lot of room for improvement). For context, I manually inspected 4 samples from the Google speech translation demo and each took 2 s (https://youtu.be/hyXqcsWOONo?feature=shared).
Training experiments for translation: better data, hyperparameters?
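One common workaround for a VAD that clips the start of speech is to keep a short pre-roll buffer of recent frames and prepend it once speech is detected. Below is a minimal sketch of that idea, not what the demo currently does. The `is_speech` callable stands in for whatever VAD is used, and `preroll_frames` / `hangover_frames` are hypothetical parameters that would need tuning.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")


def segment_with_preroll(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],  # assumption: wraps the real VAD
    preroll_frames: int = 10,            # hypothetical default, tune per frame size
    hangover_frames: int = 15,           # hypothetical default, tune per frame size
) -> Iterator[List[Frame]]:
    """Yield speech segments, each prefixed with buffered pre-roll frames."""
    preroll = deque(maxlen=preroll_frames)  # most recent non-speech frames
    segment: List[Frame] = []
    silence_run = 0
    in_speech = False

    for frame in frames:
        if in_speech:
            segment.append(frame)
            if is_speech(frame):
                silence_run = 0
            else:
                silence_run += 1
                # Close the segment after enough consecutive silent frames.
                if silence_run >= hangover_frames:
                    yield segment
                    segment = []
                    in_speech = False
                    silence_run = 0
                    preroll.clear()
        elif is_speech(frame):
            # Speech onset: start with the buffered frames the VAD skipped.
            segment = list(preroll) + [frame]
            preroll.clear()
            in_speech = True
            silence_run = 0
        else:
            preroll.append(frame)

    # Flush a segment that is still open when the stream ends.
    if in_speech and segment:
        yield segment
```

The trade-off is a little extra audio (and possibly noise) at the head of each segment in exchange for not losing the first syllable; the trailing hangover similarly avoids cutting a segment on short pauses.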
How good is the commercial baseline for ASR?
Know your enemy: we want to beat the commercial ones, not just the open-source ones
Test Zalo Kiki (the ASR part).
Test other popular options
=> Their response time and output quality?
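To compare baselines on both axes, one option is to time each transcription call and score the output with word error rate (WER). The sketch below is only an illustration: `transcribe` is a hypothetical callable wrapping whichever service or model is under test, and the WER here splits on whitespace, which counts syllables for Vietnamese and would not work for Thai (no spaces between words), where a character-level metric is the usual substitute.

```python
import time
from typing import Any, Callable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Standard dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def timed_transcribe(
    transcribe: Callable[[Any], str],  # assumption: wraps the system under test
    audio: Any,
) -> Tuple[str, float]:
    """Return the transcript and the wall-clock seconds the call took."""
    start = time.perf_counter()
    text = transcribe(audio)
    return text, time.perf_counter() - start
```

Running the same recordings through every candidate and logging both numbers per clip would give a like-for-like table of response time versus quality.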