Vietnamese Translation #226

@thinhlpg

🎯 Goal

  • To have a real-time Vietnamese-English translation demo
  • Current demo state: v0.2

Backlog

  • ASR quality and speed (hack on an existing checkpoint like @new5558 did, or fine-tune a new model with better data?)
  • Small ASR voice benchmark for Vietnamese and Thai (low-resource languages), as suggested by @Yip-Jia-Qi, recorded in real-life environments (coffee shop, home; laptop mic, webcam mic, phone mic, ...)
  • VAD (@thinhlpg) or streaming (@new5558). I'm following the VAD path; the problem is that VAD sometimes performs poorly and misses speech at the start of an utterance.
  • Model serving for optimal latency: the demo currently takes 6 s from speech start to TTS (it isn't optimized at all, so there is a lot of room for improvement). For context, I manually inspected 4 samples from the Google speech translation demo and all of them took exactly 2 s: https://youtu.be/hyXqcsWOONo?feature=shared
  • Training experiments for translation: better data, hyperparameters?
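
One common workaround for a VAD that clips the start of speech is to keep a short pre-roll buffer of recent frames and prepend it to each segment once the VAD fires. The sketch below is only an illustration of that idea, not what the demo currently does: `segments_with_preroll`, `is_speech`, `preroll_frames`, and `hangover_frames` are hypothetical names, and `is_speech` stands in for whatever per-frame VAD decision is available.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")


def segments_with_preroll(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],  # hypothetical per-frame VAD decision
    preroll_frames: int = 10,            # assumed: frames kept from before VAD fires
    hangover_frames: int = 15,           # assumed: silent frames tolerated before closing
) -> Iterator[List[Frame]]:
    """Yield speech segments, each prefixed with the frames seen just before VAD fired."""
    preroll: deque = deque(maxlen=preroll_frames)
    segment: List[Frame] = []
    silence_run = 0
    in_speech = False

    for frame in frames:
        speech = is_speech(frame)
        if not in_speech:
            if speech:
                # Start a segment and recover the audio the VAD was too slow to flag.
                segment = list(preroll) + [frame]
                preroll.clear()
                in_speech = True
                silence_run = 0
            else:
                preroll.append(frame)
        else:
            segment.append(frame)
            if speech:
                silence_run = 0
            else:
                silence_run += 1
                if silence_run >= hangover_frames:
                    # Enough trailing silence: close the segment.
                    yield segment
                    segment = []
                    in_speech = False
                    silence_run = 0

    # Flush a segment that was still open when the stream ended.
    if in_speech and segment:
        yield segment
```

The pre-roll adds no latency by itself, since the buffered frames are already in memory when the VAD triggers; the hangover length is what delays the end of a segment, so it is the value to tune against the 6 s end-to-end figure.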

How good is the commercial baseline for ASR?

Know your enemy: we want to beat the commercial ones, not just the open-source ones

  • Test Zalo Kiki (the ASR part).
  • Test other popular options
    => Their response time? Their output quality?
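
To compare services on the same footing, a minimal harness could record wall-clock response time and word error rate (WER) per sample. This is a sketch under assumptions: `transcribe` is a hypothetical callable wrapping whichever ASR service is under test, and WER here splits on whitespace, which is reasonable for Vietnamese but not for Thai, where text is not space-delimited and a character error rate is probably the better fit.

```python
import time
from typing import Any, Callable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def timed_transcribe(
    transcribe: Callable[[Any], str],  # hypothetical wrapper around one ASR service
    audio: Any,
) -> Tuple[str, float]:
    """Return the transcript and the wall-clock seconds the call took."""
    start = time.perf_counter()
    text = transcribe(audio)
    return text, time.perf_counter() - start
```

Note that `timed_transcribe` measures request-to-response time for a single call, which is not the same thing as the speech-start-to-TTS latency quoted for the demo; both would need to be reported separately.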
