Vietnamese Translation

# 🎯 Goal
- To have a real-time Vietnamese-English translation demo
- Current demo state: v0.2

![Image](https://github.com/user-attachments/assets/f05ef988-c41f-4f2e-a0f3-0895de36890a)

### Backlog
- [ ] ASR quality and speed (hack on an existing checkpoint like @new5558 did, or fine-tune a new model with better data?)
- [ ] Small voice benchmark for ASR in Vietnamese, Thai (low-resource language), recorded in real-life environments as suggested by @Yip-Jia-Qi (coffee shop, home; laptop mic, webcam mic, phone mic, ...)
- [ ] VAD (@thinhlpg) or streaming (@new5558). I'm following the VAD path; the problem is that VAD sometimes fails and misses speech at the start of an utterance.
- [ ] Serve the model for optimal latency. It currently takes 6s from speech start to TTS (the demo wasn't optimized at all, so there is a lot of room for improvement). For context, I manually inspected 4 samples from the Google speech translation demo and all took exactly 2 seconds: https://youtu.be/hyXqcsWOONo?feature=shared ![Image](https://github.com/user-attachments/assets/cb6d277d-590c-4041-a7d5-b3ebd753edad)
- [ ] Training experiments for translation: better data, hyperparameters?
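
One common way to soften the "VAD misses the start of speech" problem from the backlog is a pre-roll buffer: keep the last few frames seen before the VAD fires and prepend them to the segment. The sketch below is only an illustration under assumptions, not the demo's actual code. `segments_with_preroll`, `is_speech`, `preroll_frames`, and `hangover_frames` are hypothetical names, and `is_speech` stands in for whatever VAD model is used.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def segments_with_preroll(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],  # hypothetical wrapper around the real VAD
    preroll_frames: int = 10,            # assumed value, tune per frame size
    hangover_frames: int = 15,           # assumed value, tune per frame size
) -> Iterator[List[bytes]]:
    """Yield speech segments, each prefixed with the frames seen just before the VAD fired."""
    preroll = deque(maxlen=preroll_frames)  # most recent non-speech frames
    segment: List[bytes] = []
    silence_run = 0
    in_speech = False

    for frame in frames:
        speech = is_speech(frame)
        if not in_speech:
            if speech:
                # VAD fired: start the segment with the buffered pre-roll
                # so a late trigger does not clip the first syllable.
                segment = list(preroll) + [frame]
                preroll.clear()
                in_speech = True
                silence_run = 0
            else:
                preroll.append(frame)
        else:
            segment.append(frame)
            silence_run = 0 if speech else silence_run + 1
            if silence_run >= hangover_frames:
                # Enough trailing silence: close the segment.
                yield segment
                segment = []
                in_speech = False
                silence_run = 0

    # Flush a segment that is still open when the stream ends.
    if in_speech and segment:
        yield segment
```

The trade-off is a slightly longer segment sent to ASR in exchange for not losing the utterance onset; the pre-roll length would need tuning against real recordings.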

### How good is the commercial baseline for ASR?
**Know your enemy** - we want to beat the commercial ones, not just the open-source ones
- [ ] Test Zalo Kiki (the ASR part).
- [ ] Test other popular options
=> What are their response times and output quality?
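
To compare response time and output quality across systems, a small harness that records wall-clock latency and word error rate (WER) per sample may be enough to start. This is a minimal sketch under assumptions: `transcribe` is a hypothetical callable wrapping whichever ASR system is under test (it is not a real Zalo Kiki API), and WER is computed with a plain word-level edit distance averaged per utterance.

```python
import time
from statistics import mean
from typing import Callable, Sequence, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def benchmark(
    transcribe: Callable[[bytes], str],    # hypothetical wrapper around the system under test
    samples: Sequence[Tuple[bytes, str]],  # (audio bytes, reference transcript)
) -> Tuple[float, float]:
    """Return (mean per-utterance WER, mean latency in seconds) over a non-empty sample set."""
    wers, latencies = [], []
    for audio, reference in samples:
        start = time.perf_counter()
        hypothesis = transcribe(audio)
        latencies.append(time.perf_counter() - start)
        wers.append(word_error_rate(reference, hypothesis))
    return mean(wers), mean(latencies)
```

Running the same sample set through each candidate gives directly comparable numbers. For a network service, the measured latency includes the round trip, which is what a user of the demo would experience anyway.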

Vietnamese Translation #226
