Mispronunciation Detection App

About

This repo contains the code for the Sound and Music Computing (CS5647) final project.

This project is a mispronunciation detection system that helps L2 learners of English understand their pronunciation problems.

It has three components:

  1. Frontend (React code)
  • Responsive and intuitive user interface that gamifies the process of learning new words and sentences
  • Has a "virtual assistant" (built using Azure TTS) that shows a lip-movement animation for a given sentence, thereby enabling users to easily understand the pronunciation (see the viseme sketch after this list)
  2. Backend (FastAPI)
  • Exposes the APIs needed for user login and progress state
  • Connects to a backend DB that contains the user and word-phoneme information needed for constructing sentences
  • ML inference API that accepts audio from the UI as input, returns the predicted ARPABET phoneme sequence, and provides a score
  3. ML Model (Wav2Vec2 model trained on the L2-ARCTIC dataset using SpeechBrain)
  • Model trained on the L2-ARCTIC dataset for a few specific accents (refer to ./src/ml/util/prepare_data.ipynb for more details) on Google Colab
  • The checkpoints are available at this drive link. Download the entire "results" folder and place it at the root level (/src/results)
  • The checkpoints need to be placed there in order for the server to boot up
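
To make the lip-animation idea concrete, here is a minimal Python sketch, assuming the Azure Speech SDK's viseme events are used to drive the animation; the key, region, and sample sentence are placeholders, and the repo's actual wiring lives in the React frontend rather than in Python:

```python
# Minimal sketch: receiving viseme events from Azure TTS.
# "AZURE_KEY" / "AZURE_REGION" are placeholders; the app itself drives a
# React lip-animation component, not a print loop.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="AZURE_KEY", region="AZURE_REGION")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

def on_viseme(evt):
    # Each viseme_id maps to a mouth shape; audio_offset is in 100-ns ticks
    print(f"viseme {evt.viseme_id} at {evt.audio_offset / 10_000:.0f} ms")

synthesizer.viseme_received.connect(on_viseme)
synthesizer.speak_text_async("Please call Stella.").get()
```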

Frontend Setup

Refer to this GitHub repository and the README file there for running the UI locally.

ML Setup

  1. Download the checkpoint from the drive link mentioned earlier and place the "results" folder under the "src" folder
  2. To retrain each of the models, follow these steps:
  • Download the L2-ARCTIC dataset and upload it to your personal Google Drive inside a folder named "dataset"
  • Run /src/ml/util/prepare_data.ipynb as a Colab notebook to read the dataset and create train, test, and val splits along with some phoneme preprocessing
  • Once this notebook runs successfully, a new "data" folder will be created with the final train, test, and val splits needed for training the model
  • Refer to the individual ipynb files under "./src/ml/train/X/X_train.ipynb" for training specific models in Colab (out of Wav2Vec2, HuBERT, and Whisper, we found Wav2Vec2 to be the best in terms of fast inference, performance, and easy integration with the app)
  • In each of these ipynb files, ensure that the appropriate train yaml file from "./src/ml/config/X/train.yaml" is uploaded to the correct drive path mentioned in the ipynb when creating the dataloaders
  • Once the models are trained, the checkpoints will be written to a folder called "results", which can then be downloaded locally and placed on your server (in the location mentioned earlier) so that the backend inference API works seamlessly
  • For the purposes of our app we have used only the "wav2vec2-base_ctc" checkpoints (see the loading sketch after this list)
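
As a rough sketch of how a trained checkpoint might be loaded for inference, assuming the "results" folder can be made compatible with SpeechBrain's EncoderASR interface (it needs an inference hyperparams file; the exact layout depends on the training recipe in the notebooks):

```python
# Sketch only: loading the wav2vec2-base_ctc checkpoint with SpeechBrain.
# The source path and hyperparams layout are assumptions based on the
# repo description, not verified against the actual "results" folder.
from speechbrain.pretrained import EncoderASR

asr = EncoderASR.from_hparams(
    source="src/results/wav2vec2-base_ctc",          # assumed checkpoint folder
    savedir="pretrained_models/wav2vec2-base_ctc",
)

# Predicted ARPABET phoneme sequence for a recorded utterance
print(asr.transcribe_file("recording.wav"))
```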

Backend Setup

  1. Run the following commands to install the required packages:
cd src
pip install -r requirements.txt
  2. Replace the placeholder string 'POSTGRES_DB_STRING_TO_BE_REPLACED' inside /src/infra/db with your Postgres DB connection string
  3. From the src directory, run this command:
python -m uvicorn main:app --reload
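
For orientation, here is a minimal sketch of the kind of inference endpoint the backend exposes: it accepts an audio upload plus a space-separated reference phoneme sequence and returns the predicted ARPABET phonemes with a score. The endpoint name (/infer), the response shape, the run_model stub, and the edit-distance scorer are illustrative assumptions, not the repo's exact implementation:

```python
# Hypothetical inference endpoint; names and scoring are illustrative.
import tempfile

from fastapi import FastAPI, File, UploadFile

app = FastAPI()

def edit_distance(a: list[str], b: list[str]) -> int:
    # Levenshtein distance between two phoneme sequences
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def run_model(wav_path: str) -> list[str]:
    # Stand-in for the real model call (e.g. the EncoderASR sketch above,
    # with the transcription split into individual phoneme tokens).
    raise NotImplementedError

@app.post("/infer")
async def infer(reference: str, audio: UploadFile = File(...)):
    # Persist the upload so the model can read it from disk
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        f.write(await audio.read())
        wav_path = f.name
    predicted = run_model(wav_path)
    ref = reference.split()  # reference ARPABET phonemes, space-separated
    dist = edit_distance(predicted, ref)
    score = max(0.0, 1.0 - dist / max(len(ref), 1))
    return {"phonemes": predicted, "score": round(score, 3)}
```

With the server running, `curl -F 'audio=@recording.wav' 'http://localhost:8000/infer?reference=P%20L%20IY%20Z'` would exercise the endpoint once run_model is wired to the real checkpoint.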
