Empirical Methods in Natural Language Processing (EMNLP) 2023
- About The Repository
- Getting Started
- Roadmap
- Dataset Statistics
- Contributing
- Authors
- Citation
- License
This repository hosts the artefacts pertaining to our paper Accented Speech Recognition With Accent-specific Codebooks accepted to the main conference of EMNLP 2023.
The main contributions of our paper are as follows:
🔎 A new accent adaptation technique that uses a set of learnable codebooks and a new beam-search decoding algorithm to achieve significant performance improvement on both seen and unseen accents.
✅ Reproducible splits of the Common Voice dataset for an accented ASR setup, to facilitate fair comparisons across existing and new accent adaptation techniques.
The repository contains two folders:
- data 📁 - Contains the train, dev and test splits used for all our experiments. The folder also contains the scripts used to generate those splits. More details can be found here.
- espnet_code 📁 - Contains code to run our experiments with the ESPnet toolkit. Detailed instructions on how to run our experiments can be found here.
- ESPnet installation: Follow the instructions here.
- Clone the repository containing our code and dataset.
git clone https://github.com/csalt-research/accented-codebooks-asr.git
- Additionally, to run the dataset creation script, run the following:
pip install -r accented-codebooks-asr/data/requirements.txt
- Extract the CSVs from the tar file in the data folder.
tar -xvzf accented-codebooks-asr/data/dataset.tar.gz
- Copy the files from espnet_code into ESPnet egs.
cp -r accented-codebooks-asr/espnet_code/* <espnet_root_folder>/egs/commonvoice/asr1
- Enter the path to the directory hosting our splits in run.sh.
csvdir= # Path to the directory hosting all our csvs.
- Run the script.
./run.sh

The statistics of the train, dev and test splits used in our experiments are as follows:
| Accent | Train 100h (in hours) | Train (in hours) | Dev (in hours) | Test (in hours) |
|---|---|---|---|---|
| Australia | 6.95 | 45.36 | 4.33 | 0.46 |
| Canada | 6.79 | 41.13 | 1.16 | 1.21 |
| England | 19.51 | 119.9 | 3.22 | 1.65 |
| Scotland | 2.69 | 16.21 | 0.23 | 0.16 |
| US | 64.12 | 400.1 | 8.32 | 4.87 |
| Africa | - | - | - | 1.71 |
| Hongkong | - | - | - | 0.52 |
| India | - | - | - | 0.58 |
| Ireland | - | - | - | 1.94 |
| Malaysia | - | - | - | 0.39 |
| Newzealand | - | - | - | 2.11 |
| Philippines | - | - | - | 0.90 |
| Singapore | - | - | - | 0.64 |
| Wales | - | - | - | 0.27 |
See the open issues for a list of proposed features (and known issues) relevant to this work. For ESPnet-related features or issues, check out their GitHub repository.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- If you have suggestions for adding or removing projects, feel free to open an issue to discuss it, or directly create a pull request after you edit the README.md file with necessary changes.
- Please open an individual PR for each suggestion.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/NewFeature`)
- Commit your Changes (`git commit -m 'Add appropriate commit message'`). The correct way to write your commit message can be found here
- Push to the Branch (`git push origin feature/NewFeature`)
- Open a Pull Request
- Darshan Prabhu - M.Tech, CSE, IIT Bombay
- Preethi Jyothi - Associate Professor, CSE, IIT Bombay
- Sriram Ganapathy - Associate Professor, EE, IISc Bangalore
- Vinit Unni - Ph.D, CSE, IIT Bombay
If you use this code for your research, please consider citing our work.
@misc{prabhu2023accented,
  title={Accented Speech Recognition With Accent-specific Codebooks},
  author={Darshan Prabhu and Preethi Jyothi and Sriram Ganapathy and Vinit Unni},
  year={2023},
  eprint={2310.15970},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Distributed under the MIT License. See LICENSE for more information.