Accepted to NAACL 2025
- Table Of Contents
- About The Repository
- Getting Started
- Data preparation
- Running experiments
- Inference
- Authors
- Citation
- License
This repository hosts the code for our paper AMPS: ASR with Multimodal Paraphrase Supervision, accepted to NAACL 2025.
The main contribution of our paper is 🔎 a new technique AMPS that augments a multilingual multimodal ASR system with paraphrase-based supervision for improved conversational ASR.
```bash
git clone https://github.com/csalt-research/amps-asr.git
cd amps-asr
```

Install the fairseq2 dependencies:

```bash
cd fairseq2
pip install --editable .
```

Install the seamless dependencies:

```bash
cd seamless
pip install --editable .
```

If you encounter any issues while installing dependencies, refer to the Installation Guide.
You are all set! 🎉
Seamless expects the dataset in JSON Lines format, with one JSON object per line in the following structure:
```json
{"source": {"id": "<ID>", "text": "<T2T-pipeline-input-text>", "lang": "<T2T-pipeline-input-language>", "audio_local_path": "<path-to-audio-file>", "sample_rate": <audio-sample-rate>, "waveform": null, "units": null}, "target": {"id": "<ID>", "text": "<ASR-pipeline-target-text>", "lang": "<ASR+T2T-pipeline-target-language>", "audio_local_path": null, "sample_rate": null, "waveform": null, "units": null, "paraphrase": "<T2T-pipeline-target-paraphrase>"}}
{...}
{...}
...
```

We have provided a sample dataset in the `sample_data` folder.
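As an illustration, a manifest in this format can be generated programmatically. The sketch below is a hypothetical helper, not part of the repository; it assumes the T2T-pipeline input text is the reference transcript, with the paraphrase as the T2T target:

```python
import json

def make_entry(uid, transcript, paraphrase, lang, audio_path, sample_rate):
    """Build one JSON Lines manifest entry matching the schema above."""
    return {
        "source": {
            "id": uid,
            "text": transcript,        # T2T-pipeline input text
            "lang": lang,
            "audio_local_path": audio_path,
            "sample_rate": sample_rate,
            "waveform": None,
            "units": None,
        },
        "target": {
            "id": uid,
            "text": transcript,        # ASR-pipeline target text
            "lang": lang,
            "audio_local_path": None,
            "sample_rate": None,
            "waveform": None,
            "units": None,
            "paraphrase": paraphrase,  # T2T-pipeline target paraphrase
        },
    }

# Write one entry per line (JSON Lines).
with open("train_manifest.json", "w", encoding="utf-8") as f:
    entry = make_entry("utt_0001", "hello how are you",
                       "hi how are you doing",
                       "eng", "audio/utt_0001.wav", 16000)
    f.write(json.dumps(entry) + "\n")
```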
Our codebase has a simple, easily customizable script `run.sh`. Simply execute:

```bash
./run.sh <s2t_loss_ratio> <t2t_loss_ratio> <loss_threshold>
```

Note: The threshold is a tunable parameter that can help improve performance. By default, it is set to -1, meaning no thresholding is applied.

For example, to run only ASR fine-tuning without any thresholding, execute:

```bash
./run.sh 1 0 -1
```

To run AMPS with a threshold of 3.2, execute:

```bash
./run.sh 1 1 3.2
```

After fine-tuning, the model will be saved in the directory `$EXPERIMENT_DIR`.
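Our reading of the three arguments can be sketched as follows. This is illustrative Python, not the repository's actual training code, and the thresholding rule is an assumption based on the note above:

```python
def amps_loss(s2t_loss: float, t2t_loss: float,
              s2t_ratio: float, t2t_ratio: float,
              threshold: float) -> float:
    """Combine ASR (S2T) and paraphrase (T2T) losses per run.sh's arguments.

    With threshold = -1 (the default), no thresholding is applied and the
    weighted sum is always used. Otherwise, the paraphrase (T2T) loss
    contributes only for utterances whose ASR loss exceeds the threshold.
    """
    if threshold >= 0 and s2t_loss <= threshold:
        return s2t_ratio * s2t_loss
    return s2t_ratio * s2t_loss + t2t_ratio * t2t_loss

# ./run.sh 1 0 -1  -> ASR-only fine-tuning (no paraphrase supervision)
# ./run.sh 1 1 3.2 -> AMPS with a 3.2 threshold
```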
We need to create a new `.yaml` card (say, `custom_model.yaml`) for the newly fine-tuned model:

1. Copy the content of `BASE_MODEL.yaml` to `custom_model.yaml`.
2. Update the following fields:
   - Model name: change it to `custom_model`.
   - Checkpoint path: set it to `$EXPERIMENT_DIR/$EXPERIMENT_NAME.pt`.
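For reference, the updated card might look like the following. This is a minimal sketch; the exact field names and any additional fields should be copied from `BASE_MODEL.yaml`, and the checkpoint path here is illustrative:

```yaml
name: custom_model
checkpoint: "file://<EXPERIMENT_DIR>/<EXPERIMENT_NAME>.pt"
```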
Specify the new model in the `model_name` field when using the translator:

```python
# Initialize a Translator object with the new model.
translator = Translator("custom_model", "vocoder_36langs", torch.device("cuda:0"), torch.float16)

# Predict
text_output, _ = translator.predict(
    input=<path_to_input_audio>,
    task_str="ASR",
    tgt_lang=<tgt_lang>,
    text_generation_opts=text_generation_opts,
    unit_generation_opts=None,
)
```

For more details on inference, visit here.
- Abhishek Gupta - MTech, CSE, IIT Bombay
- Amruta Parulekar - DD, EE, IIT Bombay
- Sameep Chattopadhyay - DD, EE, IIT Bombay
- Preethi Jyothi - Associate Professor, CSE, IIT Bombay
If you use this code for your research, please consider citing our work.
Distributed under the MIT License. See LICENSE for more information.