Accepted to NAACL 2025
- Table Of Contents
- About The Repository
- Getting Started
- Data preparation
- Running experiments
- Inference
- Authors
- Citation
- License
This repository hosts the code for our paper AMPS: ASR with Multimodal Paraphrase Supervision, accepted to NAACL 2025.
The main contribution of our paper is 🔎 a new technique AMPS that augments a multilingual multimodal ASR system with paraphrase-based supervision for improved conversational ASR.
```bash
git clone https://github.com/csalt-research/amps-asr.git
cd amps-asr
```

Install the fairseq2 dependencies:

```bash
cd fairseq2
pip install --editable .
```

Install the seamless dependencies:

```bash
cd seamless
pip install --editable .
```

If you encounter any issues while installing dependencies, refer to the Installation Guide.
You are all set! 🎉
Seamless expects the dataset in JSON Lines format, with one JSON object per line in the following structure:
```json
{"source": {"id": "<ID>", "text": "<T2T-pipeline-input-text>", "lang": "<T2T-pipeline-input-language>", "audio_local_path": "<path-to-audio-file>", "sample_rate": <audio-sample-rate>, "waveform": null, "units": null}, "target": {"id": "<ID>", "text": "<ASR-pipeline-target-text>", "lang": "<ASR+T2T-pipeline-target-language>", "audio_local_path": null, "sample_rate": null, "waveform": null, "units": null, "paraphrase": "<T2T-pipeline-target-paraphrase>"}}
{...}
{...}
...
```

We have provided a sample dataset in the `sample_data` folder.
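As an illustration, a manifest in this format can be generated programmatically. The sketch below is a hypothetical helper, not part of the repository; it assumes the T2T-pipeline input text is the reference transcript, with the paraphrase as the T2T target:

```python
import json

def make_entry(uid, transcript, paraphrase, lang, audio_path, sample_rate):
    """Build one JSON Lines manifest entry matching the schema above."""
    return {
        "source": {
            "id": uid,
            "text": transcript,        # T2T-pipeline input text
            "lang": lang,
            "audio_local_path": audio_path,
            "sample_rate": sample_rate,
            "waveform": None,
            "units": None,
        },
        "target": {
            "id": uid,
            "text": transcript,        # ASR-pipeline target text
            "lang": lang,
            "audio_local_path": None,
            "sample_rate": None,
            "waveform": None,
            "units": None,
            "paraphrase": paraphrase,  # T2T-pipeline target paraphrase
        },
    }

# Write one entry per line (JSON Lines).
with open("train_manifest.json", "w", encoding="utf-8") as f:
    entry = make_entry("utt_0001", "hello how are you",
                       "hi how are you doing",
                       "eng", "audio/utt_0001.wav", 16000)
    f.write(json.dumps(entry) + "\n")
```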
Our codebase has a simple, easily customizable script `run.sh`. Simply execute:

```bash
./run.sh <s2t_loss_ratio> <t2t_loss_ratio> <loss_threshold>
```

Note: The threshold is a tunable parameter that can help improve performance. By default, it is set to -1, meaning no thresholding is applied.

For example, to run only ASR fine-tuning without any thresholding, execute:

```bash
./run.sh 1 0 -1
```

To run AMPS with a threshold of 3.2, execute:

```bash
./run.sh 1 1 3.2
```

After fine-tuning, the model will be saved in the directory `$EXPERIMENT_DIR`.
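Our reading of the three arguments can be sketched as follows. This is illustrative Python, not the repository's actual training code, and the thresholding rule is an assumption based on the note above:

```python
def amps_loss(s2t_loss: float, t2t_loss: float,
              s2t_ratio: float, t2t_ratio: float,
              threshold: float) -> float:
    """Combine ASR (S2T) and paraphrase (T2T) losses per run.sh's arguments.

    With threshold = -1 (the default), no thresholding is applied and the
    weighted sum is always used. Otherwise, the paraphrase (T2T) loss
    contributes only for utterances whose ASR loss exceeds the threshold.
    """
    if threshold >= 0 and s2t_loss <= threshold:
        return s2t_ratio * s2t_loss
    return s2t_ratio * s2t_loss + t2t_ratio * t2t_loss

# ./run.sh 1 0 -1  -> ASR-only fine-tuning (no paraphrase supervision)
# ./run.sh 1 1 3.2 -> AMPS with a 3.2 threshold
```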
We need to create a new `.yaml` card (say, `custom_model.yaml`) for the newly fine-tuned model:

1. Copy the content of `BASE_MODEL.yaml` to `custom_model.yaml`.
2. Update the following fields:
   - Model name: change it to `custom_model`.
   - Checkpoint path: set it to `$EXPERIMENT_DIR/$EXPERIMENT_NAME.pt`.
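For reference, the updated card might look like the following. This is a minimal sketch; the exact field names and any additional fields should be copied from `BASE_MODEL.yaml`, and the checkpoint path here is illustrative:

```yaml
name: custom_model
checkpoint: "file://<EXPERIMENT_DIR>/<EXPERIMENT_NAME>.pt"
```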
Specify the new model in the `model_name` field when using the translator:

```python
# Initialize a Translator object with the new model.
translator = Translator("custom_model", "vocoder_36langs", torch.device("cuda:0"), torch.float16)

# Predict
text_output, _ = translator.predict(
    input=<path_to_input_audio>,
    task_str="ASR",
    tgt_lang=<tgt_lang>,
    text_generation_opts=text_generation_opts,
    unit_generation_opts=None,
)
```

For more details on inference, visit here.
- Abhishek Gupta - MTech, CSE, IIT Bombay
- Amruta Parulekar - DD, EE, IIT Bombay
- Sameep Chattopadhyay - DD, EE, IIT Bombay
- Preethi Jyothi - Associate Professor, CSE, IIT Bombay
If you use this code for your research, please consider citing our work.
Distributed under the MIT License. See LICENSE for more information.