This repository contains the official implementation of our Interspeech 2025 paper: "CBA-Whisper: Curriculum Learning-Based AdaLoRA Fine-Tuning on Whisper for Low-Resource Dysarthric Speech Recognition".
▲ Figure 1. Inference and training strategies of CBA-Whisper
This work focuses on dysarthric speech recognition (DSR). We provide the training and inference scripts, along with instructions for reproducing our results. For the detailed methodology and experiments, please refer to our solution PDF.
Ensure you have the following dependencies installed:
pip install -r requirements.txt
To train the model, move the dataset into the training directory, run fine-tuning, and then merge the LoRA weights with the following commands:
mv dataset ./whisper-finetune
cd whisper-finetune
python finetune.py # for Single-GPU
torchrun --nproc_per_node=4 finetune.py # for Multi-GPU (4 GPUs in this example)
python merge_lora.py
For faster inference on a sample, first convert the merged model to CTranslate2 format, then run the inference script:
ct2-transformers-converter --model models/whisper-large-v2-finetune --output_dir models/whisper-large-v2-finetune-ct2 --copy_files tokenizer.json preprocessor_config.json --quantization float16
python infer_ct2.py --input example.wav --output output.wav
Results of Submissions from the witty Team
| Method | Test 1 WER | Test 1 SemScore | Submitted at |
|---|---|---|---|
| step0 finetune | 10.02 | 87.38 | 2025-01-13 |
| +deduplication | 9.92 | 87.99 | 2025-01-19 |
| +increase volume | 8.75 | 88.81 | 2025-01-24 |
| >norm volume | 8.69 | 88.93 | 2025-01-25 |
| all norm volume | 8.48 | 89.38 | 2025-01-27 |
| step1 finetune | 8.13 | 89.88 | 2025-01-31 |
| +error correlation | 8.09 | 90.17 | 2025-02-01 |
| step2 finetune | 7.98 | 90.37 | 2025-02-02 |
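
The WER column above reports word error rate in percent (lower is better). As a rough illustration of the metric, the sketch below computes WER as the word-level edit distance between a reference and a hypothesis transcript, divided by the number of reference words. The function name `word_error_rate` is hypothetical and not part of this repository, and the sketch assumes plain whitespace tokenization; the official evaluation may apply text normalization that is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, assuming whitespace tokenization (an assumption)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("please open the door", "please open a door"))
```

Because the denominator is the reference length, WER can exceed 100 when the hypothesis contains many inserted words.
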
We gratefully acknowledge the following resources and contributions that made this project possible:
- OpenAI Whisper and WhisperX: Base speech recognition model architecture.
- Whisper Finetune: AdaLoRA (Adaptive Low-Rank Adaptation) method for parameter-efficient fine-tuning.
- The SAPC Team: For hosting the competition and evaluation platform and providing a carefully annotated dysarthric speech dataset.
- Anonymous Reviewers: For their constructive feedback.