This project demonstrates how to fine-tune a pre-trained HuBERT model for Cantonese automatic speech recognition (ASR) using the Connectionist Temporal Classification (CTC) loss function. HuBERT is a self-supervised learning (SSL) model; here it is fine-tuned on a speech dataset and its performance is evaluated on the downstream ASR task. The goal is accurate Cantonese speech recognition.
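To make the CTC part concrete, here is a minimal sketch of greedy CTC decoding, the step that turns per-frame predictions into a transcript: consecutive repeats are collapsed first, then blank tokens are dropped. The toy vocabulary, the blank index of 0, and the function name are assumptions for illustration and are not taken from train.py.

```python
from itertools import groupby

# Hypothetical toy vocabulary; index 0 is assumed to be the CTC blank token.
BLANK_ID = 0
VOCAB = {1: "你", 2: "好", 3: "嗎"}


def ctc_greedy_decode(frame_ids, blank_id=BLANK_ID, vocab=VOCAB):
    """Collapse consecutive repeated frame predictions, then drop blanks."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(vocab[t] for t in collapsed if t != blank_id)


# Per-frame argmax IDs, as a CTC output layer might emit for a short utterance.
frames = [0, 1, 1, 0, 0, 2, 2, 2, 0, 3]
decoded = ctc_greedy_decode(frames)
print(decoded)
```

Note that a blank between two identical tokens keeps them separate, so `[1, 0, 1]` decodes to two characters rather than one.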
Install dependencies:
pip install -r requirements.txt
Prepare the dataset:
- Download and extract the desired speech dataset.
- Modify the train.py script to point to the correct dataset location and configuration.
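As a rough illustration of the dataset settings mentioned above, the sketch below groups them in one place and checks that the expected files exist before training starts. Every name here (`DataConfig`, the directory, the manifest file names) is a hypothetical placeholder; the actual variables in train.py may look different.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DataConfig:
    # All field names and default values are hypothetical placeholders.
    dataset_dir: Path = Path("data/my_cantonese_corpus")
    train_manifest: str = "train.tsv"
    eval_manifest: str = "dev.tsv"
    sampling_rate: int = 16_000  # HuBERT checkpoints are typically pre-trained on 16 kHz audio

    def missing_files(self):
        """Return the manifest paths that do not exist on disk."""
        paths = [
            self.dataset_dir / self.train_manifest,
            self.dataset_dir / self.eval_manifest,
        ]
        return [p for p in paths if not p.is_file()]


config = DataConfig()
missing = config.missing_files()
if missing:
    print("Missing dataset files:", ", ".join(str(p) for p in missing))
```

Checking paths up front is a small design choice that surfaces a wrong dataset location immediately instead of partway through data loading.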
To train the model, run the train.py script:
python train.py
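After training, the model's transcripts can be scored against reference transcripts. This README does not state which metric the project uses; character error rate (CER) is a common choice for Cantonese, so the following is a sketch under that assumption. The function names and the example strings are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using two DP rows."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Total character edits divided by total reference characters."""
    total = sum(len(r) for r in references)
    if total == 0:
        raise ValueError("references contain no characters")
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return edits / total


# Made-up example: one character of a five-character reference is missing.
cer = character_error_rate(["今日天氣好"], ["今日天氣"])
print(f"CER: {cer:.2f}")
```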