Full-Duplex-Conversational-System

Here is my attempt to convert an LLM into a full-duplex dialogue system from scratch while being GPU poor :)

My approach breaks the training into three steps:

Step 1: Train the LLM to understand speech tokens. I use Kyutai's Mimi to tokenize the speech.
Step 2: Train the model on utterance-level dialogues without overlaps, so that it better learns the distribution of spoken dialogue.
Step 3: Finally, convert the model into a full-duplex system using time warping.

Currently, I am training the first step.

Data Processing

First, we convert the data into WebDataset shards so that it can be preprocessed easily during training. In this step we convert the audio into Mimi tokens, convert the text into Qwen tokens, and add instructions and other metadata. Converting to Qwen tokens is optional, since it can also be done in the preprocessing step in case we use another LLM. The main purpose of this step is therefore to convert the speech data into Mimi tokens and standardize it so that it can be preprocessed easily later.

The directory contains three sub-directories, one for each step. Please refer to the README in each folder for more details.
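
To make the sharding step above more concrete, here is a minimal sketch of the WebDataset layout: a shard is a plain tar archive in which all files that share a basename form one sample. It uses only the standard library rather than the `webdataset` package, and the field names (`mimi.json`, `text.json`, `meta.json`), the JSON encoding, and the sample values are assumptions made for illustration, not the format used by the scripts in this repository.

```python
import io
import json
import tarfile


def write_shard(samples, fileobj):
    """Write (key, sample) pairs to a tar archive.

    Every file that belongs to one sample shares the same key as its
    basename, which is how WebDataset groups files into samples.
    """
    with tarfile.open(fileobj=fileobj, mode="w") as tar:
        for key, sample in samples:
            for ext, payload in sample.items():
                data = json.dumps(payload).encode("utf-8")
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))


def read_shard(fileobj):
    """Group tar members back into samples by key."""
    samples = {}
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            # Everything before the first dot is the sample key.
            key, ext = member.name.split(".", 1)
            payload = json.loads(tar.extractfile(member).read())
            samples.setdefault(key, {})[ext] = payload
    return samples


# Hypothetical sample: 4 codebooks x 3 frames of Mimi codes, text ids, metadata.
example = {
    "mimi.json": [[12, 7, 99], [3, 3, 41], [500, 2, 8], [77, 1024, 5]],
    "text.json": [101, 2023, 2003],
    "meta.json": {"instruction": "Transcribe the audio.", "speaker": "A"},
}

buffer = io.BytesIO()
write_shard([("sample000000", example)], buffer)
buffer.seek(0)
restored = read_shard(buffer)
```

Because each sample is self-contained inside the archive, a later preprocessing pass can stream shards sequentially without needing an index over the whole dataset.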

Preprocessing

Run all commands from the root directory. Refer to the README of each sub-directory for more information.

For step 1, run the following to preprocess the data:

python -m training.step1.preprocess \
  --config training/step1/configs/preprocessing.yaml

To check that the preprocessed data is stored correctly, run:

python -m training.step1.inspect_packed_shard \
  --tar path/to/tar/file \
  --sample-index 0 \
  --tokenizer path/to/tokenizer \
  --mimi-ckpt path/to/mimi \
  --num-codebooks 4 \
  --speech-codebook-size 2048 \
  --device cuda \
  --out-dir path/to/output/directory
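
The `--num-codebooks 4` and `--speech-codebook-size 2048` flags in the command above indicate that each audio frame is described by four codes in the range 0 to 2047. One common way to expose such codes to an LLM is to give every codebook its own block of ids after the text vocabulary. The sketch below illustrates that idea only: the offset scheme, the frame-major flattening order, and `TEXT_VOCAB_SIZE` are assumptions and may differ from what the preprocessing code here actually does.

```python
NUM_CODEBOOKS = 4          # matches --num-codebooks above
CODEBOOK_SIZE = 2048       # matches --speech-codebook-size above
TEXT_VOCAB_SIZE = 150_000  # hypothetical size of the text vocabulary


def speech_to_vocab_id(codebook, code, text_vocab_size=TEXT_VOCAB_SIZE):
    """Map (codebook index, code) to an id placed after the text vocabulary."""
    if not 0 <= codebook < NUM_CODEBOOKS:
        raise ValueError(f"codebook out of range: {codebook}")
    if not 0 <= code < CODEBOOK_SIZE:
        raise ValueError(f"code out of range: {code}")
    return text_vocab_size + codebook * CODEBOOK_SIZE + code


def vocab_id_to_speech(token_id, text_vocab_size=TEXT_VOCAB_SIZE):
    """Inverse mapping: recover (codebook index, code) from a vocabulary id."""
    offset = token_id - text_vocab_size
    if not 0 <= offset < NUM_CODEBOOKS * CODEBOOK_SIZE:
        raise ValueError(f"not a speech token id: {token_id}")
    return divmod(offset, CODEBOOK_SIZE)


def flatten_frames(codes, text_vocab_size=TEXT_VOCAB_SIZE):
    """Flatten codes[codebook][frame] into one frame-major id sequence."""
    num_frames = len(codes[0])
    return [
        speech_to_vocab_id(cb, codes[cb][t], text_vocab_size)
        for t in range(num_frames)
        for cb in range(len(codes))
    ]


# Two frames of made-up codes, one row per codebook.
ids = flatten_frames([[1, 2], [3, 4], [5, 6], [7, 8]])
```

Keeping the codebook blocks disjoint means the inverse mapping needs no extra bookkeeping, which is convenient when decoding a packed sample back to audio for inspection.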

For step 2, run the following for preprocessing

For step 3, run the following for preprocessing

Training

Run all commands from the root directory. Refer to the README of each sub-directory for more information.

For step 1, we use curriculum learning, which is set up in the config file. Then run the following to train:

python -m training.step1.train \
  --config training/step1/configs/train.yaml \
  --num-nodes [num_of_nodes] \
  --num-gpus-per-node [num_of_gpus_per_node]
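
As a rough illustration of what a curriculum schedule can look like, the sketch below picks a training stage from the current step count. The `Stage` fields, the stage names, the step boundaries, and the use of audio length as the difficulty knob are all hypothetical; the real schedule is whatever `training/step1/configs/train.yaml` defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    until_step: int           # stage is active while step < until_step
    max_audio_seconds: float  # hypothetical difficulty knob


# Hypothetical schedule: start with short clips, then allow longer ones.
CURRICULUM = [
    Stage("short", until_step=1_000, max_audio_seconds=5.0),
    Stage("medium", until_step=5_000, max_audio_seconds=15.0),
    Stage("full", until_step=20_000, max_audio_seconds=30.0),
]


def stage_for_step(step, stages=CURRICULUM):
    """Return the first stage whose boundary has not been reached yet."""
    for stage in stages:
        if step < stage.until_step:
            return stage
    # Past the last boundary: stay on the final (hardest) stage.
    return stages[-1]
```

A data loader could call `stage_for_step(global_step)` once per batch and filter or re-weight samples according to the returned stage.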

Evaluation

For step 1, use the eval.yaml config file to run the evaluation:

srun python -m training.step1.eval \
  --config training/step1/configs/eval.yaml \
  --num-nodes [num_of_nodes] \
  --num-gpus-per-node [num_of_gpus_per_node]
