
How to design training data #12

Description

@pabloriera

Due diligence

  • I have done my due diligence in trying to find the answer myself.

Topic

Paper / Data design

Question

Hi, I have been experimenting with fine-tuning Moshi on different data and have some questions.

I am using the provided training config unchanged, but with custom data.

I am having two problems:

  1. Whenever I fine-tune the model (including on the daily-talk data example), the system's time-to-respond degrades: it takes much longer to respond than default Moshi does. Biasing pad_mul improves the time-to-respond, but generation quality degrades somewhat, and I would prefer not to touch that parameter. I also tried including overlaps between speech turns; this seemed to help a little, but did not fix the problem.
    Do you have any tips on how to design the fine-tuning data to prevent this behavior?

  2. The second problem is with the beginning of conversations. After fine-tuning, I lose Moshi's "Hello, what's going on?" and its other greetings. This happens even though every example in my data begins with a similar greeting. Instead, the system starts speaking as if from an arbitrary point in a conversation, on topics similar to those in the data.
    Is there anything else to try besides training on dialogues that always start with the same "hello" message? My first thought was to train on random pieces of conversation first, and then use the final batches to include many examples of the welcome part.
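A minimal sketch of the two-phase ordering proposed in point 2, for illustration only. It assumes each dialogue is a plain list of turn transcripts whose first element is the greeting; real Moshi training data is audio with aligned text, so any cropping would also have to be applied to the audio and timestamps. `random_crop`, `build_schedule`, `n_welcome` and `opening_turns` are hypothetical names, not part of Moshi or its fine-tuning code.

```python
import random
from typing import List, Sequence


def random_crop(turns: Sequence[str], rng: random.Random, min_len: int = 2) -> List[str]:
    """Return a contiguous slice of at least min_len turns that skips the
    opening turn whenever the dialogue has more than min_len turns."""
    if len(turns) <= min_len:
        return list(turns)
    start = rng.randrange(1, len(turns) - min_len + 1)  # never index 0
    end = rng.randrange(start + min_len, len(turns) + 1)
    return list(turns[start:end])


def build_schedule(
    dialogues: Sequence[Sequence[str]],
    n_welcome: int,
    opening_turns: int = 2,
    seed: int = 0,
) -> List[List[str]]:
    """Hypothetical ordering.
    Phase 1: shuffled mid-conversation crops of every dialogue.
    Phase 2 (final examples): only the opening turns, greeting included."""
    rng = random.Random(seed)
    phase1 = [random_crop(d, rng) for d in dialogues]
    rng.shuffle(phase1)
    picks = rng.sample(list(dialogues), k=min(n_welcome, len(dialogues)))
    phase2 = [list(d[:opening_turns]) for d in picks]
    return phase1 + phase2


if __name__ == "__main__":
    dialogues = [["hello", f"q{i}", f"a{i}", f"q{i}b", f"a{i}b"] for i in range(10)]
    for example in build_schedule(dialogues, n_welcome=4):
        print(example)
```

Because the crop start is drawn from index 1 onward, phase-1 examples never contain the greeting (for dialogues longer than `min_len` turns), so the greeting appears only in the final examples of the schedule.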

Thanks, awesome project.

Metadata



Labels: question (Further information is requested)
