Due diligence
Topic
Other / All
Question
Hello, thank you for open-sourcing the Moshi fine-tuning code. I've been listening to several audio clips from the DailyTalkContiguous dataset, and I noticed that they all seem to be multi-turn conversations between two speakers.
I'm wondering:
- Is this multi-turn structure necessary? Can each audio clip contain only a single turn of dialogue, such as User: XXXX; Moshi: XXXX?
- Can Moshi learn multiple voices? I've seen earlier examples in which Moshi was able to imitate pirates and other characters.