Question about creating a training set for LoRA fine-tuning Moshi #14

### Due diligence

- [x] I have done my due diligence in trying to find the answer myself.

### Topic

Other / All

### Question

Hello, thank you for open-sourcing the Moshi fine-tuning code. I've been listening to several audio clips from the DailyTalkContiguous dataset, and I noticed that they all seem to be multi-turn conversations between two speakers.
I'm wondering:
1. Is this multi-turn structure necessary? Can each audio clip contain only a single turn of dialogue, such as User: XXXX; Moshi: XXXX?
2. Can Moshi learn multiple voices? I've seen earlier examples in which Moshi was able to imitate pirates and other characters.
