[Not for merge] Add Emilia Training Recipe for Llasa (cosyvoice2 token) #1887

yuekaizhang · 2025-03-03T06:14:20Z

Inspired by Llasa, this PR enables continued pretraining of the Qwen2 LLM for the CosyVoice2 semantic token prediction task.

The predicted semantic tokens can be used to generate audio with either the CosyVoice2 pretrained U-Net model or the DIT model PR.
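For readers unfamiliar with this setup, here is a minimal sketch of how a text-plus-speech training example could be laid out for next-token prediction. It is not the code from this PR. The vocabulary size, codebook size, and the helper names `speech_to_llm_ids` and `build_example` are all assumptions for illustration, as is the choice to compute the loss only on the speech positions.

```python
# Hypothetical constants for illustration; neither value is taken from this PR.
TEXT_VOCAB_SIZE = 151_936      # assumed size of the LLM's text vocabulary
SPEECH_CODEBOOK_SIZE = 6_561   # assumed number of distinct semantic tokens
IGNORE_INDEX = -100            # default ignore_index of PyTorch's cross-entropy loss


def speech_to_llm_ids(speech_tokens):
    """Shift semantic token IDs past the text vocabulary so the two ranges never collide."""
    for tok in speech_tokens:
        if not 0 <= tok < SPEECH_CODEBOOK_SIZE:
            raise ValueError(f"semantic token {tok} is outside the codebook")
    return [TEXT_VOCAB_SIZE + tok for tok in speech_tokens]


def build_example(text_ids, speech_tokens):
    """Concatenate text and speech IDs; mask the text positions out of the loss."""
    speech_ids = speech_to_llm_ids(speech_tokens)
    input_ids = list(text_ids) + speech_ids
    # Assumption: only speech tokens are prediction targets, so text labels are ignored.
    labels = [IGNORE_INDEX] * len(text_ids) + speech_ids
    return {"input_ids": input_ids, "labels": labels}


example = build_example(text_ids=[10, 11, 12], speech_tokens=[0, 5, 6560])
```

With this layout the LLM's embedding table would need `TEXT_VOCAB_SIZE + SPEECH_CODEBOOK_SIZE` rows, and the predicted IDs are mapped back to semantic tokens by subtracting the same offset before they are passed to the flow matching model.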

| LLM Model | Flow matching Model | Seed-TTS test_zh CER | Comment |
| --- | --- | --- | --- |
| pretrained cosyvoice2 0.5B | f5-tts-small (wenetspeech4tts 7k hours) | 1.79% (16 steps) | See PR |
| llasa_cosyvoice2_token 0.5B (Emilia_ZH 50k hours) | f5-tts-small (wenetspeech4tts 7k hours) | 1.81% (16 steps) | |
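
As a reminder of what the CER column measures, the sketch below computes a corpus-level character error rate: total character edit distance divided by total reference length. It is an illustration only. The function names are hypothetical, and the benchmark's own ASR transcription and text normalization steps are not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: summed edit distance over summed reference length."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return errors / total_chars


# One substituted character out of eight reference characters.
score = cer(["今天天气很好", "你好"], ["今天天汽很好", "你好"])
```

Here `score` is 0.125, i.e. 12.5%. On this scale, the two rows above differ by 0.02 percentage points.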

danpovey · 2025-03-03T15:03:10Z

egs/emilia/TTS/README.md

+See https://arxiv.org/pdf/2407.05361.
+
+> [!CAUTION]
+> The next-gen Kaldi framework provides tools and models for generating high-quality, synthetic speech (Text-to-Speech, TTS).


I think the terms & conditions may have been taken from another framework, with the name changed?
It may be safest to just delete this. (Assuming we decide it makes sense to merge the PR overall, which we can discuss separately.)

Yeah, I copied it from the libritts recipe here: https://github.com/k2-fsa/icefall/tree/master/egs/libritts/TTS#readme. Deleted now.

danpovey · 2025-03-03T15:07:57Z

This seems like good work, and it's nice that you want to include it in our collection of recipes.
I also find it quite interesting. But I'm trying to come up with a good justification why it should be included here,
other than the fact that we are also interested in the TTS task right now. I.e. are we OK with icefall being a collection of
recipes even in cases where they have very little in common?
(BTW I notice that the instructions direct the user to install the k2 package, but I doubt this is actually needed).
Regardless of whether we merge it (and I'm open to input from our team members and others on this issue), I'm happy to have the pull request left here as an accessible place for discussion about this recipe and so that we can easily find it.

yuekaizhang · 2025-03-04T01:19:31Z

> This seems like good work, and it's nice that you want to include it in our collection of recipes. I also find it quite interesting. But I'm trying to come up with a good justification why it should be included here, other than the fact that we are also interested in the TTS task right now. I.e. are we OK with icefall being a collection of recipes even in cases where they have very little in common? (BTW I notice that the instructions direct the user to install the k2 package, but I doubt this is actually needed). Regardless of whether we merge it (and I'm open to input from our team members and others on this issue), I'm happy to have the pull request left here as an accessible place for discussion about this recipe and so that we can easily find it.

Thank you for your feedback! Indeed, the structure of this PR differs from other recipes in Icefall. Initially, I planned to implement it using Lhotse and Icefall training loops. However, I found it simpler to use the Hugging Face dataset and trainer since it's a language model token prediction task.

I have added a [Not for merge] tag so that people can still reference the results in the PR.

yuekaiz and others added 6 commits February 28, 2025 10:01

add token extraction

540430d

add training codes

fa65870

add llasa infer

0f7ebb7

add eval seed tts

d2b473a

clean code

7623939

remove run.sh

bc6e113

yuekaizhang requested a review from JinZr March 3, 2025 06:36

update results

c473192

danpovey reviewed Mar 3, 2025


yuekaizhang changed the title ~~Add Emilia Training Recipe for Llasa (cosyvoice2 token)~~ [Not for merge] Add Emilia Training Recipe for Llasa (cosyvoice2 token) Mar 4, 2025

yuekaizhang removed the request for review from JinZr March 4, 2025 01:19

update readme and requirements

1653b76
