Description
Checks
- This template is only for usage issues encountered.
- I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
- I have searched for existing issues, including closed ones, and couldn't find a solution.
- I am using English to submit this issue to facilitate community communication.
Environment Details
Kaggle notebook: Python 3.12.2
Torch version: 2.8.0+cu126
I use GPU P100.
Steps to Reproduce
Sorry if I'm being annoying. I want to understand your project, so I'm trying to reproduce it on Kaggle for English only, since I don't have a local computer or a server for training.
I copied most of your modules and functions, and only adjusted the dataset module and the vocabulary.
My vocabulary:
symbols = [
    ' ', '!', ',', '-', '.',
    ';', '?', 'a', 'b', 'c', 'd', 'e', 'f',
    'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q',
    'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', "'", '“', '”'
]
print(len(symbols))
# Map each character to its index in the symbol list.
vocab_char_map = {}
for i, char in enumerate(symbols):
    vocab_char_map[char] = i
vocab_size = len(symbols)
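One thing that may be worth checking with a vocabulary like this is whether every character in the normalized transcripts is actually covered by it (for example uppercase letters or digits). The following is only a minimal sketch: `find_uncovered_chars` and the `sample` transcripts are hypothetical names and data made up for illustration, not part of the project or of my notebook.

```python
from collections import Counter

# Same symbol list as above, repeated so the snippet runs on its own.
symbols = [
    ' ', '!', ',', '-', '.',
    ';', '?', 'a', 'b', 'c', 'd', 'e', 'f',
    'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q',
    'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', "'", '“', '”'
]
vocab_char_map = {ch: i for i, ch in enumerate(symbols)}


def find_uncovered_chars(transcripts, vocab):
    """Count characters that occur in the transcripts but not in the vocab.

    Hypothetical helper, written only for this check.
    """
    missing = Counter()
    for line in transcripts:
        for ch in line:
            if ch not in vocab:
                missing[ch] += 1
    return missing


# Made-up sample lines; in practice this would be the dataset's text file.
sample = ["hello, world!", "It's 3 o'clock."]
missing = find_uncovered_chars(sample, vocab_char_map)
print(missing)
```

If the counter is non-empty for the real transcripts, those characters would need to be either normalized away or added to the vocabulary.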
My config, based on your small config:
mel_spec_kw = dict(
    target_sample_rate=24_000,
    n_mel_channels=100,
    hop_length=256,
    win_length=1024,
    n_fft=1024,
    mel_spec_type='vocos',
)
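For reference, these mel settings imply a frame rate of 24000 / 256 = 93.75 mel frames per second. The short sketch below just works through that arithmetic; `approx_mel_frames` is a hypothetical helper, and the exact frame count in practice depends on the padding convention of the mel extractor, which is not accounted for here.

```python
# Values copied from the mel_spec_kw config above.
target_sample_rate = 24_000
hop_length = 256

# Number of mel frames produced per second of audio.
frames_per_second = target_sample_rate / hop_length


def approx_mel_frames(duration_s, sample_rate=target_sample_rate, hop=hop_length):
    """Approximate mel frame count for a clip, ignoring edge padding.

    Hypothetical helper for illustration only.
    """
    return int(duration_s * sample_rate) // hop


print(frames_per_second)
print(approx_mel_frames(10.0))
```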
model_arch = dict(
    dim=768,
    depth=18,
    heads=12,
    ff_mult=2,
    text_dim=512,
    text_mask_padding=False,
    conv_layers=4,
    pe_attn_head=1,
    attn_backend='torch',
    attn_mask_enabled=False,
    checkpoint_activations=False,
)
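As a quick sanity check on these numbers, the per-head and feed-forward sizes can be derived from the config. This is only a sketch: it assumes the usual transformer conventions (head dimension = `dim / heads`, feed-forward inner width = `dim * ff_mult`), which I have not verified against the project's code.

```python
# Subset of the model_arch config above that the arithmetic needs.
model_arch = dict(dim=768, depth=18, heads=12, ff_mult=2)

# The model width must split evenly across attention heads.
assert model_arch["dim"] % model_arch["heads"] == 0

# Assumption: standard multi-head attention, head_dim = dim / heads.
head_dim = model_arch["dim"] // model_arch["heads"]

# Assumption: feed-forward inner width = dim * ff_mult.
ff_inner_dim = model_arch["dim"] * model_arch["ff_mult"]

print(head_dim, ff_inner_dim)
```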
My dataset is LibriSpeech 100h with a normalized text file. I also removed the dataset preparation functions and the logging.
You can review my notebook here: https://www.kaggle.com/code/nguyenquoctuan12/f5tts-small
I'm still learning. I really appreciate your help.
Sorry for my bad English. Have a nice day!
✔️ Expected Behavior
A good generated wav and checkpoint.
❌ Actual Behavior
I have currently reached 130k steps, and the loss is around 0.6. The generated wav seems to have the reference wav's accent and voice, but I can't make out any words in it. I know you said the minimum is 200k+ steps, but I don't have many resources for training. Is there anything I did wrong?