I first trained an FS-EEND model using train_dia.py on my Libri dataset with 2 speakers, where each speaker's track was generated by sampling utterances from that speaker and inserting random silences in between (this results in a high overlap ratio). However, when I then fine-tuned this model on my Libri dataset with 4 speakers, which was generated in the same way as the 2-speaker dataset, I noticed that the val/speech_falarm and val/speaker_falarm metrics first rise and then fluctuate. The curves can be found below. The grey curve is the model trained on 2 speakers, and the purple curve is the model fine-tuned on 4 speakers:
Please ignore the difference in logging frequency; it was changed between the two training runs.
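For concreteness, the data generation described above can be sketched roughly as follows. This is only an illustrative sketch, not the actual simulation script: the function names, the frame counts, and the uniform silence distribution are all assumptions. It works on frame-level activity labels only (no audio) and shows how the overlap ratio can be checked for a given number of speakers.

```python
import numpy as np

def simulate_speaker_track(utt_lengths, total_frames, rng, max_silence=50):
    """Binary activity track for one speaker (hypothetical helper).

    Utterances are placed one after another, each preceded by a random
    silence of 0..max_silence frames (uniform; an assumption).
    """
    track = np.zeros(total_frames, dtype=np.int8)
    pos = 0
    for length in utt_lengths:
        pos += int(rng.integers(0, max_silence + 1))  # random silence
        if pos >= total_frames:
            break
        end = min(pos + int(length), total_frames)
        track[pos:end] = 1  # speaker is active during the utterance
        pos = end
    return track

def make_mixture_labels(n_speakers, total_frames, rng, n_utts=10):
    """Stack independent speaker tracks into a (frames, speakers) label matrix."""
    tracks = [
        simulate_speaker_track(
            rng.integers(100, 300, size=n_utts), total_frames, rng
        )
        for _ in range(n_speakers)
    ]
    return np.stack(tracks, axis=1)

def overlap_ratio(labels):
    """Fraction of speech frames in which two or more speakers are active."""
    n_active = labels.sum(axis=1)
    n_speech = int((n_active > 0).sum())
    if n_speech == 0:
        return 0.0
    return float((n_active > 1).sum() / n_speech)

rng = np.random.default_rng(0)
labels_2spk = make_mixture_labels(2, 2000, rng)
labels_4spk = make_mixture_labels(4, 2000, rng)
print("2 speakers overlap ratio:", overlap_ratio(labels_2spk))
print("4 speakers overlap ratio:", overlap_ratio(labels_4spk))
```

Because every speaker's track is generated independently, the overlap statistics of the 2-speaker and 4-speaker sets can differ considerably, which may be worth comparing when interpreting the false alarm curves.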
Looking forward to your response!