Description
This is a follow-up to the previous discussion threads on the stochastic duration predictor (SDP) in #11 and #68 (comment), and it also uses Bert-VITS2 as a reference.
Regarding training with SDP, I have some feedback:
- A few months ago, my experiments using `use_sdp` at early steps (100K to 500K) gave below-average results compared to models trained without `use_sdp`: the audio did not sound natural and certain pronunciations were unclear. I now plan to transfer learn from a better-trained checkpoint with SDP (as mentioned in the thread above), and I would be curious to hear from anyone who has run similar experiments.
- I am curious whether adding `sdp_ratio` and training SDP and DP simultaneously would improve results. I am not sure how many code changes this would require, but I would love to open a PR if this sounds good to you!
- On training SDP and DP together and comparing the results to save time (necessary of adversarial duration predictor #11 (comment)): my assumption is that training from scratch this way would not sound as good as two-stage training.
- `DurationPredictor` works very well in my experience, but are there any improvements that could be made to both DP models?
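To make the `sdp_ratio` idea concrete, here is a minimal sketch of blending the two predictors at inference time, assuming a Bert-VITS2-style formulation where the SDP and DP log-durations are linearly interpolated in the log domain, then exponentiated, masked, scaled by `length_scale`, and rounded up to frame counts. The function name `blend_durations` and its plain-list inputs are hypothetical and for illustration only; in a real model these would be batched tensors, and the SDP output would come from a noisy reverse pass of the flow.

```python
import math


def blend_durations(logw_sdp, logw_dp, mask, sdp_ratio=0.5, length_scale=1.0):
    """Hypothetical sketch: blend SDP and DP log-durations into frame counts.

    logw_sdp, logw_dp: per-token log-durations from the two predictors
    mask: 1 for real tokens, 0 for padding
    sdp_ratio: 1.0 uses only the SDP, 0.0 uses only the DP
    """
    if not 0.0 <= sdp_ratio <= 1.0:
        raise ValueError("sdp_ratio must be in [0, 1]")
    frames = []
    for ls, ld, m in zip(logw_sdp, logw_dp, mask):
        # Linear interpolation between the two predictors in the log domain
        logw = sdp_ratio * ls + (1.0 - sdp_ratio) * ld
        # Back to linear durations, zero out padding, apply the speaking-rate scale
        w = math.exp(logw) * m * length_scale
        # Round up so every valid token gets at least part of a frame
        frames.append(math.ceil(w))
    return frames


# Example: the SDP predicts 4 frames, the DP predicts 2; a 0.5 ratio gives
# exp(0.5 * ln 4 + 0.5 * ln 2) = 2 ** 1.5 (about 2.83), rounded up to 3 frames.
print(blend_durations([math.log(4.0)], [math.log(2.0)], [1], sdp_ratio=0.5))
```

Blending in the log domain means a ratio of 0.5 yields the geometric mean of the two durations rather than the arithmetic mean, which keeps a single over-long prediction from dominating.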
==========================
A summary of my experience with `use_sdp` so far (I will update this when I have more results):
- train using SDP from scratch: does not sound good at all.
- train without SDP from scratch: sounds natural; best-performing checkpoint to date
- train without SDP from scratch, then continue training using SDP: ?
- train with both SDP & DP by ratio: ?
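For the "both SDP & DP" setup, one option is to train both predictors on the same alignment-derived durations and simply sum their losses. The sketch below assumes the VITS-style deterministic DP loss (squared error between predicted log-durations and `log(d + 1e-6)`, normalized by the number of valid tokens) and treats the SDP term as a negative log-likelihood that is computed elsewhere by the flow and passed in as `sdp_nll_sum`. The function names and that input are hypothetical; summing the two terms is an assumption loosely modeled on Bert-VITS2, not this repository's code.

```python
import math


def dp_loss(logw_pred, durations, mask):
    """Deterministic DP loss: masked squared error on log-durations,
    averaged over valid tokens (VITS-style 1e-6 offset)."""
    n_valid = sum(mask)
    if n_valid == 0:
        raise ValueError("mask has no valid tokens")
    total = 0.0
    for lp, d, m in zip(logw_pred, durations, mask):
        target = math.log(d + 1e-6) * m  # target log-duration, zero for padding
        total += (lp * m - target) ** 2
    return total / n_valid


def joint_duration_loss(logw_dp, durations, mask, sdp_nll_sum):
    """Hypothetical joint objective: the SDP negative log-likelihood
    (computed elsewhere) normalized by token count, plus the DP loss."""
    n_valid = sum(mask)
    l_sdp = sdp_nll_sum / n_valid
    return dp_loss(logw_dp, durations, mask) + l_sdp


# Example: perfect DP predictions contribute almost nothing, so the joint
# loss is close to the normalized SDP term (4.0 / 2 tokens = 2.0).
print(joint_duration_loss([math.log(2), math.log(3)], [2, 3], [1, 1], sdp_nll_sum=4.0))
```

With this kind of setup, comparing runs where `sdp_nll_sum` is dropped (DP only) against the summed objective would map directly onto the rows of the table above.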