Training using SDP (and with DP by ratio?) #79

@isdanni

This is a follow-up to the previous discussion threads on the stochastic duration predictor (SDP) in #11 and #68 (comment), with Bert-VITS2 as a reference.

Regarding training with SDP, I have a few observations and questions:

  1. A few months ago I ran experiments with use_sdp enabled at earlier steps (100K ~ 500K). The results were below average compared to models trained without use_sdp: the audio did not sound natural and certain pronunciations were unclear. I now plan to continue training a better-trained checkpoint with SDP enabled (as mentioned in the thread above), and would be curious to hear from anyone who has done similar experiments.

  2. I am curious whether adding an sdp_ratio and training both SDP and DP simultaneously would improve the results. I am not sure how many code changes this would need, but I would love to open a PR if this sounds good to you!

  3. About training both SDP & DP together and comparing the results to save time (see "necessary of adversarial duration predictor" #11 (comment)): my assumption is that training from scratch with this method would not sound as good as two-stage training.

  4. DurationPredictor works very well in my experience, but are there any improvements that could be made to both DP models?
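
To make point 2 concrete, here is a minimal sketch of what blending by ratio could look like at inference time, assuming both predictors output per-phoneme log-durations of the same length. The names `blend_log_durations` and `to_frame_counts` and the plain-list inputs are hypothetical and not code from this repo; as far as I understand, Bert-VITS2 takes a similar weighted sum of the two predictors' outputs, but please check its source before relying on that.

```python
import math

def blend_log_durations(logw_sdp, logw_dp, sdp_ratio):
    """Weighted sum of per-phoneme log-durations from the two predictors.

    sdp_ratio = 0.0 uses only the deterministic DP output,
    sdp_ratio = 1.0 uses only the stochastic SDP output.
    """
    if not 0.0 <= sdp_ratio <= 1.0:
        raise ValueError("sdp_ratio must be in [0, 1]")
    if len(logw_sdp) != len(logw_dp):
        raise ValueError("predictor outputs must have the same length")
    return [sdp_ratio * s + (1.0 - sdp_ratio) * d
            for s, d in zip(logw_sdp, logw_dp)]

def to_frame_counts(logw, length_scale=1.0):
    """Convert log-durations to integer frame counts (ceil of exp),
    assuming the usual VITS-style inference step."""
    return [math.ceil(math.exp(lw) * length_scale) for lw in logw]

# Made-up numbers, for illustration only.
logw_sdp = [0.0, 1.0, 2.0]
logw_dp = [1.0, 1.0, 1.0]
blended = blend_log_durations(logw_sdp, logw_dp, sdp_ratio=0.5)
frames = to_frame_counts(blended)
```

With this shape, sdp_ratio becomes a single knob that can be swept at inference without retraining, provided both predictors were trained.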

==========================

A summary of my experience with use_sdp so far (I will update this when I have more results):

  • train using SDP from scratch: does not sound good at all.
  • train without SDP from scratch: sounds natural; best-performing checkpoint to date.
  • train without SDP from scratch, then continue training using SDP: ?
  • train with both SDP & DP by ratio: ?
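
For the third case (continue training with SDP from a checkpoint trained without it), one practical detail is that the old checkpoint has no usable SDP weights. Below is a rough sketch of the merge step, assuming state dicts are plain key-to-tensor mappings and that the duration predictor's parameters sit under a key prefix such as `dp.`. Both the helper name `merge_for_second_stage` and the prefix are assumptions for illustration, so check the actual keys in your checkpoint.

```python
def merge_for_second_stage(fresh_state, pretrained_state, skip_prefixes=()):
    """Build the state to load for the second training stage.

    Keys found in the pretrained checkpoint are copied over, except those
    starting with one of skip_prefixes; those (and any keys missing from the
    checkpoint) keep their fresh initialization.
    Returns (merged_state, reinitialized_keys).
    """
    merged = {}
    reinitialized = []
    for key, fresh_value in fresh_state.items():
        skipped = any(key.startswith(p) for p in skip_prefixes)
        if key in pretrained_state and not skipped:
            merged[key] = pretrained_state[key]
        else:
            merged[key] = fresh_value
            reinitialized.append(key)
    return merged, reinitialized

# Strings stand in for tensors; key names are hypothetical.
fresh = {"enc_p.weight": "init_enc", "dp.weight": "init_sdp"}
old = {"enc_p.weight": "trained_enc", "dp.weight": "trained_dp"}
merged, reinit = merge_for_second_stage(fresh, old, skip_prefixes=("dp.",))
```

Printing the reinitialized keys before training is a cheap way to confirm that only the duration predictor starts from scratch.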
