@South-Twilight South-Twilight commented Feb 2, 2024

This PR adds audio resynthesis from discrete tokens:

  1. We extend hubert_voc1 to token_voc1, which can handle tokens from more models;
  2. We add f0 to training and inference, after finding poor pronunciation in singing;
  3. We add multi-stream methods, including residual clustering and weighted sum;
  4. Using the embedding features of the models as input is also allowed.

The following models have been validated in the opencpop recipe: HuBERT, XLS-R, WavLM, MERT, Encodec.
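
To illustrate the weighted-sum idea from item 3, here is a minimal sketch in plain Python. It is not the code from this PR: the function names (`softmax`, `weighted_sum_streams`), the use of softmax-normalized per-stream weights, and the toy embedding tables are all assumptions made for illustration. It assumes each stream supplies one token ID per frame and has its own embedding table.

```python
import math


def softmax(logits):
    """Normalize raw per-stream scores into weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum_streams(token_streams, embedding_tables, stream_logits):
    """Merge several token streams into one embedding vector per frame.

    token_streams:    list of K token-ID sequences, all of the same length
    embedding_tables: list of K tables, table[token_id] -> list of floats
    stream_logits:    list of K raw scores (hypothetical learnable weights)
    """
    if not (len(token_streams) == len(embedding_tables) == len(stream_logits)):
        raise ValueError("need one embedding table and one logit per stream")
    num_frames = len(token_streams[0])
    if any(len(stream) != num_frames for stream in token_streams):
        raise ValueError("all token streams must have the same length")

    weights = softmax(stream_logits)
    dim = len(embedding_tables[0][0])
    frames = []
    for t in range(num_frames):
        frame = [0.0] * dim
        for weight, stream, table in zip(weights, token_streams, embedding_tables):
            embedding = table[stream[t]]
            for d in range(dim):
                frame[d] += weight * embedding[d]
        frames.append(frame)
    return frames


# Toy example: two streams, two frames, 2-dimensional embeddings.
tables = [
    [[1.0, 0.0], [0.0, 1.0]],  # stream 0 embedding table
    [[2.0, 2.0], [4.0, 0.0]],  # stream 1 embedding table
]
streams = [[0, 1], [1, 0]]
merged = weighted_sum_streams(streams, tables, stream_logits=[0.0, 0.0])
```

With equal logits both streams contribute equally, so each merged frame is the average of the two stream embeddings for that frame. In a trained model the logits would presumably be learned rather than fixed.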

1) add f0
2) use embedding feat as input (test topline)
3) add weighted-sum token
1) separate single layer config: hifigan_hubert_16k_nodp_f0.v1.yaml
2) add annotation to DiscreteSymbolF0Generator.infer
1) add stage 4 of run.sh -- "Scoring"
1) update training steps from 250k to 400k
1) add f0 RMSE, semitone accuracy, and uvu accuracy evaluation metrics
1) add continuous f0
2) add yaml for 48khz wav
…ugs:

1) add multi-stream RVQ cluster
2) add 48kHz encodec token
3) update some annotations
4) remove some useless git-tracked files
… token_voc1 for PR

1) refactor conf
2) add annotations
@kan-bayashi kan-bayashi self-requested a review February 5, 2024 08:42