
Conversation

@hcsolakoglu
Contributor

With this PR, I'm integrating the RL workflow of F5R into F5-TTS while preserving the default deterministic behavior and checkpoint compatibility. The goal is to enable a two-stage pipeline (Gaussian NLL warmup followed by GRPO RL fine-tuning) with a modular reward system and opt-in robustness improvements, without changing the default training or inference paths.

Key changes:

  • Probabilistic output head (proj_out_ln_sig) with gaussian_nll objective and backward‑compatible checkpoint loading.
  • GRPO trainer and RL sampling utilities, with optional steps_plus_one and prompt‑length modes.
  • Pluggable reward system (RewardProvider, registry, combiner) + built‑in FunASR WER and WeSpeaker similarity providers (optional deps, lazy import, caching).
  • Reward logging improvements and optional Trackio support (drop‑in for W&B).
  • Optional stability knobs for GRPO (rl.kl_eps, rl.density_eps) while keeping F5R‑parity defaults.
  • Dynamic batch sampler optimization to avoid materializing repeated batches in memory.
  • Extensive tests covering Gaussian head, checkpoint compatibility, RL training step, reward plugins, device handling, and new opt‑ins.

Notes on compatibility:

  • Defaults remain deterministic (output_dist=deterministic, objective=mse), so existing training/inference and checkpoints work unchanged.

  • All deviations from F5R behavior are opt‑in and documented in README_RL.md.

  • README_RL.md updated with a concise RL runbook, dataset prep, reward model fetch, and recommended opt‑ins.
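For readers unfamiliar with GRPO, the core idea is to replace a learned critic with group-relative advantages: each sampled completion's reward is normalized against the mean and standard deviation of its sampling group. A minimal sketch (the function name and the `1e-8` denominator epsilon are illustrative assumptions, separate from the PR's `rl.kl_eps` / `rl.density_eps` knobs):

```python
import math
from typing import List

def grpo_advantages(rewards: List[float]) -> List[float]:
    """Group-relative advantages: z-score each reward within its
    sampling group, GRPO's critic-free baseline."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    # Small epsilon (assumed here) guards against zero-variance groups.
    return [(r - mean) / (std + 1e-8) for r in rewards]
```

Samples scoring above their group's mean get positive advantage and are reinforced; below-mean samples are suppressed, without training a separate value network.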

@hcsolakoglu
Contributor Author

I have several ideas on how to initialize the probabilistic output head, so I will be implementing and testing multiple approaches. This is still a work in progress, but I have made significant headway. If anyone would like to guide the direction, feel free to run tests and share your feedback. @SWivid
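For context on the head-initialization question, here is a per-element Gaussian NLL together with one candidate warm start. The constant `LOG_SIGMA_INIT` scheme is purely hypothetical, one of the "multiple approaches" that could be tested, not the PR's chosen method:

```python
import math

def gaussian_nll(target: float, mu: float, log_sigma: float) -> float:
    """Per-element negative log-likelihood of `target` under
    N(mu, exp(log_sigma)^2). With log_sigma held fixed this reduces to
    scaled MSE plus a constant, so the mean path matches the
    deterministic objective up to a factor."""
    return 0.5 * (math.log(2.0 * math.pi) + 2.0 * log_sigma
                  + (target - mu) ** 2 * math.exp(-2.0 * log_sigma))

# One possible warm start (an assumption, not the PR's scheme): zero the
# log-sigma weights and set its bias so the head starts at a small constant
# sigma, keeping early NLL gradients on the mean aligned with MSE gradients.
LOG_SIGMA_INIT = math.log(0.1)
```

Other natural candidates include initializing log-sigma from the residual statistics of a pretrained deterministic checkpoint; the sketch above is only the simplest baseline.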

