[feat] Refactor training framework into fastvideo/train #1159

Merged
jzhang38 merged 31 commits into hao-ai-lab:main from FoundationResearch:train-clean-refactor
Mar 9, 2026

Conversation

@alexzms (Collaborator) commented Mar 8, 2026

Summary

Introduces fastvideo/train, a refactored training framework that replaces the monolithic training/distillation pipelines with a modular, YAML-driven architecture.

Key design changes

  • _target_-based instantiation: Models and methods are selected via _target_ keys in YAML (e.g., fastvideo.train.models.wan.WanModel,
    fastvideo.train.methods.distribution_matching.dmd2.DMD2Method), making it easy to add new models/methods without modifying framework code.
  • Separated concerns: Models (models/), methods (methods/), callbacks (callbacks/), and the training loop (trainer.py) are fully decoupled. The trainer calls
    method.train_one_step() without knowing which method is running.
  • Callback system: Gradient clipping, validation, and EMA are now callbacks (callbacks/) rather than hardcoded in the training loop. Configured via the callbacks:
    section in YAML.
  • Structured config with defaults: TrainingConfig dataclass (utils/training_config.py) provides typed defaults for all training parameters. The fully-resolved config
    (with defaults filled in) is logged to W&B.
  • Checkpoint management: DCP-based save/resume with CheckpointManager, plus dcp_to_diffusers.py for converting checkpoints to Diffusers format.
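
The `_target_`-based instantiation described above can be sketched in a few lines. This is a minimal illustration of the Hydra-style convention, not the actual contents of fastvideo/train/utils/instantiate.py: the `_target_` key names a dotted import path and the remaining keys become constructor kwargs. The `instantiate` helper name and exact behavior here are assumptions.

```python
# Sketch of _target_-based instantiation (hypothetical helper, not the
# actual fastvideo/train/utils/instantiate.py implementation).
import importlib
from typing import Any


def instantiate(cfg: dict[str, Any]) -> Any:
    """Import the class named by `_target_` and call it with the remaining keys."""
    cfg = dict(cfg)  # don't mutate the caller's config
    module_path, _, class_name = cfg.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**cfg)


# A stdlib class stands in for a model/method class here:
frac = instantiate({"_target_": "fractions.Fraction",
                    "numerator": 3, "denominator": 4})
print(frac)  # 3/4
```

With this convention, adding a new model or method only requires a new class and a new `_target_` string in YAML; no framework code changes.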

Supported models & methods

Models:
  • Wan 2.1 (T2V 1.3B)
  • WanGame (incl. causal)

Methods:
  • DMD2 distillation
  • Self-forcing distillation
  • SFT finetuning
  • DFSFT (Diffusion Forcing SFT)

Bug fixes

  • CFG formula: Fixed real_score_guidance_scale in DMD2 and self-forcing to use the standard formula uncond + scale * (cond - uncond) instead of cond + scale * (cond - uncond) (which silently added +1 to the effective guidance scale).
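
The "+1" effect is easy to verify by algebra: `cond + s*(cond - uncond)` equals `uncond + (s + 1)*(cond - uncond)`, so the buggy form behaves like the standard formula at scale `s + 1`. Plain floats stand in for score tensors in this sketch:

```python
# Demonstrates why the old CFG formula silently added +1 to the
# effective guidance scale. Floats stand in for score tensors.
def cfg_fixed(uncond: float, cond: float, scale: float) -> float:
    return uncond + scale * (cond - uncond)   # standard CFG

def cfg_buggy(uncond: float, cond: float, scale: float) -> float:
    return cond + scale * (cond - uncond)     # old (buggy) form

uncond, cond, scale = 0.0, 1.0, 5.0
print(cfg_fixed(uncond, cond, scale))        # 5.0
print(cfg_buggy(uncond, cond, scale))        # 6.0
print(cfg_fixed(uncond, cond, scale + 1.0))  # 6.0 -- buggy == fixed at scale+1
```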

File structure

fastvideo/train/
trainer.py
models/{base, wan/, wangame/}
methods/{base, distribution_matching/, fine_tuning/}
callbacks/{callback, grad_clip, validation, ema}
entrypoint/{train, dcp_to_diffusers}
utils/{config, builder, training_config, checkpoint, dataloader, optimizer, tracking, ...}

Usage

torchrun --nproc_per_node=8 -m fastvideo.train.entrypoint.train \
    --config examples/distillation/refactor/distill_wan2.1_t2v_1.3B_dmd2.yaml
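
A config passed to the entrypoint follows the sections described above (models, method, callbacks, training). The fragment below is a hypothetical sketch of that layout; the `_target_` class paths for WanModel, DMD2Method, and GradNormClipCallback come from this PR, but every other key name and value is illustrative, not copied from the shipped example YAMLs:

```yaml
# Hypothetical config sketch -- key names other than _target_ are illustrative.
models:
  generator:
    _target_: fastvideo.train.models.wan.WanModel
    model_path: /path/to/wan2.1-t2v-1.3b

method:
  _target_: fastvideo.train.methods.distribution_matching.dmd2.DMD2Method
  real_score_guidance_scale: 3.5

callbacks:
  grad_clip:
    _target_: fastvideo.train.callbacks.grad_clip.GradNormClipCallback
    max_norm: 1.0

training:
  max_steps: 4000
  learning_rate: 1.0e-5
```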

Test plan

- DMD2 8-step distillation on Wan 2.1 T2V 1.3B matches legacy training loss curves
- VSA finetuning on Wan produces equivalent results to legacy pipeline
- Self-forcing distillation on WanGame runs without errors
- DFSFT on WanGame runs without errors
- Checkpoint save/resume round-trips correctly
- W&B logging shows fully-resolved config with defaults

Thanks to @jzhang38 for the extensive discussion, review, and code contributions!

@alexzms alexzms requested a review from jzhang38 March 8, 2026 00:42
@alexzms (Collaborator, Author) commented Mar 8, 2026

Related Issue: #1158

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant architectural overhaul to the training framework, moving from rigid, hardcoded pipelines to a highly modular and configurable system. The primary goal is to enhance flexibility and extensibility, allowing users to easily combine different models and training algorithms through declarative YAML configurations. This refactoring streamlines the process of experimenting with new models and methods, while also standardizing infrastructure concerns like distributed training, checkpointing, and validation.

Highlights

  • Modular Training Framework: Introduced a refactored training framework, fastvideo/train, that replaces monolithic pipelines with a modular, YAML-driven architecture for models, methods, and infrastructure.
  • target-based Instantiation: Implemented _target_ keys in YAML for easy instantiation of models and methods, allowing new components to be added without modifying core framework code.
  • Decoupled Components: Ensured full decoupling of models, methods, callbacks, and the training loop, promoting separation of concerns and flexibility.
  • Callback System: Integrated a flexible callback system for functionalities like gradient clipping, validation, and EMA, configurable via YAML.
  • Structured Configuration: Utilized a TrainingConfig dataclass for structured, typed defaults for all training parameters, with the fully-resolved config logged to W&B.
  • Checkpoint Management: Implemented DCP-based save/resume functionality with CheckpointManager and a utility for converting checkpoints to Diffusers format.
  • Bug Fix: CFG Formula: Corrected the real_score_guidance_scale formula in DMD2 and self-forcing methods to use the standard uncond + scale * (cond - uncond).


Changelog
  • examples/train/dfsft_wangame_causal_v3.yaml
    • Added a new YAML configuration for causal Diffusion-Forcing SFT on WanGame.
  • examples/train/distill_wan2.1_t2v_1.3B_dmd2.yaml
    • Added a new YAML configuration for DMD2 distillation on Wan 2.1 T2V.
  • examples/train/example.yaml
    • Added a comprehensive example YAML configuration file for the new training framework.
  • examples/train/finetune_wan2.1_t2v_1.3B_vsa_phase3.4_0.9sparsity.yaml
    • Added a new YAML configuration for VSA finetuning on Wan 2.1 T2V.
  • examples/train/finetune_wangame2.1_i2v_1.3B.yaml
    • Added a new YAML configuration for finetuning WanGame I2V.
  • examples/train/issue.md
    • Added an RFC document outlining the new training architecture for community discussion.
  • examples/train/rfc.md
    • Added an internal RFC document detailing the file structure and example YAML for the new training framework.
  • examples/train/run.sh
    • Added a shell script to launch training with the new YAML configurations.
  • examples/train/self_forcing_wangame_causal_v3.yaml
    • Added a new YAML configuration for causal Self-Forcing distillation on WanGame.
  • fastvideo/train/.style.yapf
    • Added a YAPF style configuration for consistent code formatting.
  • fastvideo/train/__init__.py
    • Initialized the fastvideo.train package.
  • fastvideo/train/callbacks/__init__.py
    • Initialized the fastvideo.train.callbacks package.
  • fastvideo/train/callbacks/callback.py
    • Defined the base Callback class and CallbackDict manager.
  • fastvideo/train/callbacks/ema.py
    • Implemented the EMACallback for exponential moving average updates.
  • fastvideo/train/callbacks/grad_clip.py
    • Implemented the GradNormClipCallback for gradient norm clipping.
  • fastvideo/train/callbacks/validation.py
    • Implemented a generic ValidationCallback for periodic inference validation.
  • fastvideo/train/entrypoint/__init__.py
    • Initialized the fastvideo.train.entrypoint package.
  • fastvideo/train/entrypoint/dcp_to_diffusers.py
    • Provided a script to convert DCP checkpoints to Diffusers format.
  • fastvideo/train/entrypoint/train.py
    • Implemented the main YAML-only training entrypoint.
  • fastvideo/train/methods/__init__.py
    • Initialized the fastvideo.train.methods package with lazy imports.
  • fastvideo/train/methods/base.py
    • Defined the abstract base class for training methods.
  • fastvideo/train/methods/consistency_model/__init__.py
    • Added a placeholder package for consistency model methods.
  • fastvideo/train/methods/distribution_matching/__init__.py
    • Initialized the fastvideo.train.methods.distribution_matching package.
  • fastvideo/train/methods/distribution_matching/dmd2.py
    • Implemented the DMD2 distillation training method.
  • fastvideo/train/methods/distribution_matching/self_forcing.py
    • Implemented the Self-Forcing distillation method for causal models.
  • fastvideo/train/methods/fine_tuning/__init__.py
    • Initialized the fastvideo.train.methods.fine_tuning package with lazy imports.
  • fastvideo/train/methods/fine_tuning/dfsft.py
    • Implemented the Diffusion-forcing SFT (DFSFT) training method.
  • fastvideo/train/methods/fine_tuning/finetune.py
    • Implemented the supervised fine-tuning (SFT) training method.
  • fastvideo/train/methods/knowledge_distillation/__init__.py
    • Added a placeholder package for knowledge distillation methods.
  • fastvideo/train/models/__init__.py
    • Initialized the fastvideo.train.models package.
  • fastvideo/train/models/base.py
    • Defined the abstract base classes for per-role model instances (ModelBase, CausalModelBase).
  • fastvideo/train/models/wan/__init__.py
    • Initialized the fastvideo.train.models.wan package.
  • fastvideo/train/models/wan/wan.py
    • Implemented the WanModel plugin for T2V models.
  • fastvideo/train/models/wan/wan_causal.py
    • Implemented the WanCausalModel plugin with streaming capabilities.
  • fastvideo/train/models/wangame/__init__.py
    • Initialized the fastvideo.train.models.wangame package.
  • fastvideo/train/models/wangame/wangame.py
    • Implemented the WanGameModel plugin for I2V models.
  • fastvideo/train/models/wangame/wangame_causal.py
    • Implemented the WanGameCausalModel plugin with streaming capabilities.
  • fastvideo/train/trainer.py
    • Implemented the core training loop logic.
  • fastvideo/train/utils/__init__.py
    • Initialized the fastvideo.train.utils package.
  • fastvideo/train/utils/builder.py
    • Provided functions to build training components from configuration.
  • fastvideo/train/utils/checkpoint.py
    • Managed checkpointing, saving, resuming, and cleanup using DCP.
  • fastvideo/train/utils/config.py
    • Defined RunConfig and provided utilities for parsing YAML configurations.
  • fastvideo/train/utils/dataloader.py
    • Provided functions to build parquet dataloaders for T2V and WanGame.
  • fastvideo/train/utils/instantiate.py
    • Provided utilities for _target_-based class instantiation.
  • fastvideo/train/utils/module_state.py
    • Provided a utility to set module trainability and mode.
  • fastvideo/train/utils/moduleloader.py
    • Provided functions to load specific model modules from paths.
  • fastvideo/train/utils/optimizer.py
    • Provided functions to build optimizers and learning rate schedulers.
  • fastvideo/train/utils/tracking.py
    • Provided functions to initialize and manage experiment trackers.
  • fastvideo/train/utils/training_config.py
    • Defined dataclasses for structured training configuration.
  • fastvideo/train/utils/validation.py
    • Provided utility functions for parsing validation-related configuration.
Activity
  • A new fastvideo/train directory was introduced, containing a refactored and modular training framework.
  • Core components for models, training methods, callbacks, and utilities were added, enabling a YAML-driven configuration approach.
  • New example YAML configurations were provided for various training scenarios, including DMD2 distillation, Self-Forcing, SFT, and DFSFT.
  • An RFC document detailing the new architecture was added for community review and discussion.

gemini-code-assist bot left a comment

Code Review

This pull request introduces a major and well-designed refactoring of the training framework, making it modular and YAML-driven. The separation of concerns into models, methods, and infrastructure is a significant improvement. The code is generally of high quality.

My review focuses on a few areas to improve portability and maintainability:

  • Hardcoded Paths: Several example configuration files and a shell script contain user-specific absolute paths, which should be replaced with placeholders or relative paths to make them portable.
  • Code Encapsulation: One of the entrypoint scripts imports private functions from another module, which could be refactored to improve encapsulation and reduce code duplication.
  • Documentation Formatting: There are minor markdown formatting issues in one of the documentation files.

Note: Security Review did not run due to the size of the PR.

Comment on lines +291 to +356
def _run_config_from_raw(
    raw: dict[str, Any],
) -> Any:
    """Reconstruct a RunConfig from a raw config dict.

    This mirrors ``load_run_config`` but operates on an
    already-parsed dict (from metadata.json) instead of
    reading from a YAML file.
    """
    from fastvideo.train.utils.config import (
        RunConfig,
        _build_training_config,
        _parse_pipeline_config,
        _require_mapping,
        _require_str,
    )

    models_raw = _require_mapping(
        raw.get("models"), where="models",
    )
    models: dict[str, dict[str, Any]] = {}
    for role_key, model_cfg_raw in models_raw.items():
        role_str = _require_str(
            role_key, where="models.<role>",
        )
        model_cfg = _require_mapping(
            model_cfg_raw,
            where=f"models.{role_str}",
        )
        models[role_str] = dict(model_cfg)

    method_raw = _require_mapping(
        raw.get("method"), where="method",
    )
    method = dict(method_raw)

    callbacks_raw = raw.get("callbacks", None)
    callbacks: dict[str, dict[str, Any]] = (
        _require_mapping(
            callbacks_raw, where="callbacks",
        )
        if callbacks_raw is not None
        else {}
    )

    pipeline_config = _parse_pipeline_config(
        raw, models=models,
    )

    training_raw = _require_mapping(
        raw.get("training"), where="training",
    )
    t = dict(training_raw)
    training = _build_training_config(
        t,
        models=models,
        pipeline_config=pipeline_config,
    )

    return RunConfig(
        models=models,
        method=method,
        training=training,
        callbacks=callbacks,
        raw=raw,
    )

Severity: medium

The function _run_config_from_raw and its use of private functions (e.g., _build_training_config, _parse_pipeline_config) from fastvideo.train.utils.config suggest a need for refactoring. Importing private members from other modules can lead to fragile code.

Consider one of the following approaches:

  1. Make the helper functions in fastvideo.train.utils.config public if they are intended for reuse.
  2. Refactor load_run_config to accept either a file path or a pre-loaded dictionary, which would eliminate the need for _run_config_from_raw and the private imports.
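
Option 2 can be sketched as a simple type dispatch. This is a hypothetical illustration, not the actual fastvideo code: `_parse_run_config` stands in for the existing private parsing pipeline, and json is used here only to keep the sketch stdlib-runnable (the real loader would use yaml.safe_load).

```python
# Sketch of load_run_config accepting either a config file path or a
# pre-parsed dict, removing the need for _run_config_from_raw.
# `_parse_run_config` is a hypothetical stand-in for the real parsing.
import json
from pathlib import Path
from typing import Any, Union


def load_run_config(source: Union[str, Path, dict[str, Any]]) -> dict[str, Any]:
    if isinstance(source, dict):
        raw = source  # e.g. loaded from metadata.json on resume
    else:
        # Real code would use yaml.safe_load on the YAML config file.
        raw = json.loads(Path(source).read_text())
    return _parse_run_config(raw)


def _parse_run_config(raw: dict[str, Any]) -> dict[str, Any]:
    # Placeholder for the real validation and RunConfig construction.
    return dict(raw)


cfg = load_run_config({"method": {"_target_": "pkg.Method"}})
print(cfg["method"]["_target_"])  # pkg.Method
```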

@jzhang38 jzhang38 added the go Trigger Buildkite CI label Mar 9, 2026
@jzhang38 jzhang38 merged commit bc27a03 into hao-ai-lab:main Mar 9, 2026
1 of 3 checks passed