[feat] Refactor training framework into fastvideo/train #1156

alexzms wants to merge 226 commits into hao-ai-lab:main from
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the training infrastructure by introducing a new, highly modular framework. The primary goal is to enhance extensibility and maintainability by clearly separating concerns between core training logic, specific models, and various training methods. This change moves towards a more configurable and less coupled system, allowing for easier integration of new research and models.

Highlights
Code Review
This is a massive and impressive refactoring of the training framework, moving towards a more modular, decoupled, and YAML-driven architecture inspired by FastGen. The introduction of `_target_`-based instantiation, a callback system, and clear separation of concerns for models, methods, and the trainer are excellent design choices that will significantly improve maintainability and extensibility. The extensive documentation and phased migration plan are also highly commendable. I've found a few critical issues in the implementation that need to be addressed, primarily related to a syntax error in a package `__init__`, a logic bug in model type detection, and unreachable code in an attention module.
Note: Security Review did not run due to the size of the PR.
```python
) and not (
    cls_name.startswith("WanGame")
    or cls_name == "WanGameActionTransformer3DModel"
    or cls_name.startswith("CausalWan")
    or getattr(fastvideo_args.pipeline_config, "prefix", "") == "WanGame"
    or cls_name.startswith("WanLingBot")
    or cls_name == "WanLingBotTransformer3DModel"
    or getattr(fastvideo_args.pipeline_config, "prefix", "") == "WanLingBot"
    or cls_name.startswith("CausalWanGameActionTransformer3DModel")
)
```
The logic to determine `is_wan_model` appears to have a contradiction. The condition `cls_name.startswith("CausalWan")` is included in the initial positive check (line 846) and also in a negative exclusion block (line 849). This means the condition for CausalWan models will always evaluate to false, preventing them from being correctly identified as a `wan_model`.
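The contradiction is easy to see in isolation. The sketch below is a simplified, hypothetical reproduction of the flagged pattern (not the actual fastvideo source, which also checks `fastvideo_args.pipeline_config`): including `startswith("CausalWan")` in both the positive check and the exclusion block makes the whole expression `A and not A` for those classes, which is always false.

```python
# Hypothetical, simplified reproduction of the flagged is_wan_model logic.
def is_wan_model(cls_name: str) -> bool:
    return (
        cls_name.startswith("Wan")
        or cls_name.startswith("CausalWan")  # positive check includes CausalWan
    ) and not (
        cls_name.startswith("WanGame")
        or cls_name.startswith("CausalWan")  # ...but the exclusion re-tests it
    )

# A CausalWan class satisfies the positive check, then is immediately
# excluded again, so the result is always False:
print(is_wan_model("CausalWanTransformer3DModel"))  # False
print(is_wan_model("WanTransformer3DModel"))        # True
```

The fix is to drop the `CausalWan` test from whichever branch was not intended, depending on whether CausalWan models should count as wan models.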
Related Issue: #1158
Summary
Introduces `fastvideo/train`, a refactored training framework that replaces the monolithic training/distillation pipelines with a modular, YAML-driven architecture.

Key design changes
- `_target_`-based instantiation: models and methods are selected via `_target_` keys in YAML (e.g., `fastvideo.train.models.wan.WanModel`, `fastvideo.train.methods.distribution_matching.dmd2.DMD2Method`), making it easy to add new models/methods without modifying framework code.
- Decoupled components: models (`models/`), methods (`methods/`), callbacks (`callbacks/`), and the training loop (`trainer.py`) are fully decoupled. The trainer calls `method.train_one_step()` without knowing which method is running.
- Callback system: side effects live in callbacks (`callbacks/`) rather than being hardcoded in the training loop, configured via the `callbacks:` section in YAML.
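The `_target_` mechanism can be sketched in a few lines. This is an illustrative stand-in for how such instantiation typically works (the function name `instantiate` and its exact behavior are assumptions, not the fastvideo API); a stdlib class is used in place of a real model so the example is self-contained:

```python
# Illustrative sketch of _target_-based instantiation from a config dict.
import importlib

def instantiate(config: dict):
    """Build an object from a config whose _target_ key is a dotted import path."""
    target = config.pop("_target_")
    module_path, _, cls_name = target.rpartition(".")
    cls = getattr(importlib.import_module(module_path), cls_name)
    return cls(**config)  # remaining keys become constructor kwargs

# Stand-in target instead of a real fastvideo model class:
obj = instantiate({"_target_": "collections.Counter", "a": 2, "b": 1})
print(type(obj).__name__)  # Counter
```

Adding a new model or method then only requires a new class plus a YAML entry pointing at its dotted path, with no changes to framework code.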
- Typed config: a `TrainingConfig` dataclass (`utils/training_config.py`) provides typed defaults for all training parameters. The fully-resolved config (with defaults filled in) is logged to W&B.
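A minimal sketch of this typed-defaults pattern (field names here are hypothetical, not the actual `TrainingConfig` fields): YAML overrides are merged onto dataclass defaults, and the resolved dict is what would be sent to the tracker.

```python
# Illustrative sketch of a typed config with defaults; fields are made up.
from dataclasses import dataclass, asdict, replace

@dataclass
class TrainingConfig:
    learning_rate: float = 1e-4
    max_steps: int = 10_000
    grad_clip_norm: float = 1.0

def resolve(overrides: dict) -> dict:
    """Merge overrides (e.g. from YAML) onto defaults; the fully-resolved
    dict, with every default filled in, is what gets logged."""
    return asdict(replace(TrainingConfig(), **overrides))

resolved = resolve({"learning_rate": 5e-5})
print(resolved["learning_rate"])  # 5e-05 — override applied
print(resolved["max_steps"])      # 10000 — default preserved
```

Unknown keys raise a `TypeError` from `replace`, which catches config typos early.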
- Checkpointing: `CheckpointManager`, plus `dcp_to_diffusers.py` for converting checkpoints to Diffusers format.

Supported models & methods
Bug fixes
Fixed `real_score_guidance_scale` in DMD2 and self-forcing to use the standard formula `uncond + scale * (cond - uncond)` instead of `cond + scale * (cond - uncond)` (which silently added +1 to the effective guidance scale).

File structure
```
fastvideo/train/
  trainer.py
  models/{base, wan/, wangame/}
  methods/{base, distribution_matching/, fine_tuning/}
  callbacks/{callback, grad_clip, validation, ema}
  entrypoint/{train, dcp_to_diffusers}
  utils/{config, builder, training_config, checkpoint, dataloader, optimizer, tracking, ...}
```
Usage
```
torchrun --nproc_per_node=8 -m fastvideo.train.entrypoint.train \
    --config examples/distillation/refactor/distill_wan2.1_t2v_1.3B_dmd2.yaml
```

Test plan

- DMD2 8-step distillation on Wan 2.1 T2V 1.3B matches legacy training loss curves
- VSA finetuning on Wan produces equivalent results to legacy pipeline
- Self-forcing distillation on WanGame runs without errors
- DFSFT on WanGame runs without errors
- Checkpoint save/resume round-trips correctly
- W&B logging shows fully-resolved config with defaults
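The guidance-scale correction listed under Bug fixes is easy to verify numerically. The sketch below (illustrative Python, not fastvideo code) shows why the old formula silently added +1: `cond + scale * (cond - uncond)` is algebraically identical to the standard formula evaluated at `scale + 1`.

```python
# Classifier-free guidance: standard vs. the buggy formulation.
def cfg_correct(uncond, cond, scale):
    return uncond + scale * (cond - uncond)

def cfg_buggy(uncond, cond, scale):
    # cond + s*(cond - uncond) == uncond + (s + 1)*(cond - uncond)
    return cond + scale * (cond - uncond)

uncond, cond, scale = 1.0, 3.0, 4.0
print(cfg_correct(uncond, cond, scale))        # 9.0
print(cfg_buggy(uncond, cond, scale))          # 11.0
print(cfg_correct(uncond, cond, scale + 1.0))  # 11.0 — matches the buggy output
```

So any run tuned against the old code effectively used `real_score_guidance_scale + 1`, which matters when comparing loss curves against the legacy pipeline.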