Changes from 219 commits
226 commits
a808242
Add data preprocess pipeline for WanGame
RandNMR73 Feb 2, 2026
b1abb5c
Update action_labels
JerryZhou54 Feb 2, 2026
f3a7a37
Overfitting running for MC 10
JerryZhou54 Feb 3, 2026
bdaf2e2
Support custom action trajectories for validation
JerryZhou54 Feb 3, 2026
ca16275
no text
mignonjia Feb 6, 2026
c077249
text
mignonjia Feb 6, 2026
7aedf3d
zero init fix
mignonjia Feb 6, 2026
a70f153
Merge branch 'wangame' into wangame-text
mignonjia Feb 6, 2026
4c1d9a9
Merge pull request #2 from mignonjia/wangame-text
mignonjia Feb 6, 2026
8cb516d
Revert "Merge pull request #2 from mignonjia/wangame-text"
mignonjia Feb 6, 2026
35bd0ad
wsad
mignonjia Feb 6, 2026
868ce50
load ckpt using safetensor
mignonjia Feb 8, 2026
8b4dec5
shuffle each epoch
mignonjia Feb 8, 2026
d4a3349
actions
mignonjia Feb 9, 2026
ecc8f56
actions
mignonjia Feb 9, 2026
abc026b
compute correct trainable params
mignonjia Feb 9, 2026
d70da3b
allow multiple data path
mignonjia Feb 9, 2026
fb219f8
update inference and wangame lingtbot
H1yori233 Feb 9, 2026
2b48211
wangame ode init
H1yori233 Feb 9, 2026
b4c5420
registry causal and ode init
H1yori233 Feb 10, 2026
4441867
update script
H1yori233 Feb 10, 2026
97c4fc6
draft code
H1yori233 Feb 10, 2026
f113791
some fix
H1yori233 Feb 10, 2026
5a767aa
update
H1yori233 Feb 10, 2026
0683e09
precommit
H1yori233 Feb 10, 2026
e4b549b
Merge remote-tracking branch 'origin/main' into wangame-distillation
H1yori233 Feb 10, 2026
fbddfba
update
H1yori233 Feb 11, 2026
8f9458d
validation
mignonjia Feb 11, 2026
8f2188d
Merge remote-tracking branch 'upstream/main' into wangame
mignonjia Feb 11, 2026
2e94cea
registry; Lingbot need to validate
mignonjia Feb 11, 2026
7fc6c89
revert back seed and scheduler
mignonjia Feb 11, 2026
6d76832
some fix
H1yori233 Feb 12, 2026
3a84f74
fix causal denoising
H1yori233 Feb 12, 2026
89a48e4
validation 81 frame
mignonjia Feb 12, 2026
64ba901
update
H1yori233 Feb 12, 2026
9a38c3d
val
mignonjia Feb 13, 2026
bc1be4c
use mg causal denoising
H1yori233 Feb 14, 2026
c486e6c
fix cache handling logic
H1yori233 Feb 15, 2026
059fd2f
add visualization
H1yori233 Feb 15, 2026
ffec3b1
try to read and design
alexzms Feb 21, 2026
98a1d53
read fastgen
alexzms Feb 21, 2026
ace0cac
designing
alexzms Feb 21, 2026
b3f9faa
phase 0
alexzms Feb 21, 2026
b936be8
progressing phase 0
alexzms Feb 21, 2026
a8bada4
phase 0 should be done
alexzms Feb 21, 2026
c4c2d89
phase0 warning
alexzms Feb 21, 2026
5720ae9
validation in method
alexzms Feb 21, 2026
c8da42e
scripts for testing phase0
alexzms Feb 21, 2026
0c1bd0d
temp launch
alexzms Feb 21, 2026
97fa792
phase 1 design
alexzms Feb 21, 2026
d6ecdad
progressing phase 1
alexzms Feb 21, 2026
7b2d8e5
phase 1 init impl
alexzms Feb 21, 2026
6134721
general distill endpoint
alexzms Feb 21, 2026
4a88606
distillation
alexzms Feb 21, 2026
ce22aea
temporary run script
alexzms Feb 21, 2026
d20753b
random generator fix
alexzms Feb 21, 2026
e36507b
Phase 1 works very well on training.
alexzms Feb 22, 2026
bd24192
dmd2 adapter comments
alexzms Feb 22, 2026
b9590f8
removing phase 0 dependency
alexzms Feb 22, 2026
8461d68
design phase 2
alexzms Feb 22, 2026
c9681ce
designing phase 2: config
alexzms Feb 22, 2026
889c1c5
designing phase 2: config 2
alexzms Feb 22, 2026
7431e95
progressing phase 2
alexzms Feb 22, 2026
f8029ad
progressing phase 2. 2
alexzms Feb 22, 2026
cef57ef
phase 2 init impl
alexzms Feb 22, 2026
7f14865
phase 2 config. training code
alexzms Feb 22, 2026
99acff3
remove all legacy dependency
alexzms Feb 22, 2026
0a3be30
fix gpu num
alexzms Feb 22, 2026
225be11
ckpt manager for phase 2
alexzms Feb 22, 2026
b58d5cd
config design
alexzms Feb 22, 2026
7d20269
designing phase 3
alexzms Feb 22, 2026
02e948b
designing phase 2.9: decoupling adapter
alexzms Feb 23, 2026
c6707c8
restart thread
H1yori233 Feb 23, 2026
11709a4
designing phase 2.9: explain why families registry
alexzms Feb 23, 2026
1e1e11f
phase 2.9 init impl
alexzms Feb 23, 2026
70157b4
wan adapter decouple
alexzms Feb 23, 2026
f12732c
removing dmd in wan adapters
alexzms Feb 23, 2026
8853110
phase2.9: adapter ang families decouple from dmd
alexzms Feb 23, 2026
074f559
doc for every file
alexzms Feb 23, 2026
e8a8371
update wangame sf
H1yori233 Feb 23, 2026
92a0db7
Merge branch 'wangame-distillation' of github-second:mignonjia/FastVi…
H1yori233 Feb 23, 2026
baac257
validation decouple from dmd and role
alexzms Feb 23, 2026
88c946d
some fix
H1yori233 Feb 23, 2026
cb09285
freeze action slurm: Doom from MC
mignonjia Feb 23, 2026
743b04d
some comment to the slurm
mignonjia Feb 23, 2026
c3c75c2
fix circular import and designing phase 2.9: validation config
alexzms Feb 24, 2026
601da9d
validator still use WanDMDPipeline. Future decoupling will be done in…
alexzms Feb 24, 2026
a8e728b
phase 2.9 config
alexzms Feb 24, 2026
8ed3869
sheidewenti?
alexzms Feb 24, 2026
7220ebf
phase 3 design: decouple simulate_generator_forward
alexzms Feb 24, 2026
5ee043c
phase 3.1 impl
alexzms Feb 24, 2026
b985942
phase 3.2 impl
alexzms Feb 24, 2026
dd80a97
fix validator not using sampler.
alexzms Feb 24, 2026
167ddb9
fix timestep
alexzms Feb 24, 2026
72a91e8
deisigning phase 3.3 finetuning
alexzms Feb 24, 2026
97534d8
phase 3.3 init impl
alexzms Feb 24, 2026
2a68748
upload wandb file
alexzms Feb 24, 2026
4d2acfc
vsa finetune
alexzms Feb 25, 2026
be92a57
add wangame dmd distillation
H1yori233 Feb 25, 2026
ef3d699
discussing refactor
alexzms Feb 25, 2026
5f5e74f
changing config.md
alexzms Feb 25, 2026
a701455
better config
alexzms Feb 25, 2026
63c5985
merge yaml_config.py into utils/config.py
alexzms Feb 25, 2026
3d8b482
designing phase 3.4
alexzms Feb 25, 2026
8ba4271
phase 3.4
alexzms Feb 25, 2026
c67eb0d
family->model
alexzms Feb 25, 2026
9febcd7
family->model
alexzms Feb 25, 2026
7589720
a bug of encoder_hidden_states_img
mignonjia Feb 25, 2026
dc538dd
Merge remote-tracking branch 'origin/wangame' into wangame-distillation
H1yori233 Feb 25, 2026
638d955
rolemanager, dispatch.
alexzms Feb 25, 2026
a866c4a
log metrics
alexzms Feb 26, 2026
64b9fea
use flowshift=3 validator
alexzms Feb 26, 2026
7399670
tracker, utils, loader
alexzms Feb 26, 2026
6d25bea
utils, config.
alexzms Feb 26, 2026
7072dd4
rfc cn
alexzms Feb 26, 2026
cfa4318
rfc en
alexzms Feb 26, 2026
666ec36
eval
mignonjia Feb 26, 2026
e3cd0a3
select best ckpt
mignonjia Feb 26, 2026
52e629b
Merge branch 'hao-ai-lab:main' into distill1
alexzms Feb 26, 2026
2fb4655
add wangame diffusion forcing
H1yori233 Feb 26, 2026
ccc5e43
Merge remote-tracking branch 'origin/wangame' into wangame-distillation
H1yori233 Feb 26, 2026
dbe17e2
no npy
alexzms Feb 26, 2026
ac86bb2
Merge remote-tracking branch 'mignonjia/wangame' into distill1
alexzms Feb 26, 2026
ceb4de0
designing wangame import
alexzms Feb 26, 2026
de3b8b3
designing wangame: cfg
alexzms Feb 26, 2026
db37b64
wangame support distillation
alexzms Feb 26, 2026
d2387d3
dmd method cfg_uncond
alexzms Feb 26, 2026
e556321
wangame i2v pipeline support ode/sde
alexzms Feb 26, 2026
ed4f636
sde denoising stage
alexzms Feb 27, 2026
e994272
action cfg vs no cfg
alexzms Feb 27, 2026
f036db3
designing causal wangame and dfsft
alexzms Feb 27, 2026
91f260c
designing dfsft
alexzms Feb 28, 2026
c7334a2
validation config refine
alexzms Feb 28, 2026
a27d4a9
better validation config
alexzms Feb 28, 2026
82a24ee
dfdft and causal wan init impl
alexzms Feb 28, 2026
e2747ef
support other scheduler
alexzms Feb 28, 2026
27fd58e
not strict loading
alexzms Feb 28, 2026
7dff17c
Merge remote-tracking branch 'mignonjia/wangame-distillation' into di…
alexzms Feb 28, 2026
a225e15
use CausalWanGameActionTransformer3DModel on wangame causal
alexzms Feb 28, 2026
339a551
validator rollout mode
alexzms Feb 28, 2026
257495b
validator streaming causal rollout
alexzms Feb 28, 2026
679a65d
wangame support validator num_frames
alexzms Feb 28, 2026
8253a9a
fix scheduler out of bound
alexzms Feb 28, 2026
573699f
adapter -> models. config decleartion.
alexzms Feb 28, 2026
9144a43
32 gpu training slurm
alexzms Mar 1, 2026
7e5ed86
config
alexzms Mar 2, 2026
e85b786
designing new model class
alexzms Mar 2, 2026
e63e214
deprecate adapter design
alexzms Mar 2, 2026
7c1442c
reorder and structure inherit hierarchy
alexzms Mar 2, 2026
628765b
support init from ckpt
alexzms Mar 2, 2026
9d08b03
no repetitive model protocal
alexzms Mar 2, 2026
9050496
reduce concept: distillruntime
alexzms Mar 3, 2026
ba9686b
utils/optimizer, utils/validation
alexzms Mar 3, 2026
f4eb1e6
causal dmd config
alexzms Mar 3, 2026
07d8a57
4n8g finetuning
alexzms Mar 3, 2026
ced0ae5
wangame causal dmd 4n8g
alexzms Mar 3, 2026
c461e6b
sf init impl
alexzms Mar 3, 2026
2719650
designing causal rollout stuff
alexzms Mar 3, 2026
0b924e7
causal for self forcing
alexzms Mar 3, 2026
d63f673
self forcing only allows student to be causalmodel
alexzms Mar 3, 2026
07a673c
self forcing config
alexzms Mar 3, 2026
6b3ff77
better yaml tracker
alexzms Mar 3, 2026
2d82e59
safe checkpointing for wangame causal for self forcing
alexzms Mar 3, 2026
4e77535
common part of wangame
alexzms Mar 3, 2026
15c644a
remove manual branch in selfforcing
alexzms Mar 3, 2026
24bc5dc
fix gradient ckpt missing keys
alexzms Mar 3, 2026
acae34e
causal refactor 2
alexzms Mar 3, 2026
fcfae12
remove redundent varient
alexzms Mar 3, 2026
6d57a0a
designing refactor
alexzms Mar 4, 2026
5179f84
designing refactor 2
alexzms Mar 4, 2026
a82208a
designing refactor 3
alexzms Mar 4, 2026
bf43339
designing refactor 4
alexzms Mar 4, 2026
0e6d109
designing refactor 5
alexzms Mar 4, 2026
264ac26
deisgning refactor 6
alexzms Mar 4, 2026
3e1063f
refactor 7
alexzms Mar 4, 2026
163605c
refactor 8
alexzms Mar 4, 2026
4323130
refactor init impl
alexzms Mar 4, 2026
ac5a2f7
remove moe support for now
alexzms Mar 4, 2026
4724893
fastgen structure
alexzms Mar 4, 2026
c9de049
better yaml file structure
alexzms Mar 4, 2026
a114c14
method specific config move to right place
alexzms Mar 4, 2026
e04ced0
run scripts
alexzms Mar 4, 2026
dbca2a5
120 col
alexzms Mar 4, 2026
cb7247d
bugfix
alexzms Mar 4, 2026
9d06875
remove fastvideo.training_args dependency
alexzms Mar 4, 2026
42fa1ba
remove redundant loader_args
alexzms Mar 4, 2026
b6af8eb
distill->train
alexzms Mar 4, 2026
d72ea2b
distill->train
alexzms Mar 4, 2026
d6a101f
simplify. only nested config is allowed
alexzms Mar 4, 2026
37b001d
trainconfig should not be none during init
alexzms Mar 4, 2026
dd7fbb5
validation config will be included in traning config
alexzms Mar 4, 2026
b753c17
self.student.validator is guaranteed exist
alexzms Mar 4, 2026
eacb5f6
~/.claude/plans/wise-mixing-pie.md
alexzms Mar 4, 2026
08ed16e
simplifying code
alexzms Mar 4, 2026
7fe51fc
remove getting trainable using getattr
alexzms Mar 4, 2026
c8f6d08
timestep fix for dfsft
alexzms Mar 5, 2026
f94a725
finetune vsa and wangame yaml
alexzms Mar 5, 2026
f3f3629
validator vsa sparsity
alexzms Mar 5, 2026
0678846
validation callback
alexzms Mar 5, 2026
3b45f23
grad clipping callback
alexzms Mar 5, 2026
6c30e25
ema callback implementation
alexzms Mar 5, 2026
6a0032a
ema and corresponding validation
alexzms Mar 5, 2026
793d184
remove legacy bundle design
alexzms Mar 6, 2026
56a8ee8
better checkpointing
alexzms Mar 6, 2026
4bbb27e
entry point, dcp to diffuers conversion.
alexzms Mar 6, 2026
093e130
validation allow no video.
alexzms Mar 6, 2026
175dbb9
fix gradient ckpting and dit precision not identified problem
alexzms Mar 6, 2026
ffe7ed8
minor fixing
alexzms Mar 6, 2026
5291ffd
dispatch->builder
alexzms Mar 6, 2026
b2b7de3
grad norm must be set in callback system
alexzms Mar 6, 2026
4e3bbec
minor
alexzms Mar 6, 2026
8d83d47
remove doc
alexzms Mar 6, 2026
5fe161e
example yaml
alexzms Mar 6, 2026
a4b9691
solver -> target. remove legact valiadition key
alexzms Mar 6, 2026
7c2648a
move device from build pre to init
alexzms Mar 6, 2026
1d86df3
minor config
alexzms Mar 6, 2026
c9bfc65
fix dmd2 and selfforcing cfg
alexzms Mar 6, 2026
9b69bbe
rfc and dmd minor
alexzms Mar 6, 2026
68c08d1
remove npys
alexzms Mar 6, 2026
449c009
remove dev doc, remove other phases doc
alexzms Mar 6, 2026
850dfd8
resolve
alexzms Mar 6, 2026
9727bd9
remove redundant
alexzms Mar 7, 2026
014a1c8
issue.md
alexzms Mar 7, 2026
6e92e7d
Merge branch 'main' into distill1
alexzms Mar 7, 2026
f2d4a79
review doc
alexzms Mar 7, 2026
ea8cffb
causal wan initial
alexzms Mar 8, 2026
4 changes: 3 additions & 1 deletion .gitignore
@@ -32,7 +32,9 @@ env
**.pyc
**.txt
*.log
*.npy
weights/
slurm_outputs/

# SSIM test outputs
fastvideo/tests/ssim/generated_videos/
@@ -82,4 +84,4 @@ docs/distillation/examples/
!assets/videos/**/*.mp4

dmd_t2v_output/
preprocess_output_text/
preprocess_output_text/
166 changes: 166 additions & 0 deletions dev/config.md
@@ -0,0 +1,166 @@
# Distillation YAML config (schema v2)

This document describes the **YAML schema v2** used by the current distillation entrypoint, along with field meanings and design trade-offs.

Related implementation:
- YAML loader: `fastvideo/distillation/utils/config.py::load_distill_run_config`
- Entrypoint: `fastvideo/training/distillation.py`
- Schema/type definitions: `fastvideo/distillation/utils/config.py`
- Example YAMLs: `examples/distillation/`

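## 1) Entrypoint and constraints (very important)

The distillation entrypoint **accepts only a path to a YAML file that actually exists** (it does not merge legacy CLI/configs,
and it performs no path completion or overlay). The YAML is the single source of truth.

How to run (illustrative):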

```bash
python fastvideo/training/distillation.py \
    --config /abs/path/to/examples/distillation/<phase>/xxx.yaml
```

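The CLI keeps only a few **runtime overrides** (things that are not part of the "experiment definition"):
- `--resume-from-checkpoint`: resume from a checkpoint
- `--override-output-dir`: temporarily override the output directory (handy for re-running experiments)
- `--dry-run`: only parse and build the runtime; do not start training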
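
The constraints above can be sketched with `argparse`. This is only an illustration of the described behavior, not the code in `fastvideo/training/distillation.py`; the function name `parse_cli` and the exact error message are assumptions.

```python
import argparse
from pathlib import Path
from typing import List


def parse_cli(argv: List[str]) -> argparse.Namespace:
    """Hypothetical CLI: one required YAML path plus runtime-only overrides."""
    parser = argparse.ArgumentParser(description="distillation entrypoint (sketch)")
    parser.add_argument("--config", required=True, type=Path)
    parser.add_argument("--resume-from-checkpoint", default=None)
    parser.add_argument("--override-output-dir", default=None)
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args(argv)
    # The path must point at a real file: no completion, no overlay.
    if not args.config.is_file():
        parser.error(f"--config must be an existing YAML file: {args.config}")
    return args
```

Everything that defines the experiment stays in the YAML; the three optional flags only change how a given run is launched.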

## 2) YAML top-level structure (schema v2)

```yaml
recipe:                   # selects family + method (only decides "what to pick")
roles:                    # role -> role spec (who participates)
training:                 # infra parameters (mapped directly to TrainingArgs)
default_pipeline_config:  # default pipeline config (can be inline)
# or default_pipeline_config_path: /abs/path/to/pipeline_config.json|yaml
method_config:            # method/algorithm hyperparameters (single source of truth on the method side)
```

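Loader rules:
- `default_pipeline_config` and `default_pipeline_config_path` are **mutually exclusive**; they cannot both be provided.
- `training` is passed to `TrainingArgs.from_kwargs(**training_kwargs)`; we do not reinvent a training-argument system.
- A missing `recipe:` raises an error immediately (schema v1's `distill:` is no longer supported).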
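
A minimal sketch of the two structural checks, operating on an already-parsed dict. The function name `check_top_level`, the use of `ValueError`, and the message wording are assumptions for illustration; the real loader may differ.

```python
def check_top_level(cfg: dict) -> None:
    """Hypothetical structural checks for a schema v2 config dict."""
    if "recipe" not in cfg:
        # Point users of the old schema at the rename.
        hint = " (schema v1 `distill:` is no longer supported)" if "distill" in cfg else ""
        raise ValueError("missing required `recipe:` section" + hint)
    if "default_pipeline_config" in cfg and "default_pipeline_config_path" in cfg:
        raise ValueError(
            "provide only one of `default_pipeline_config` / "
            "`default_pipeline_config_path`"
        )
```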

## 3) `recipe`: selecting family and method

```yaml
recipe:
  family: wan
  method: dmd2
```

Purpose:
- Registry dispatch: selects the combination of `models/<family>.py` + `methods/<method>.py` (N+M rather than N×M).
- More general semantics: when finetuning is brought in later, we avoid awkward expressions such as `distill.method=finetune`.
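
The N+M idea can be illustrated with two independent registries. This is a toy sketch: the names `MODELS`, `METHODS`, `register` and `dispatch` are hypothetical and the builders return strings instead of real objects.

```python
from typing import Callable, Dict

# Two independent registries: adding a family or a method touches only one table.
MODELS: Dict[str, Callable[[], str]] = {}
METHODS: Dict[str, Callable[[str], str]] = {}


def register(table: dict, name: str):
    """Decorator that stores a builder under `name` in the given table."""
    def deco(fn):
        table[name] = fn
        return fn
    return deco


@register(MODELS, "wan")
def build_wan() -> str:
    return "wan-model"  # placeholder for a real model bundle


@register(METHODS, "dmd2")
def build_dmd2(model: str) -> str:
    return f"dmd2({model})"  # placeholder for a real method object


def dispatch(recipe: dict) -> str:
    """Combine one family builder with one method builder."""
    model = MODELS[recipe["family"]]()
    return METHODS[recipe["method"]](model)
```

With N families and M methods there are N+M registered builders, and any pair can be combined at dispatch time without writing N×M dedicated entrypoints.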

## 4) `roles`: role-based participants

```yaml
roles:
  student:
    path: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
    trainable: true
  teacher:
    path: Wan-AI/Wan2.1-T2V-14B-Diffusers
    trainable: false
    disable_custom_init_weights: true
  critic:
    path: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
    trainable: true
    disable_custom_init_weights: true
```

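Field meanings (see `fastvideo/distillation/utils/config.py`):
- `family`: optional; defaults to `recipe.family`
- `path`: model path / hub name (loading is handled by the family)
- `trainable`: whether this role's parameters are trained (affects `requires_grad` and train/eval mode)
- `disable_custom_init_weights`: optional; disables the family's custom init-weights logic at load time

Design rationale:
- A role is just a key; the framework does not impose "canonical roles". The method decides which roles it needs.
- `trainable` expresses training intent; the method can still enforce algorithmic invariants (for example, DMD2 forces the teacher to be frozen).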
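
The role fields and the "method enforces its own invariants" idea can be sketched as follows. `RoleSpec`, `parse_roles` and `check_frozen_teacher` are hypothetical names, and the defaults chosen for `trainable` and `disable_custom_init_weights` are assumptions, not values confirmed by the loader.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RoleSpec:
    family: str
    path: str
    trainable: bool = True                      # assumed default
    disable_custom_init_weights: bool = False   # assumed default


def parse_roles(cfg: dict) -> Dict[str, RoleSpec]:
    """Build role specs; a role without `family` inherits `recipe.family`."""
    default_family = cfg["recipe"]["family"]
    roles = {}
    for name, raw in cfg["roles"].items():
        roles[name] = RoleSpec(
            family=raw.get("family", default_family),
            path=raw["path"],
            trainable=bool(raw.get("trainable", True)),
            disable_custom_init_weights=bool(raw.get("disable_custom_init_weights", False)),
        )
    return roles


def check_frozen_teacher(roles: Dict[str, RoleSpec]) -> None:
    """Method-side invariant in the spirit of DMD2: the teacher stays frozen."""
    if roles["teacher"].trainable:
        raise ValueError("this method requires `roles.teacher.trainable: false`")
```

The framework only parses whatever role keys appear; the check lives with the method, which is where the doc places such invariants.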

## 5) `training`: mapped directly to `TrainingArgs`

The keys under `training:` are essentially the `TrainingArgs` fields (`fastvideo/fastvideo_args.py`), for example:
- Distributed: `num_gpus`, `sp_size`, `tp_size`
- Data: `data_path`, `dataloader_num_workers`, shape/batch-related fields
- Output: `output_dir`, `max_train_steps`, `seed`, `checkpoints_total_limit`
- Optimizer defaults: `learning_rate`, `betas`, `lr_scheduler`, ...
- Tracking/validation: `log_validation`, `validation_*`, `tracker_project_name`, ...

Invariants the loader injects or fills in (see `fastvideo/distillation/utils/config.py`):
- `mode = ExecutionMode.DISTILLATION`
- `inference_mode = False`
- `dit_precision` defaults to `fp32` (master weights)
- `dit_cpu_offload = False`
- Default distributed sizes (`num_gpus/tp_size/sp_size/hsdp_*`)
- `training.model_path` defaults to `roles.student.path` when missing (used by the pipeline_config registry)
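
One way to read the invariants above, as a sketch on plain dicts. Whether each item is forced or only defaulted is an assumption here (items written with `=` are forced, items described as defaults use `setdefault`); the string `"distillation"` stands in for `ExecutionMode.DISTILLATION`, and the distributed-size defaults are left out because their values are not stated.

```python
def apply_training_invariants(training: dict, roles: dict) -> dict:
    """Hypothetical helper: return training kwargs with invariants applied."""
    out = dict(training)  # do not mutate the caller's dict
    # Forced values (assumed to override anything the YAML says).
    out["mode"] = "distillation"  # stand-in for ExecutionMode.DISTILLATION
    out["inference_mode"] = False
    out["dit_cpu_offload"] = False
    # Defaults (only filled in when the YAML omits them).
    out.setdefault("dit_precision", "fp32")
    out.setdefault("model_path", roles["student"]["path"])
    return out
```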

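Where validation parameters live (current convention):
- `training.validation`: describes validation (the method also reads this section)
  - Fixed fields (used by the framework layer):
    - `enabled` (bool, optional; the presence of the section enables validation by default)
    - `dataset_file` (str)
    - `every_steps` (int)
  - Sampling fields (read by the method as needed and converted into a `ValidationRequest`):
    - `sampling_steps` (list[int] / int / str)
    - `guidance_scale` (float, optional)
    - `sampler_kind` (ode|sde, optional)
    - `sampling_timesteps` (list[int], optional; only needed for DMD2/SDE few-step)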
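
Since `sampling_steps` accepts three shapes, a method would need to normalize it before use. The sketch below assumes the string form is a comma-separated list such as `"4,8"`; that format and the helper name `normalize_sampling_steps` are assumptions, not something the doc specifies.

```python
from typing import List, Sequence, Union


def normalize_sampling_steps(value: Union[int, str, Sequence[int]]) -> List[int]:
    """Hypothetical helper: coerce `sampling_steps` into a list of ints."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("sampling_steps must not be a bool")
    if isinstance(value, int):
        return [value]
    if isinstance(value, str):
        # Assumed format: comma-separated integers, e.g. "4, 8".
        return [int(part) for part in value.split(",") if part.strip()]
    return [int(v) for v in value]
```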

Note:
- `DistillTrainer` no longer reads `training.log_validation/validation_steps/...` for scheduling;
  the trainer calls `method.log_validation(step)` on every step, and the method decides whether to run validation.
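
The division of labor between trainer and method can be sketched with a toy loop. `ToyMethod` and `train` are hypothetical stand-ins; only the call `method.log_validation(step)` and the `enabled` / `every_steps` fields come from the text above.

```python
from typing import List, Optional


class ToyMethod:
    """Stand-in for a method that owns its validation schedule."""

    def __init__(self, validation: Optional[dict]):
        self.validation = validation
        self.ran_at: List[int] = []

    def log_validation(self, step: int) -> None:
        cfg = self.validation
        # No section, or `enabled: false`, means validation never runs.
        if cfg is None or not cfg.get("enabled", True):
            return
        if step % int(cfg["every_steps"]) != 0:
            return
        self.ran_at.append(step)  # a real method would sample and log here


def train(method: ToyMethod, max_train_steps: int) -> None:
    """Trainer side: no scheduling logic, just an unconditional call per step."""
    for step in range(1, max_train_steps + 1):
        # ... forward/backward/optimizer step would go here ...
        method.log_validation(step)
```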

## 6) `default_pipeline_config` / `default_pipeline_config_path`

Two ways to write it (choose one):

1) inline (good for a small number of overrides):

```yaml
default_pipeline_config:
  flow_shift: 8
```

2) path (good for reusing a large config file):

```yaml
default_pipeline_config_path: /abs/path/to/wan_1.3B_t2v_pipeline.json
```
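
Resolving the two forms into a single dict could look like the sketch below. It handles only JSON files so that it stays standard-library only; YAML paths, which the schema also allows, are left out. The function name `resolve_default_pipeline_config` is hypothetical.

```python
import json
from pathlib import Path


def resolve_default_pipeline_config(cfg: dict) -> dict:
    """Hypothetical resolver: return the pipeline config from inline or path form."""
    inline = cfg.get("default_pipeline_config")
    path = cfg.get("default_pipeline_config_path")
    if inline is not None and path is not None:
        raise ValueError("inline and path forms are mutually exclusive")
    if path is not None:
        # JSON only in this sketch; a real loader would also accept YAML.
        return json.loads(Path(path).read_text())
    return dict(inline or {})
```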

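Common fields (not exhaustive):
- `flow_shift`: Wan's flow-matching shift (affects the noise schedule).
- `sampler_kind`: `ode|sde`, selects the sampling-loop semantics (switched inside `WanPipeline`).

Note (important):
- `default_pipeline_config` is the "default config of the model/pipeline" (for example `flow_shift`, `vae_config`).
  The sampling semantics of the method/validator should no longer depend on it; the sampling strategy should be
  passed in explicitly by the method via `ValidationRequest`.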

## 7) `method_config`: method/algorithm-specific hyperparameters

`method_config` is interpreted by the method itself. Using DMD2 as an example:

```yaml
method_config:
  rollout_mode: simulate # {simulate|data_latent}
  generator_update_interval: 5
  real_score_guidance_scale: 3.5
  dmd_denoising_steps: [1000, 850, 700, 550, 350, 275, 200, 125]
```

Where:
- `rollout_mode` replaces the legacy `training.simulate_generator_forward`:
  - `simulate`: the adapter builds the batch from zero latents (no dependency on `vae_latent`)
  - `data_latent`: the dataset batch must provide `vae_latent`
- `dmd_denoising_steps` is the single source of truth for the method's few-step schedule.
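
How a method might interpret this section is sketched below. `DMD2Config`, `parse_dmd2_config` and `check_batch` are hypothetical names; the field names and the two `rollout_mode` values come from the example above, while the error types are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

ROLLOUT_MODES = ("simulate", "data_latent")


@dataclass(frozen=True)
class DMD2Config:
    rollout_mode: str
    generator_update_interval: int
    real_score_guidance_scale: float
    dmd_denoising_steps: Tuple[int, ...]


def parse_dmd2_config(raw: dict) -> DMD2Config:
    """Validate and freeze the method-owned hyperparameters."""
    mode = raw["rollout_mode"]
    if mode not in ROLLOUT_MODES:
        raise ValueError(f"rollout_mode must be one of {ROLLOUT_MODES}, got {mode!r}")
    return DMD2Config(
        rollout_mode=mode,
        generator_update_interval=int(raw["generator_update_interval"]),
        real_score_guidance_scale=float(raw["real_score_guidance_scale"]),
        dmd_denoising_steps=tuple(int(t) for t in raw["dmd_denoising_steps"]),
    )


def check_batch(cfg: DMD2Config, batch: dict) -> None:
    """`data_latent` needs real latents; `simulate` does not look at them."""
    if cfg.rollout_mode == "data_latent" and "vae_latent" not in batch:
        raise KeyError("rollout_mode=data_latent requires `vae_latent` in the batch")
```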

## 8) Minimal runnable example (Wan few-step DMD2)

See the runnable YAMLs under `examples/distillation/`:
- `examples/distillation/phase2/distill_wan2.1_t2v_1.3B_dmd2_8steps.yaml`
- `examples/distillation/phase2_9/distill_wan2.1_t2v_1.3B_dmd2_8steps_phase2.9.yaml`
- `examples/distillation/phase3_1/distill_wan2.1_t2v_1.3B_dmd2_8steps_phase3.1.yaml`
- `examples/distillation/phase3_2/distill_wan2.1_t2v_1.3B_dmd2_8steps_phase3.2.yaml`