What's Changed
- [VSA] [STA] Fix directory structure for pypi publishing by @SolitaryThinker in #769
- [bugfix] fix STA install setup.py import by @SolitaryThinker in #770
- [bugfix] [VSA] [STA] Fix MANIFEST.in for VSA and STA; Move tk into both directories by @SolitaryThinker in #771
- [misc] [VSA] [STA] fix tk_root in setup.py for VSA and STA by @SolitaryThinker in #772
- [Preprocess][Feat] support torchvision to load video in new preprocessing by @Eigensystem in #761
- [Preprocess][Fix] video quality issue by @Eigensystem in #773
- [misc] Improve text encoding stage by @SolitaryThinker in #774
- [CI] Add ssim test for causal inference by @SolitaryThinker in #784
- [misc] Update Slack invite link by @SolitaryThinker in #786
- fix: lora_B init zeros by @DataAIPlayer in #781
- [Feature] Support Lora for DMD by @Edenzzzz in #755
- [Backend][Vmoba] Add implementation of VMoba by @EricLina in #778
- [Self-forcing] [1/n] Handle extra dim in time embedding and add timestep warping by @SolitaryThinker in #792
- [preprocessing] [self-forcing] [2/n] Improve preprocessing and add ode trajectory dataset schema by @SolitaryThinker in #794
- [bugfix] Fix delta calculation by @DataAIPlayer in #796
- [bugfix] pin gradio version and set current_vsa_sparsity in TrainingPipeline by @SolitaryThinker in #798
- [self-forcing] [3/n] Text embed only preprocessing by @SolitaryThinker in #797
- [bugfix] Fix empty PipelineConfigs for Wan2.2 A14B by @SolitaryThinker in #800
- [Bugfix] Fix VMoba requirements by @Edenzzzz in #802
- [bugfix] Wan2.2 Boundary ratio by @SolitaryThinker in #804
- [self-forcing] [4/n] Preprocessing for collecting ODE trajectory by @SolitaryThinker in #788
- Update example files and readme by @BrianChen1129 in #809
- [self-forcing] [5/n] Add Self-Forcing distillation pipeline by @JerryZhou54 in #808
- [bugfix] Update learning rates for sparse distillation recipe by @SolitaryThinker in #812
- [self-forcing] [6/n] Add ODE Init training by @SolitaryThinker in #811
- Add Sage Attention 3 Backend by @RandNMR73 in #815
- [Feature] Update count trainable param for FSDP2 by @BrianChen1129 in #820
- [bugfix] Use training_state_checkpointing_steps instead of checkpointing_steps by @SolitaryThinker in #821
- [self-forcing][8/n] Self-Forcing For Wan2.2-A14B + torch.compile training and distillation support by @RandNMR73 in #818
- [bugfix] Allow overriding dit checkpoint for inference and Lower VSA LR in example scripts by @SolitaryThinker in #831
- [feature] Add torch profiler by @SolitaryThinker in #827
- Add wan2.1 functionality support for Ascend NPU platform by @zyang6 in #810
- [Feature] Add video-to-video (V2V) pipeline by @Gary-ChenJL in #829
- [feat] unified trainer logging by @Ohm-Rishabh in #841
- [Feat] add ray support by @Eigensystem in #838
- [bugfix] [misc] Use training_state_checkpointing_steps in scripts/ by @SolitaryThinker in #846
- [bugfix] always force spawn instead of fork by @Eigensystem in #852
- [feat] Add gradio local inference demo by @SolitaryThinker in #847
- [ci] fix causal ssim test by @SolitaryThinker in #848
- move STA_configuration.py to fastvideo/attention/backends by @H1yori233 in #856
- [Feature] Add Cosmos2 i2v pipeline by @kevin314 in #837
- [bugfix] Add Cosmos2 sampling params to registry by @kevin314 in #862
- [docs] port to mkdocs by @MihirJagtap in #855
- Improve FSDP loading with size-based filtering by @Ohm-Rishabh in #853
- [Docs] add diagrams to docs by @H1yori233 in #863
- [feat] prepare for wan2.2 SF by @SolitaryThinker in #861
- [docs] Update Home Readme.md with fixed links by @MihirJagtap in #873
- [misc] update wechat and slack invite links by @SolitaryThinker in #875
- fix: incorrect dv in vsa Triton kernel causing test_vsa error by @Y-aang in #879
- [docs] add favicon by @MihirJagtap in #878
- Fix mp worker busy loop to handle all string RPC methods by @shaoxiongduan in #881
- [Bugfix] [DMD Distillation] Each rank should have its own timestep sampled by @JerryZhou54 in #885
- [bugfix] [lora] [CI] Fix LoRA alpha scaling factor & Fix LoRA Inference CI by @shaoxiongduan in #870
- [feat] Add inference for MoE SF by @JerryZhou54 in #880
- [readme] update link to inference code by @SolitaryThinker in #887
- [Feat] [I2V] resize all image sizes to below 480*832 by @JerryZhou54 in #890
- [misc] Update wechat link by @Edenzzzz in #893
- [CI] fix VSA training CI by @SolitaryThinker in #900
- [Bugfix] Minor bugfixes by @loaydatrain in #889
- [docs] modified the .github/workflows/docs.yml file to include path filtering by @MihirJagtap in #906
- Fix the docs by @eitanturok in #905
- [feat] training mfu calculation scripts by @Ohm-Rishabh in #871
- fix: correct mp backend GPU assignment on multi-GPU systems by @kuafou in #912
- Use assert_close in tests by @Edenzzzz in #429
- [docs] add docs for ssim testing by @SolitaryThinker in #918
- [feat]: add COSMOS 2.5 DiT implementation by @KyleShao1016 in #897
- [docs] fix testing.md visibility by @SolitaryThinker in #920
- [bugfix] [distillation] Fix DMD inference pipeline noise initialization shape by @SolitaryThinker in #921
- Add LoRA extraction, verification, and comparison scripts by @ShreejithSG in #865
- [bugfix] [VSA] Fix block_size computation in backward kernel by @Chuge0335 in #925
- [feat] Add fvd implementation by @ketakitank in #923
- [misc] update wechat image by @SolitaryThinker in #931
- [bugfix] [VSA] [distillation] Various bugfixes for VSA and distillation and nightly tests by @SolitaryThinker in #932
- [bugfix] [lora] [distillation] Fix lora distillation bug by @SolitaryThinker in #933
- [misc] upgrade pytorch version to 2.9.0 by @SolitaryThinker in #928
- [CI] Fix CI tests by @SolitaryThinker in #935
- [Feature] Support for Variable Q/KV Sequence Lengths in VSA ThunderKittens kernel by @alexzms in #911
- [ci]: Use pre-built docker image & skip VSA compilation by @alexzms in #939
- [misc] add schedule configurations to pytorch profiler by @Ohm-Rishabh in #934
- [feat] Support sequence packing and shard after patchification for USP by @loaydatrain in #894
- [docs] Minor Fixes by @loaydatrain in #942
- [feat] Add Matrix-Game 2.0 by @H1yori233 in #938
- [bugfix] Added VSA Padding logic by @loaydatrain in #944
- [misc] Allow manual override of Pipeline class through override_pipeline_cls_name by @SolitaryThinker in #945
- [New Model] Hunyuan1.5 by @JerryZhou54 in #943
- [docs] small fixes by @RandNMR73 in #947
- [feat] add sliding_tile attention triton kernel and ROCM support by @ZiguanWang in #916
- [rocm] Add rocm fastvideo docker image by @SolitaryThinker in #952
- [bugfix] [dmd2] allow dmd2 simulate_student_forward to use text-only dataset by @SolitaryThinker in #951
- Add LongCat T2V (Base, Distillation and Refinement) Support to FastVideo by @alexzms in #883
- feat: consolidate attention kernels into unified fastvideo-kernel package by @ShreejithSG in #946
- [kernel] Reorg and fix fastvideo-kernel by @SolitaryThinker in #962
- [kernel] Release fastvideo-kernel v0.2.1 by @SolitaryThinker in #963
- [docs] refactor attention docs by @SolitaryThinker in #964
- [kernel] Fix docker release build for kernel by @SolitaryThinker in #965
- [ci] fix kernel tests by @SolitaryThinker in #955
- [feat] Add new feature extractors for fvd by @ketakitank in #954
- [fix]: fix sliding_tile_attn with sdpa(without flash_attn) by @ZiguanWang in #967
- [fix]: fix fastvideo-kernel Rocm build and Dockerfile for Rocm by @ZiguanWang in #968
- [fix]: fix STA Triton kernel for AMD RDNA archs by @ZiguanWang in #969
- [misc] Add util script to create diffuser HF repo from custom component weights by @SolitaryThinker in #970
- [kernel] add turbodiffusion kernels by @SolitaryThinker in #972
- [docs]: fix various broken links across the documentation by @kuafou in #979
- [feat] Support absmax style quantization for FP8 by @XOR-op in #981
- [New Model] Turbodiffusion by @loaydatrain in #971
- [feat] support Matrix-Game 2.0 streaming generation by @H1yori233 in #957
- [feat] Support text encoder weight override and quantization by @XOR-op in #983
- Layer offloading by @Ohm-Rishabh in #966
- [docs] Update docs and README by @SolitaryThinker in #975
- [ci] increase ssim and lora inference test timeout by @SolitaryThinker in #985
- [chore] release fastvideo-kernel 0.2.2 by @SolitaryThinker in #986
- [chore] update wechat QR code by @SolitaryThinker in #988
- Add LongCat-Video I2V and Video Continuation (Base, Distillation and Refinement) Support to FastVideo by @shaoxiongduan in #953
- [misc] pin fastvideo-kernel in .toml file by @SolitaryThinker in #989
- [feat] add Turbodiffusion I2V pipeline by @loaydatrain in #984
- [misc] add pin_cpu_memory false for RTX 4090 by @SolitaryThinker in #990
- [chore] release 0.1.7 (real) by @SolitaryThinker in #980
New Contributors
- @DataAIPlayer made their first contribution in #781
- @EricLina made their first contribution in #778
- @zyang6 made their first contribution in #810
- @Ohm-Rishabh made their first contribution in #841
- @H1yori233 made their first contribution in #856
- @MihirJagtap made their first contribution in #855
- @Y-aang made their first contribution in #879
- @shaoxiongduan made their first contribution in #881
- @loaydatrain made their first contribution in #889
- @eitanturok made their first contribution in #905
- @kuafou made their first contribution in #912
- @KyleShao1016 made their first contribution in #897
- @ShreejithSG made their first contribution in #865
- @Chuge0335 made their first contribution in #925
- @ketakitank made their first contribution in #923
- @alexzms made their first contribution in #911
- @ZiguanWang made their first contribution in #916
- @XOR-op made their first contribution in #981
Full Changelog: v0.1.6...v0.1.7