High-level structural index for agent orientation. Updated 2026-03-08.
FastVideo-WorldModel/
├── fastvideo/ # Core Python package
│ ├── models/ # Model implementations
│ │ ├── dits/ # DiT transformers (wanvideo, ltx2, ...)
│ │ ├── vaes/ # VAE models
│ │ ├── encoders/ # Text/image encoders (T5, CLIP)
│ │ ├── schedulers/ # Noise schedulers
│ │ ├── upsamplers/ # Super-resolution models
│ │ ├── audio/ # Audio models
│ │ └── loader/ # Component loaders for HF repos
│ ├── configs/ # Configuration system
│ │ ├── models/ # Arch configs + param_names_mapping
│ │ ├── pipelines/ # Pipeline wiring
│ │ └── sample/ # Default sampling parameters
│ ├── pipelines/ # End-to-end pipelines
│ │ ├── basic/ # Per-model pipelines (wan/, ltx2/, ...)
│ │ └── stages/ # Reusable pipeline stages
│ ├── train/ # Refactored training framework (YAML-driven, preferred)
│ │ ├── trainer.py # Main training loop coordinator
│ │ ├── entrypoint/ # Training entrypoint (train.py) + checkpoint conversion
│ │ ├── methods/ # Training algorithms (FineTune, DFSFT, DMD2, SelfForcing)
│ │ │ ├── base.py # TrainingMethod ABC
│ │ │ ├── fine_tuning/ # FineTuneMethod, DiffusionForcingSFTMethod
│ │ │ └── distribution_matching/ # DMD2Method, SelfForcingMethod
│ │ ├── models/ # Per-role model wrappers (ModelBase, CausalModelBase)
│ │ │ └── wan/ # WanModel, WanCausalModel
│ │ ├── callbacks/ # Composable hooks (grad_clip, ema, validation)
│ │ └── utils/ # Config, builder, checkpoint, optimizer, tracking
│ ├── training/ # Legacy training infrastructure (being phased out)
│ │ ├── trackers.py # W&B tracker (BaseTracker → WandbTracker)
│ │ ├── training_utils.py # Checkpointing, grad clipping, state dicts
│ │ ├── training_pipeline.py # Base training pipeline
│ │ ├── wan_training_pipeline.py # Wan T2V training
│ │ ├── wan_i2v_training_pipeline.py # Wan I2V training
│ │ ├── distillation_pipeline.py # Distillation base
│ │ ├── wan_distillation_pipeline.py # Wan distillation
│ │ ├── self_forcing_distillation_pipeline.py # Self-forcing distillation
│ │ ├── ltx2_training_pipeline.py # LTX-2 training
│ │ └── matrixgame_training_pipeline.py # MatrixGame training
│ ├── attention/ # Attention backends
│ ├── distributed/ # Sequence/tensor parallel utilities
│ ├── layers/ # Tensor-parallel layers
│ ├── tests/ # Package-level tests
│ │ ├── training/ # Training regression tests (W&B summary comparison)
│ │ ├── ssim/ # SSIM visual regression tests
│ │ ├── encoders/ # Encoder parity tests
│ │ └── modal/ # Modal CI test runner
│ └── registry.py # Unified config registry
├── fastvideo-kernel/ # CUDA/custom kernels (separate build: ./build.sh)
├── scripts/ # Utility scripts
│ ├── distill/ # Distillation launch scripts
│ ├── inference/ # Inference scripts
│ ├── checkpoint_conversion/ # Weight conversion tools
│ ├── finetune/ # Finetune scripts
│ └── preprocess/ # Data preprocessing
├── examples/ # Ready-to-run examples
│ ├── training/ # Training examples (finetune/, consistency_finetune/)
│ ├── distill/ # Distillation examples
│ ├── inference/ # Inference examples
│ └── dataset/ # Dataset examples
├── docs/ # MkDocs documentation source
│ ├── design/overview.md # Architecture overview
│ ├── training/ # Training guides
│ └── contributing/ # Contributor guides + coding_agents.md
├── tests/ # Top-level tests (local_tests/)
├── AGENTS.md # Agent coding guidelines
└── .agents/ # Agent infrastructure (you are here)
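
Because this index is maintained by hand, it can drift from the actual tree. The following is a minimal sketch, not part of the repository, of how an agent might check that a sample of the indexed paths still exists. It assumes it is run from the repository root. The name `missing_paths` and the `INDEXED_PATHS` list are hypothetical; the paths themselves are copied from the tree above.

```python
from pathlib import Path

# Sample of paths copied from the index above (illustrative, not exhaustive).
INDEXED_PATHS = [
    "fastvideo/models/dits",
    "fastvideo/configs/pipelines",
    "fastvideo/pipelines/stages",
    "fastvideo/train/trainer.py",
    "fastvideo/training",
    "fastvideo/registry.py",
    "fastvideo-kernel",
    "scripts",
    "examples",
    "docs/design/overview.md",
    "AGENTS.md",
    ".agents",
]


def missing_paths(repo_root, paths=INDEXED_PATHS):
    """Return the indexed paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in paths if not (root / p).exists()]


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    stale = missing_paths(".")
    for p in stale:
        print(f"missing: {p}")
    print(f"{len(INDEXED_PATHS) - len(stale)}/{len(INDEXED_PATHS)} indexed paths found")
```

Any path reported as missing suggests the index needs updating, or that the script was not run from the repository root.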