🧱 Data Preprocess for Distillation

Distillation uses the same data preprocessing pipeline as training. Refer to Training Data Preprocess for the general preprocessing steps.

Distillation-Specific Datasets

FastVideo 480P Synthetic Wan Dataset

For Wan2.1 T2V distillation, we use the FastVideo 480P Synthetic Wan dataset (FastVideo/Wan-Syn_77x448x832_600k), which contains 600k synthetic latents.

# Download the preprocessed dataset
python examples/huggingface/download_hf.py \
    --repo_id "FastVideo/Wan-Syn_77x448x832_600k" \
    --local_dir "FastVideo/Wan-Syn_77x448x832_600k" \
    --repo_type "dataset"
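
Before moving on, it can help to confirm that the download actually produced files. The following is a minimal sketch, assuming the same `--local_dir` as in the command above; the `dataset_ready` helper is hypothetical and not part of FastVideo.

```shell
# Hypothetical helper (not part of FastVideo): succeeds only if the
# directory exists and contains at least one regular file beneath it.
dataset_ready() {
    dir="$1"
    [ -d "$dir" ] && [ -n "$(find "$dir" -type f 2>/dev/null | head -n 1)" ]
}

# Usage: the path is assumed to match the --local_dir passed to download_hf.py
if dataset_ready "FastVideo/Wan-Syn_77x448x832_600k"; then
    echo "dataset found"
else
    echo "dataset missing or empty" >&2
fi
```

The check only looks for the presence of files; it does not validate their contents or count.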

Crush Smol Dataset

For Wan2.2 TI2V distillation, we use the crush_smol dataset, which includes both raw videos and preprocessed latents.

# Download dataset
python examples/huggingface/download_hf.py \
    --repo_id=FastVideo/mini_i2v_dataset \
    --local_dir=data/mini_i2v_dataset \
    --repo_type=dataset

Preprocessing for Distillation

The preprocessing steps are identical to training. Run the appropriate preprocessing script based on your model:

# For Wan2.1 T2V
bash examples/preprocessing/v1_preprocess_wan_data_t2v

# For Wan2.2 TI2V
bash examples/distill/Wan2.2-TI2V-5B-Diffusers/crush_smol/preprocess_wan_data_ti2v_5b.sh
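
If you switch between models often, the choice of script can be wrapped in a small function. This is only a sketch: the `preprocess_script_for` helper and the model labels `wan2.1-t2v` and `wan2.2-ti2v` are hypothetical names introduced here, while the script paths are the ones listed above.

```shell
# Hypothetical helper (not part of FastVideo): map a model label to the
# preprocessing script path listed in this section.
preprocess_script_for() {
    case "$1" in
        wan2.1-t2v)
            echo "examples/preprocessing/v1_preprocess_wan_data_t2v" ;;
        wan2.2-ti2v)
            echo "examples/distill/Wan2.2-TI2V-5B-Diffusers/crush_smol/preprocess_wan_data_ti2v_5b.sh" ;;
        *)
            # Unknown label: report on stderr and signal failure
            echo "unknown model: $1" >&2
            return 1 ;;
    esac
}

# Usage (run from the repository root):
#   script="$(preprocess_script_for wan2.2-ti2v)" && bash "$script"
```

Returning a non-zero status for an unknown label keeps the `&&` in the usage line from running `bash` with an empty path.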
