STA inference integration is archived from main.
The full STA pipeline code (including mask search and STA inference wiring in
fastvideo/) is preserved in the sta_do_not_delete branch.
The STA kernels in fastvideo-kernel are still kept on main.
We do not keep STA pipeline integration in main because we believe Video
Sparse Attention (VSA) is strictly better than STA for the actively maintained
FastVideo inference path.
To run the full STA workflow, switch to the archived branch:
git fetch origin
git checkout sta_do_not_delete

The reference script is:

examples/inference/sta_mask_search/inference_wan_sta.sh
It runs two stages:
1. STA_searching (full search), output at inference_results/sta/mask_search_full
2. STA_tuning (sparse tuning), output at inference_results/sta/mask_search_sparse
Run:
export FASTVIDEO_ATTENTION_BACKEND=SLIDING_TILE_ATTN
export FASTVIDEO_ATTENTION_CONFIG=assets/mask_strategy_wan.json
bash examples/inference/sta_mask_search/inference_wan_sta.sh

With a selected mask strategy, run inference with:
export FASTVIDEO_ATTENTION_BACKEND=SLIDING_TILE_ATTN
export FASTVIDEO_ATTENTION_CONFIG=assets/mask_strategy_wan.json
fastvideo generate \
--model-path Wan-AI/Wan2.1-T2V-14B-Diffusers \
--num-gpus 2 \
--tp-size 2 \
--sp-size 2 \
--height 768 \
--width 1280 \
--num-frames 69 \
--num-inference-steps 50 \
--prompt "A cinematic wildlife shot of a lion walking in golden grasslands." \
--output-path outputs_video/STA/

Python usage on the archive branch can also set STA_mode in
VideoGenerator.from_pretrained(...). The supported modes are
STA_searching, STA_tuning, and STA_inference.
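As a sketch of that Python entry point, the mode names below are the three listed above; the exact keyword name, import path, and signature of VideoGenerator.from_pretrained are assumptions and should be checked against the sta_do_not_delete branch:

```python
# The three STA_mode values documented for the archive branch.
STA_MODES = ("STA_searching", "STA_tuning", "STA_inference")

# Assumed usage on the sta_do_not_delete branch (hypothetical sketch,
# verify the keyword and import against that branch before use):
#
#   from fastvideo import VideoGenerator
#
#   generator = VideoGenerator.from_pretrained(
#       "Wan-AI/Wan2.1-T2V-14B-Diffusers",
#       STA_mode="STA_inference",  # or "STA_searching" / "STA_tuning"
#   )

# Guard against typos when selecting a mode programmatically.
mode = "STA_inference"
assert mode in STA_MODES
```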
STA kernels remain available from fastvideo-kernel. See the
Attention overview for build instructions.
If you use Sliding Tile Attention in your research, please cite:
@article{zhang2025fast,
title={Fast video generation with sliding tile attention},
author={Zhang, Peiyuan and Chen, Yongqi and Su, Runlong and Ding, Hangliang and Stoica, Ion and Liu, Zhengzhong and Zhang, Hao},
journal={arXiv preprint arXiv:2502.04507},
year={2025}
}