MSWA: Refining Local Attention with Multi-Scale Window Attention

Yixing Xu, Shivank Nag, Dong Li, Lu Tian, Emad Barsoum | Paper

Advanced Micro Devices, Inc.
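MSWA refines sliding-window attention by giving different heads different window sizes, so the model mixes fine-grained local context with longer-range context. As a minimal illustrative sketch (not the repository's implementation; window sizes below are chosen arbitrarily), the per-head causal window masks can be built like this:

```python
import numpy as np

def window_mask(seq_len, window):
    # Causal sliding-window mask: position i attends to
    # positions j with i - window < j <= i.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def multi_scale_masks(seq_len, window_sizes):
    # One mask per head; each head gets its own window size.
    return np.stack([window_mask(seq_len, w) for w in window_sizes])

# Three heads with growing windows over an 8-token sequence.
masks = multi_scale_masks(8, [2, 4, 8])
```

Each boolean mask would then gate that head's attention scores before the softmax; a head with `window == seq_len` degenerates to ordinary causal attention.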


Dependencies

torch == 2.1.2+rocm5.5
numpy == 1.24.4
einops == 0.7.0
peft == 0.10.0
datasets == 2.19.1
deepspeed == 0.14.1
wandb == 0.16.5
transformers == 4.34.0
accelerate == 0.29.2
tokenizers == 0.14.1

Training

  1. Download the RedPajama dataset.

  2. Prepare the data:

     python data_prepare.py

  3. Run the training script:

     sh script/diff_run.sh

Citation

@article{xu2025mswa,
  title={MSWA: Refining Local Attention with Multi-Scale Window Attention},
  author={Xu, Yixing and Nag, Shivank and Li, Dong and Tian, Lu and Barsoum, Emad},
  journal={arXiv preprint arXiv:2501.01039},
  year={2025}
}
