MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition

Weichao Zhao, Hezhen Hu, Wengang Zhou, Yunyao Mao, Min Wang and Houqiang Li

This repository contains the Python (PyTorch) implementation of this paper.

Accepted by TCSVT 2024.

Requirements

python==3.8.13
torch==1.8.1+cu111
torchvision==0.9.1+cu111
tensorboard==2.9.0
scikit-learn==1.1.1
tqdm==4.64.0
numpy==1.22.4
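
To confirm that an environment matches the pins above, a small check along these lines may help. This is a minimal sketch and not part of the repository: the helper name `check_pins` is hypothetical, and it assumes the distribution names used by pip match the names listed above (the Python version itself is not checked).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the Requirements list above.
PINNED = {
    "torch": "1.8.1+cu111",
    "torchvision": "0.9.1+cu111",
    "tensorboard": "2.9.0",
    "scikit-learn": "1.1.1",
    "tqdm": "4.64.0",
    "numpy": "1.22.4",
}


def check_pins(pins, get_version=version):
    """Return {package: installed_version_or_None} for every mismatch.

    Hypothetical helper: `get_version` defaults to importlib.metadata.version
    and can be swapped out for testing.
    """
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    for name, found in check_pins(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {found}")
```

An empty result means every listed package is installed at the pinned version; other versions may still work, but they are not the ones listed here.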

Pre-Training

Please refer to the bash scripts provided in this repository.

Datasets

Download the original datasets, including SLR500, NMFs_CSL, WLASL and MSASL.
Utilize the off-the-shelf pose estimator MMPose with the setting of Topdown Heatmap + Hrnet + Dark on Coco-Wholebody to extract the 2D keypoints for sign language videos.
The final data is formatted as follows:

    Data
    ├── NMFs_CSL
    ├── SLR500
    ├── WLASL
    └── MSASL
        ├── Video
        ├── Pose
        └── Annotations
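
Before launching training, it can be useful to verify that the data folder follows the layout above. The sketch below is illustrative only: the helper name `missing_dirs` is hypothetical, and it assumes that every dataset folder (not only MSASL) contains the Video, Pose and Annotations subfolders, which the tree above does not state explicitly.

```python
from pathlib import Path

# Folder names taken from the layout shown above.
DATASETS = ("NMFs_CSL", "SLR500", "WLASL", "MSASL")
# Assumption: each dataset has these three subfolders.
SUBDIRS = ("Video", "Pose", "Annotations")


def missing_dirs(root):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    missing = []
    for dataset in DATASETS:
        for sub in SUBDIRS:
            path = root / dataset / sub
            if not path.is_dir():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # "Data" is the top-level folder name from the layout above.
    for path in missing_dirs("Data"):
        print("missing:", path)
```

If only some datasets are used, trimming `DATASETS` to those names keeps the check meaningful.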

Pretrained Model

You can download the pretrained model from this link: pretrained model on four ISLR datasets

Citation

If you find this work useful for your research, please consider citing our work:

@article{zhao2024masa,
  title={MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition},
  author={Zhao, Weichao and Hu, Hezhen and Zhou, Wengang and Mao, Yunyao and Wang, Min and Li, Houqiang},
  journal={arXiv},
  year={2024}
}
