We introduce a robust training pipeline for Isolated Sign Language Recognition (ISLR). The approach addresses challenges specific to sign language recognition, such as limited data quality, diverse signing speeds, and the need for precise temporal understanding. The pipeline combines image and video augmentations, a boundary-regression algorithm, and an IoU-balanced classification loss to improve recognition performance across multiple datasets.
The effectiveness of these strategies is demonstrated through extensive experiments, achieving state-of-the-art results on key benchmarks such as WLASL and Slovo. By offering a versatile and scalable solution, this work aims to facilitate further research and development in sign language recognition.
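As a rough illustration of the idea behind an IoU-balanced classification loss (the exact formulation used in this pipeline may differ), the per-sample cross-entropy can be weighted by the temporal IoU between a sampled clip and the annotated sign boundaries, so that well-aligned clips contribute more to the gradient. The function names and the exponent `eta` below are illustrative, not taken from this repository:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def temporal_iou(clip, gt):
    """IoU of two 1-D intervals (start, end), e.g. a sampled clip vs. sign boundaries."""
    inter = max(0.0, min(clip[1], gt[1]) - max(clip[0], gt[0]))
    union = (clip[1] - clip[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def iou_balanced_ce(logits, label, clip, gt, eta=1.0):
    """Cross-entropy down-weighted by temporal IoU**eta (illustrative sketch)."""
    p = softmax(logits)[label]
    w = temporal_iou(clip, gt) ** eta
    return -w * math.log(p)
```

With `eta=1.0`, a clip that only overlaps a third of the annotated sign contributes a third of the loss of a perfectly aligned clip.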
| Model | Pretrain Task | Top-1 Accuracy (%) | Config |
|---|---|---|---|
| MViTv2-S | MaskFeat | 56.37 | config |
| MViTv2-S | Classification | 57.17 | config |
| MViTv2-B | Classification | 57.33 | config |
| I3D | Classification | 36.38 | config |

| Model | Pretrain Task | Top-1 Accuracy (%) | Config |
|---|---|---|---|
| MViTv2-S | MaskFeat | 95.62 | config |
| MViTv2-S | Classification | 95.05 | config |
| MViTv2-B | Classification | 95.75 | config |
| I3D | Classification | 87.81 | config |

| Model | Pretrain Task | Top-1 Accuracy (%) | Config |
|---|---|---|---|
| MViTv2-S | MaskFeat | 81.57 | config |
| MViTv2-S | Classification | 80.97 | config |
| MViTv2-B | Classification | 81.34 | config |
| I3D | Classification | 63.82 | config |

| Model | Pretrain Task | Top-1 Accuracy (%) | Config |
|---|---|---|---|
| MViTv2-S | MaskFeat | 87.31 | config |
| MViTv2-S | Classification | 85.90 | config |
| MViTv2-B | Classification | 86.72 | config |
| I3D | Classification | 79.74 | config |
```bash
conda create -n "strategies" python=3.11.5
conda activate strategies
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
```

This repository provides additional tools and extensions built for the official MMAction2 repository. To use the code, you first need to clone and set up MMAction2:
```bash
git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
git checkout -b strategies acb79e41
pip install timm==0.9.10 numpy==1.24.4 h5py scikit-image albumentations tensorboard
pip install -U openmim
mim install mmengine==0.10.5
mim install mmcv==2.0.1
```

Then, clone this repository and merge it with the MMAction2 repository:
```bash
cd ../
git clone (strategies)
cp -RT strategies/ mmaction2/
cd mmaction2/
pip install -v -e .
```

Our approach supports datasets in both standard video formats and HDF5 for more efficient data handling. If you want to use HDF5 as we do, follow these steps to convert your dataset:

```bash
conda install libjpeg-turbo -c conda-forge
pip install -U git+https://github.com/lilohuang/PyTurboJPEG.git
```

To convert a dataset, use the following command:
```bash
python tools/convert_to_hdf5video.py <input_file> [<output_file>] [<additional_parameters>]
```
- `<input_file>`:
  Path to the `.txt` file of the original dataset. Video file paths in this file must be:
  - Absolute, or
  - Relative to this file's location, or to the path specified in `--video_root` (if provided).
- `<output_file>`:
  Path to the output HDF5 file. The recommended extension is `.hdf5video`.
  By default, the output file name is derived from `<input_file>` by replacing its extension with `.hdf5video`.
- `--video_root`:
  Specifies the root directory (absolute or relative to `<input_file>`) for locating video files.
  Default: empty string (videos are searched relative to the directory containing `<input_file>`).
- `--max_resolution`:
  Scales videos so that their largest side is less than or equal to `max_resolution`.
  Default: `300`.
- `--min_resolution`:
  Scales videos so that their smallest side is greater than or equal to `min_resolution`.
  Default: `None`.
  Note: use only one of `--max_resolution` and `--min_resolution`.
- `--jpeg_quality`:
  Sets the JPEG encoding quality.
  Default: `95`.
- `--num_workers`:
  Number of threads for parallel processing. Determines the conversion speed.
  Default: `6`.
- `--ignore_errors`:
  Enables error-ignoring mode. If a video file is missing or unreadable, it is skipped and a message is logged to the console. Without this flag, the process stops on the first error.
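To make the resolution options concrete, here is a small sketch (not the converter's actual code) of how a frame size could be rescaled under `--max_resolution` / `--min_resolution`, assuming the aspect ratio is preserved and the converter never scales in the "wrong" direction:

```python
def target_size(width, height, max_resolution=None, min_resolution=None):
    """Compute a new (width, height) preserving aspect ratio.

    Exactly one of max_resolution / min_resolution should be set:
    - max_resolution: shrink so the larger side is <= max_resolution
    - min_resolution: grow so the smaller side is >= min_resolution
    Illustrative sketch; the actual converter may round differently.
    """
    if max_resolution is not None:
        scale = min(1.0, max_resolution / max(width, height))  # never upscale
    elif min_resolution is not None:
        scale = max(1.0, min_resolution / min(width, height))  # never downscale
    else:
        scale = 1.0
    return round(width * scale), round(height * scale)
```

For example, a 1920x1080 video converted with the default `--max_resolution 300` would come out at roughly 300x169.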
After conversion, update your config files as follows:

- Update the dataset class to use `Hdf5VideoDataset` instead of `VideoDataset`.
- Change the `ann_file` value from the path to a `.txt` file to the path of the corresponding `.hdf5video` file.
- Use `Hdf5VideoInit` instead of `DecordInit` and remove its parameters, if present.
- Use `Hdf5VideoDecode` instead of `DecordDecode` and remove its parameters, if present.
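Put together, the dataset-related part of an MMAction2 config might look like the sketch below after switching to HDF5. Only `Hdf5VideoDataset`, `ann_file`, `Hdf5VideoInit`, and `Hdf5VideoDecode` come from the steps above; the surrounding structure follows standard MMAction2 config conventions, and the file path is a placeholder:

```python
# Sketch of the dataset section of an MMAction2 config after HDF5 conversion.
dataset_type = 'Hdf5VideoDataset'        # was 'VideoDataset'
ann_file_train = 'data/train.hdf5video'  # was 'data/train.txt' (placeholder path)

train_pipeline = [
    dict(type='Hdf5VideoInit'),    # replaces dict(type='DecordInit', ...)
    dict(type='Hdf5VideoDecode'),  # replaces dict(type='DecordDecode', ...)
    # ...remaining transforms (sampling, resize, augmentations) stay unchanged
]

train_dataloader = dict(
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        pipeline=train_pipeline,
    ),
)
```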
For training and testing, follow the instructions provided in the official MMAction2 repository. Use the configuration files provided in this repository to train the models.
- Karina Kvanchiani
- Roman Kraynov
- Elizaveta Petrova
- Petr Surovcev
- Aleksandr Nagaev
- Alexander Kapitanov
***
