Training Strategies for Isolated Sign Language Recognition

We introduce a robust training pipeline for Isolated Sign Language Recognition (ISLR). The approach addresses challenges specific to sign language recognition, such as limited data quality, diverse signing speeds, and the need for precise temporal understanding. The pipeline incorporates image and video augmentations, a boundary regression algorithm, and an IoU-balanced classification loss to enhance recognition performance across multiple datasets.
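
As an illustration of the loss component, here is a minimal sketch of what an IoU-balanced cross-entropy could look like, assuming per-clip weights derived from the temporal IoU between the sampled clip and the annotated sign boundaries. The function name and exact weighting scheme are our assumptions, not the repository's implementation.

```python
# Hypothetical sketch of an IoU-balanced classification loss (not the
# repository's actual implementation): per-sample cross-entropy is re-weighted
# by the temporal IoU between the sampled clip and the ground-truth sign span.
import torch
import torch.nn.functional as F

def iou_balanced_cross_entropy(logits: torch.Tensor,
                               labels: torch.Tensor,
                               clip_iou: torch.Tensor) -> torch.Tensor:
    """logits: (N, C) class scores, labels: (N,) targets, clip_iou: (N,) in [0, 1]."""
    per_sample = F.cross_entropy(logits, labels, reduction='none')
    # Normalize weights so the loss scale stays comparable to plain CE,
    # while clips with poor temporal overlap contribute less.
    weights = clip_iou * len(clip_iou) / clip_iou.sum().clamp_min(1e-6)
    return (weights * per_sample).mean()
```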

The effectiveness of these strategies has been demonstrated through extensive experiments, achieving state-of-the-art results on key benchmarks like WLASL and Slovo. By offering a versatile and scalable solution, this work aims to facilitate further research and development in sign language recognition.

Performance

WLASL

| Model    | Pretrain Task  | Top-1 accuracy (%) | Config |
|----------|----------------|--------------------|--------|
| MViTv2-S | MaskFeat       | 56.37              | config |
| MViTv2-S | Classification | 57.17              | config |
| MViTv2-B | Classification | 57.33              | config |
| I3D      | Classification | 36.38              | config |

AUTSL

| Model    | Pretrain Task  | Top-1 accuracy (%) | Config |
|----------|----------------|--------------------|--------|
| MViTv2-S | MaskFeat       | 95.62              | config |
| MViTv2-S | Classification | 95.05              | config |
| MViTv2-B | Classification | 95.75              | config |
| I3D      | Classification | 87.81              | config |

Slovo

| Model    | Pretrain Task  | Top-1 accuracy (%) | Config |
|----------|----------------|--------------------|--------|
| MViTv2-S | MaskFeat       | 81.57              | config |
| MViTv2-S | Classification | 80.97              | config |
| MViTv2-B | Classification | 81.34              | config |
| I3D      | Classification | 63.82              | config |

SlovoExt

| Model    | Pretrain Task  | Top-1 accuracy (%) | Config |
|----------|----------------|--------------------|--------|
| MViTv2-S | MaskFeat       | 87.31              | config |
| MViTv2-S | Classification | 85.90              | config |
| MViTv2-B | Classification | 86.72              | config |
| I3D      | Classification | 79.74              | config |

Installation

conda create -n "strategies" python=3.11.5
conda activate strategies
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia

This repository provides additional tools and extensions built for the official MMAction2 repository. To use the code, you first need to clone and set up MMAction2.

git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2 
git checkout -b strategies acb79e41

pip install timm==0.9.10 numpy==1.24.4 h5py scikit-image albumentations tensorboard

pip install -U openmim
mim install mmengine==0.10.5
mim install mmcv==2.0.1

Then, clone this repository and merge it into the MMAction2 repository.

cd ../
git clone https://github.com/ai-forever/TrainingStrategiesISLR.git strategies
cp -RT strategies/ mmaction2/
cd mmaction2/
pip install -v -e .
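
A quick way to confirm the editable install succeeded (illustrative one-liner):

python -c "import mmaction; print(mmaction.__version__)"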

Data Preparation

Our approach supports datasets in both standard video formats and HDF5 for more efficient data handling. If you want to use HDF5 as we do, follow these steps to convert your dataset:

conda install libjpeg-turbo -c conda-forge
pip install -U git+https://github.com/lilohuang/PyTurboJPEG.git

To convert a dataset, use the following command:

python tools/convert_to_hdf5video.py <input_file> [<output_file>] [<additional_parameters>]

Arguments

  • <input_file>:
    Path to the .txt file of the original dataset. Video file paths in this file must be:

    • Absolute, or
    • Relative to this file’s location, or to the directory specified in --video_root (if provided).
  • <output_file>:
    Path to the output HDF5 file. The recommended extension is .hdf5video.
    By default, the output file name is derived from <input_file> by replacing its extension with .hdf5video.

Additional Parameters

  • --video_root:
    Specifies the root directory (absolute or relative to <input_file>) for locating video files.
    Default: empty string (videos are searched relative to the directory containing <input_file>).

  • --max_resolution:
    Scales videos so that their largest side is less than or equal to max_resolution.
    Default: 300.

  • --min_resolution:
    Scales videos so that their smallest side is greater than or equal to min_resolution.
    Default: None.
    Note: use only one of --max_resolution and --min_resolution.

  • --jpeg_quality:
    Sets the JPEG encoding quality.
    Default: 95.

  • --num_workers:
    Number of threads for parallel processing; higher values generally speed up conversion.
    Default: 6.

  • --ignore_errors:
    Enables error ignoring mode. If a video file is missing or unreadable, it is skipped, and a message is logged to the console. Without this flag, the process stops on the first error.
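
Putting the arguments together, a typical conversion call might look like this (the dataset paths below are hypothetical):

python tools/convert_to_hdf5video.py data/slovo/train.txt data/slovo/train.hdf5video --video_root videos --max_resolution 300 --num_workers 8 --ignore_errors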

Configs Modifications

  • Update the dataset class to use Hdf5VideoDataset instead of VideoDataset.

  • Change the ann_file value from the path to a .txt file to the path of the corresponding .hdf5video file.

  • Use Hdf5VideoInit instead of DecordInit, removing any DecordInit-specific parameters, if present.

  • Use Hdf5VideoDecode instead of DecordDecode, removing any DecordDecode-specific parameters, if present.
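
For illustration, a dataset/pipeline fragment of an MMAction2 config after these changes might look like the following sketch; everything other than the Hdf5Video* names and the ann_file switch follows standard MMAction2 conventions, and the paths are hypothetical:

```python
# Hypothetical MMAction2 config fragment after switching to HDF5 (paths are
# placeholders; the SampleFrames settings are shown only for context).
dataset_type = 'Hdf5VideoDataset'              # was 'VideoDataset'
ann_file_train = 'data/slovo/train.hdf5video'  # was 'data/slovo/train.txt'

train_pipeline = [
    dict(type='Hdf5VideoInit'),    # was dict(type='DecordInit', ...)
    dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
    dict(type='Hdf5VideoDecode'),  # was dict(type='DecordDecode', ...)
    # ... remaining transforms unchanged ...
]
```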

Training and Testing

For training and testing, follow the instructions provided in the official MMAction2 repository. Use the configuration files provided in this repository to train the models.
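
For reference, MMAction2's standard entry points apply; for example (the config and checkpoint paths below are hypothetical):

python tools/train.py configs/recognition/mvit/mvit-small_islr.py
python tools/test.py configs/recognition/mvit/mvit-small_islr.py work_dirs/mvit-small_islr/best.pth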

Authors and Credits

Citations
