We introduce a robust training pipeline for Isolated Sign Language Recognition (ISLR). The approach addresses challenges specific to sign language recognition, such as limited data quality, diverse signing speeds, and the need for precise temporal understanding. The pipeline combines image and video augmentations, a boundary-regression algorithm, and an IoU-balanced classification loss to improve recognition performance across multiple datasets.
The effectiveness of these strategies is demonstrated through extensive experiments, achieving state-of-the-art results on key benchmarks such as WLASL and Slovo. By offering a versatile and scalable solution, this work aims to facilitate further research and development in sign language recognition.
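As a rough illustration of the idea behind an IoU-balanced classification loss (the exact formulation used in this pipeline may differ), the per-sample cross-entropy can be weighted by the temporal IoU between a sampled clip and the annotated sign boundaries, so that well-aligned clips contribute more to the gradient. The function names and the exponent `eta` below are illustrative, not taken from this repository:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def temporal_iou(clip, gt):
    """IoU of two 1-D intervals (start, end), e.g. a sampled clip vs. sign boundaries."""
    inter = max(0.0, min(clip[1], gt[1]) - max(clip[0], gt[0]))
    union = (clip[1] - clip[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def iou_balanced_ce(logits, label, clip, gt, eta=1.0):
    """Cross-entropy down-weighted by temporal IoU**eta (illustrative sketch)."""
    p = softmax(logits)[label]
    w = temporal_iou(clip, gt) ** eta
    return -w * math.log(p)
```

With `eta=1.0`, a clip that only overlaps a third of the annotated sign contributes a third of the loss of a perfectly aligned clip.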
| Model | Pretrain Task | Top-1 Accuracy (%) | Config |
|---|---|---|---|
| MViTv2-S | MaskFeat | 56.37 | config |
| MViTv2-S | Classification | 57.17 | config |
| MViTv2-B | Classification | 57.33 | config |
| I3D | Classification | 36.38 | config |

| Model | Pretrain Task | Top-1 Accuracy (%) | Config |
|---|---|---|---|
| MViTv2-S | MaskFeat | 95.62 | config |
| MViTv2-S | Classification | 95.05 | config |
| MViTv2-B | Classification | 95.75 | config |
| I3D | Classification | 87.81 | config |

| Model | Pretrain Task | Top-1 Accuracy (%) | Config |
|---|---|---|---|
| MViTv2-S | MaskFeat | 81.57 | config |
| MViTv2-S | Classification | 80.97 | config |
| MViTv2-B | Classification | 81.34 | config |
| I3D | Classification | 63.82 | config |

| Model | Pretrain Task | Top-1 Accuracy (%) | Config |
|---|---|---|---|
| MViTv2-S | MaskFeat | 87.31 | config |
| MViTv2-S | Classification | 85.90 | config |
| MViTv2-B | Classification | 86.72 | config |
| I3D | Classification | 79.74 | config |
```bash
conda create -n "strategies" python=3.11.5
conda activate strategies
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
```

This repository provides additional tools and extensions built for the official MMAction2 repository. To use the code, you first need to clone and set up MMAction2:
```bash
git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
git checkout -b strategies acb79e41
pip install timm==0.9.10 numpy==1.24.4 h5py scikit-image albumentations tensorboard
pip install -U openmim
mim install mmengine==0.10.5
mim install mmcv==2.0.1
```

Then, clone this repository and merge it with the MMAction2 repository:
```bash
cd ../
git clone (strategies)
cp -RT strategies/ mmaction2/
cd mmaction2/
pip install -v -e .
```

Our approach supports datasets in both standard video formats and HDF5 for more efficient data handling. If you want to use HDF5 as we do, follow these steps to convert your dataset:

```bash
conda install libjpeg-turbo -c conda-forge
pip install -U git+https://github.com/lilohuang/PyTurboJPEG.git
```

To convert a dataset, use the following command:
```bash
python tools/convert_to_hdf5video.py <input_file> [<output_file>] [<additional_parameters>]
```
- `<input_file>`:
  Path to the `.txt` file of the original dataset. Video file paths in this file must be:
  - Absolute, or
  - Relative to this file's location, or to the path specified in `--video_root` (if provided).
- `<output_file>`:
  Path to the output HDF5 file. The recommended extension is `.hdf5video`.
  By default, the output file name is derived from `<input_file>` by replacing its extension with `.hdf5video`.
- `--video_root`:
  Specifies the root directory (absolute or relative to `<input_file>`) for locating video files.
  Default: empty string (videos are searched relative to the directory containing `<input_file>`).
- `--max_resolution`:
  Scales videos so that their largest side is less than or equal to `max_resolution`.
  Default: `300`.
- `--min_resolution`:
  Scales videos so that their smallest side is greater than or equal to `min_resolution`.
  Default: `None`.
  Note: use only one of `--max_resolution` and `--min_resolution`.
- `--jpeg_quality`:
  Sets the JPEG encoding quality.
  Default: `95`.
- `--num_workers`:
  Number of threads for parallel processing. Determines the conversion speed.
  Default: `6`.
- `--ignore_errors`:
  Enables error-ignoring mode. If a video file is missing or unreadable, it is skipped and a message is logged to the console. Without this flag, the process stops on the first error.
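To make the resolution options concrete, here is a small sketch (not the converter's actual code) of how a frame size could be rescaled under `--max_resolution` / `--min_resolution`, assuming the aspect ratio is preserved and the converter never scales in the "wrong" direction:

```python
def target_size(width, height, max_resolution=None, min_resolution=None):
    """Compute a new (width, height) preserving aspect ratio.

    Exactly one of max_resolution / min_resolution should be set:
    - max_resolution: shrink so the larger side is <= max_resolution
    - min_resolution: grow so the smaller side is >= min_resolution
    Illustrative sketch; the actual converter may round differently.
    """
    if max_resolution is not None:
        scale = min(1.0, max_resolution / max(width, height))  # never upscale
    elif min_resolution is not None:
        scale = max(1.0, min_resolution / min(width, height))  # never downscale
    else:
        scale = 1.0
    return round(width * scale), round(height * scale)
```

For example, a 1920x1080 video converted with the default `--max_resolution 300` would come out at roughly 300x169.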
After conversion, update your config files as follows:

- Update the dataset class to use `Hdf5VideoDataset` instead of `VideoDataset`.
- Change the `ann_file` value from the path to a `.txt` file to the path of the corresponding `.hdf5video` file.
- Use `Hdf5VideoInit` instead of `DecordInit` and remove its parameters, if present.
- Use `Hdf5VideoDecode` instead of `DecordDecode` and remove its parameters, if present.
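Put together, the dataset-related part of an MMAction2 config might look like the sketch below after switching to HDF5. Only `Hdf5VideoDataset`, `ann_file`, `Hdf5VideoInit`, and `Hdf5VideoDecode` come from the steps above; the surrounding structure follows standard MMAction2 config conventions, and the file path is a placeholder:

```python
# Sketch of the dataset section of an MMAction2 config after HDF5 conversion.
dataset_type = 'Hdf5VideoDataset'        # was 'VideoDataset'
ann_file_train = 'data/train.hdf5video'  # was 'data/train.txt' (placeholder path)

train_pipeline = [
    dict(type='Hdf5VideoInit'),    # replaces dict(type='DecordInit', ...)
    dict(type='Hdf5VideoDecode'),  # replaces dict(type='DecordDecode', ...)
    # ...remaining transforms (sampling, resize, augmentations) stay unchanged
]

train_dataloader = dict(
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        pipeline=train_pipeline,
    ),
)
```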
For training and testing, follow the instructions provided in the official MMAction2 repository. Use the configuration files provided in this repository to train the models.
- Karina Kvanchiani
- Roman Kraynov
- Elizaveta Petrova
- Petr Surovcev
- Aleksandr Nagaev
- Alexander Kapitanov
***
