This is the official repository for the paper "Automated Sign Language Tutor: A Dual-Language Real-Time Approach for RSL and ASL".
Appearance and a video of the demo stand in operation:
Important Note: This repository contains a backend system for sign language recognition (Streaming Sign Recognition Engine and Controller Process components). The provided frontend (`ws.html`, `ws.css`, `ws.js`) serves as a basic example implementation to demonstrate backend functionality and is not a production-ready application. Developers should create their own frontend implementations tailored to specific use cases while using this backend as a recognition service.
It is also possible to use the backend with custom models.
- Two operating modes:
  - LIVE: Real-time gesture recognition
  - Training: Mode for teaching the model new gestures
- Dual language support: Russian and English interface and recognition models
- WebSocket interaction: Client sends video stream, server returns recognized gestures
- Visual feedback: Notification system for users
- Responsive interface: Modern, user-friendly design
- Python 3.7+
- Node.js (optional for frontend)
- Web camera
```bash
pip install -r requirements.txt
```

Download the models:
- https://rndml-team-cv.obs.ru-moscow-1.hc.sbercloud.ru/rsl/demostand_models/tsm/ru/mobilenet_demostand_ru.onnx
- https://rndml-team-cv.obs.ru-moscow-1.hc.sbercloud.ru/rsl/demostand_models/tsm/en/mobilenet_demostand_en.onnx
and place them in the `models/checkpoints/` folder.
```bash
python server_fapi.py
```

The server will run at `localhost:3003`.
Or run with Docker:

```bash
docker build -t sign-tutor .
docker run -it -d -v $PWD:/app -p 3003:3003 sign-tutor
```

- Open `ws.html` in your browser
- Click "Enable Camera" to access your webcam
- After enabling the camera, the "Start Stream" button will become available
- Select operating mode:
  - LIVE: Real-time gesture recognition
  - Training:
    - Enter the gesture name in the text field
    - Click "Select Gesture"
    - Perform the gesture in front of the camera
    - The system will notify you when the gesture is recognized correctly
- Switching interface and model language: Use the RU/EN buttons in the top right corner
- The recognition result is displayed in the server console.
To use custom models:

- Place ONNX models for Russian and English in the `models/checkpoints/` folder.
- Update the configuration files: `models/config_ru.yaml` for the Russian model, `models/config_en.yaml` for the English model. Use the available files as examples.
- Update the sign class files: `models/constants_ru.py` with the `classes` variable for Russian, `models/constants_en.py` with the `classes` variable for English. Use the available files as examples (see the sketch below).
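A minimal sketch of such a constants file; the gloss strings below are purely illustrative, and the entries must match the class indices of the corresponding ONNX model:

```python
# models/constants_<language>.py -- illustrative sketch, not the shipped file.
# The order of entries must match the output indices of your ONNX model.
classes = [
    "no_event",  # hypothetical background / "no gesture" class
    "hello",
    "thanks",
    "goodbye",
]
```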
- Establishing connection:
  - Client opens a WebSocket connection to `ws://localhost:3003/`
  - Server initializes the default language model (Russian)
- Main workflow:
```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant Model

    Client->>Server: {"type": "LANGUAGE", "lang": "ru"}
    Server->>Model: Initialize Russian model
    Model-->>Server: Model ready
    Server-->>Client: {"status": 200, "message": "Language changed to ru"}

    Client->>Server: {"type": "MODE", "mode": "TRAINING"}
    Server-->>Client: {"status": 200, "message": "New MODE TRAINING set correctly"}

    Client->>Server: {"type": "GLOSS", "gloss": "привет"}
    Server-->>Client: {"status": 200, "message": "New GLOSS привет set correctly"}

    loop Each frame (30 FPS)
        Client->>Server: {"type": "IMAGE", "image": "data:image/jpeg"}
        Server->>Model: Process frame
        Model-->>Server: Recognition result
        alt Gesture recognized
            Server-->>Client: {"text": "привет", "type": "WORD"}
        end
    end
```
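The same protocol can be driven from a minimal Python client in place of the bundled web frontend. The sketch below is illustrative, not part of the repository: it assumes the `websockets` and `opencv-python` packages and uses only the message fields shown in the diagram above; the camera index and timing are assumptions.

```python
# client_example.py -- a minimal sketch of the WebSocket protocol shown above.
import asyncio
import base64
import json

import cv2
import websockets


async def main():
    cap = cv2.VideoCapture(0)  # default webcam; index is illustrative
    async with websockets.connect("ws://localhost:3003/") as ws:
        # Select language, mode, and target gloss before streaming frames.
        await ws.send(json.dumps({"type": "LANGUAGE", "lang": "ru"}))
        print(await ws.recv())
        await ws.send(json.dumps({"type": "MODE", "mode": "TRAINING"}))
        print(await ws.recv())
        await ws.send(json.dumps({"type": "GLOSS", "gloss": "привет"}))
        print(await ws.recv())

        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Encode the frame as a base64 JPEG data URL, as in the diagram.
            _, jpeg = cv2.imencode(".jpg", frame)
            b64 = base64.b64encode(jpeg.tobytes()).decode()
            await ws.send(json.dumps({"type": "IMAGE",
                                      "image": "data:image/jpeg;base64," + b64}))
            try:
                # Recognition results arrive only when a gesture is detected.
                print(await asyncio.wait_for(ws.recv(), timeout=1 / 30))
            except asyncio.TimeoutError:
                pass  # no result for this frame; keep streaming at ~30 FPS
    cap.release()


asyncio.run(main())
```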
Client (Frontend):
- `view/ws.html`: Main HTML interface file
- `view/ws.css`: Interface styles
- `view/ws.js`: Camera and WebSocket logic

Server (Backend):
- `server_fapi.py`: Main server code (FastAPI)
- `models/model.py`: Recognition model logic
- `Runner`: Class for managing the video processing flow
- `RecognitionMP`: Process for gesture recognition in a separate thread
Model:
- ONNX models for gesture recognition
- Configuration files for Russian and English versions
- Gesture class files for each language
- Camera management:
  - Webcam access via WebRTC
  - Frame capture and base64 encoding
  - Frame sending at the specified frequency (30 FPS)
- Mode management:
  - Smooth switching between LIVE and Training modes
  - Dynamic UI element display based on the current mode
- Localization:
  - Full support for Russian and English
  - Language preference persistence between sessions
- Feedback:
  - Animated notification system
  - Visual confirmation of user actions
  - Error handling and display
- Video processing:
  - Base64 decoding to OpenCV image (see the first sketch below)
  - Frame preprocessing for the neural network
  - Frame buffering for sequence analysis
- Model management:
  - Dynamic model loading for different languages (see the second sketch below)
  - Multithreaded processing to minimize latency
  - Proper resource cleanup when switching languages
- Gesture recognition:
  - Sequence analysis for gesture recognition
  - Threshold filtering to reduce false positives (see the third sketch below)
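The base64-to-OpenCV step can be summarized in a short sketch. This is not the repository's implementation (that lives in `server_fapi.py` and `models/model.py`); it only illustrates the decoding described above:

```python
import base64

import cv2
import numpy as np


def decode_frame(data_url: str) -> np.ndarray:
    """Strip the data-URL prefix and decode the JPEG payload into a BGR image."""
    _, b64 = data_url.split(",", 1)  # drop the "data:image/jpeg;base64," header
    buf = np.frombuffer(base64.b64decode(b64), dtype=np.uint8)
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)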
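Dynamic model loading can likewise be sketched with `onnxruntime`. The checkpoint filenames match the models downloaded above; the function name and provider choice are illustrative assumptions:

```python
import onnxruntime as ort


def load_model(lang: str) -> ort.InferenceSession:
    # Checkpoint names mirror the downloaded files, e.g. lang="ru" or "en".
    path = f"models/checkpoints/mobilenet_demostand_{lang}.onnx"
    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])
```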
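Threshold filtering, in its simplest form, keeps a prediction only when its probability clears the `threshold` value from the model configuration files. A minimal sketch with hypothetical names:

```python
import numpy as np


def filter_prediction(probs: np.ndarray, classes: list, threshold: float):
    """Return the top class only if its probability clears the threshold."""
    idx = int(np.argmax(probs))
    if probs[idx] < threshold:
        return None  # suppress low-confidence predictions (false positives)
    return classes[idx]
```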
The following parameters can be customized:

- Frame rate: Change the `FPS` value in `ws.js`
- Video resolution: Change the `width` and `height` attributes of the `<video>` element in `ws.html`
- Recognition threshold: Change the `threshold` value in the model configuration files
To add a new language:

- Create a new configuration file `models/config_<language>.yaml`
- Add a gesture class file `models/constants_<language>.py`
- Update the translation dictionaries in `ws.js`:
```javascript
const translations = {
    // ...
    <language>: {
        title: "...",
        startWebcam: "...",
        // ... other texts
    }
}
```
- Camera not working:
  - Check browser permissions
  - Ensure no other applications are using the camera
  - Try reloading the page
- No connection to the server:
  - Ensure the server is running (`python server_fapi.py`)
  - Check the WebSocket address in `ws.js`
  - Ensure no firewall is blocking the connection
- Model not loading:
  - Check the model paths in the configuration files
  - Ensure the model files exist
  - Verify the content of the gesture class files
- Client: Open browser developer console (F12)
- Server: Logs are printed in the server console
— Petr Surovtsev
— Alexander Nagaev
— Alexander Kapitanov
— Ilya Ovodov
This project is licensed under the Apache License. See the LICENSE file for details.
For questions and suggestions, please use the project's Issues section.
