Sign Language Detection System

A modular sign language detection system that uses MediaPipe and OpenCV for hand tracking, and PyTorch for ASL (American Sign Language) word classification. Built to be integrated into larger applications for accessibility.

🎯 Project Goal

Build a sign language model that can detect signed sentences, designed for POV (point-of-view) camera usage to help blind people interact with sign language users.

✨ Key Features

  • Real-time ASL Detection: Landmark-based classification using MediaPipe and PyTorch
  • Context-Aware Dictionary: Maps letters to meaningful words (C → "Coffee", H → "Hello")
  • REST API: FastAPI endpoints for file upload and base64 image processing
  • Raspberry Pi Integration: Optimized for edge devices with semantic routing
  • Hackathon Ready: Social interaction features for accessibility demonstrations

📁 Project Structure

sign_language_detection/
├── api/
│   ├── app.py                   # FastAPI application with context support
│   ├── test_client.py           # API test client
│   └── README.md                # API documentation
├── features/sign_language/
│   ├── hand_tracker.py          # MediaPipe hand detection
│   ├── word_classifier.py       # PyTorch classifier
│   ├── context_dictionary.py    # Context-aware word mapping
│   └── dataset.py               # PyTorch Dataset class
├── scripts/
│   ├── preprocess_fast.py       # Fast preprocessing (200 samples/class)
│   ├── train_word_model.py      # Training script
│   └── evaluate_test.py         # Test set evaluation
├── data/
│   ├── asl_alphabet_train/      # Downloaded training images
│   ├── asl_alphabet_test/       # Downloaded test images
│   └── processed/               # Preprocessed landmark data
│       └── train/
│           ├── landmarks.npz
│           └── label_mapping.npy
├── models/
│   ├── best_model.pth           # Trained ASL classifier (2MB)
│   └── final_model.pth          # Final checkpoint (673KB)
├── docs/
│   ├── DATASET.md               # Dataset acquisition guide
│   └── CONTEXT_DICTIONARY.md    # Context dictionary documentation
├── examples/
│   ├── test_context_dictionary.py  # Context demo script
│   └── custom_context_dictionary.json  # Example custom dictionary
├── Dockerfile                   # Production container
├── docker-compose.yml           # Docker deployment config
├── requirements.txt             # All dependencies (dev + production)
└── README.md                    # This file

🚀 Quick Start

1. Environment Setup

Create and activate the conda environment (Python 3.11 required for MediaPipe compatibility):

conda create -n asl_env python=3.11 -y
conda activate asl_env
pip install -r requirements.txt

2. Download Dataset

Download the ASL Alphabet dataset from Kaggle:

# Install Kaggle CLI and set up credentials first (see docs/DATASET.md)
kaggle datasets download -d grassknoted/asl-alphabet --unzip -p data/

3. Preprocess Data

Extract MediaPipe hand landmarks from images:

python scripts/preprocess_fast.py

4. Train Model

Train the ASL word classifier:

python scripts/train_word_model.py

5. Evaluate Model

Evaluate the model on the test dataset:

python scripts/evaluate_test.py

6. Run API Server

Start the FastAPI server:

python -m uvicorn api.app:app --host 0.0.0.0 --port 8000

Visit API docs: http://localhost:8000/docs

7. Test Context Dictionary (Optional)

Test the context-aware dictionary feature:

python examples/test_context_dictionary.py

See docs/CONTEXT_DICTIONARY.md for full documentation.

🎨 Context Dictionary Feature

The context dictionary maps ASL letters to meaningful words AND full sentences for natural communication:

Full Sentences:

  • C → "I need a coffee", "It's too cold", "Can I have some cake?"
  • H → "Hello! How are you?", "Help me please", "I'm hungry"
  • T → "Thank you so much", "I'm thirsty", "I'm very tired"
  • W → "You're welcome", "I need water", "Please wait"

Quick Words:

  • A → "Apple", "Again", "Attention", "Ahead"
  • Y → "Yes, I agree", "You are right", "Your turn"

API Response with Context

{
  "predicted_sign": "C",
  "confidence": 0.95,
  "contextual_meaning": "I need a coffee",
  "alternative_contexts": [
    "It's too cold",
    "Can I have some cake?",
    "Cheese please"
  ]
}
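
The lookup itself can be very small. Below is a minimal sketch of the idea, assuming a hypothetical CONTEXT_DICTIONARY table and add_context helper (illustrative names only, not the actual context_dictionary.py API):

# Hypothetical letter-to-context mapping; the real, configurable table
# lives in features/sign_language/context_dictionary.py
CONTEXT_DICTIONARY = {
    "C": ["I need a coffee", "It's too cold", "Can I have some cake?", "Cheese please"],
    "H": ["Hello! How are you?", "Help me please", "I'm hungry"],
}

def add_context(predicted_sign, confidence):
    """Attach a contextual meaning and alternatives to a raw letter prediction."""
    contexts = CONTEXT_DICTIONARY.get(predicted_sign, [])
    return {
        "predicted_sign": predicted_sign,
        "confidence": confidence,
        "contextual_meaning": contexts[0] if contexts else None,
        "alternative_contexts": contexts[1:],
    }

print(add_context("C", 0.95))  # produces the response shape shown above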

Use Cases:

  • Blind users hear "I need a coffee" instead of just "C" via text-to-speech
  • Complete sentences provide full context without spelling
  • Natural communication in care settings (hospital, home)
  • Hackathon demos for realistic social interaction

See docs/CONTEXT_DICTIONARY.md for custom dictionaries and advanced usage.

📊 Model Performance

  • Training Accuracy: 82.40%
  • Test Accuracy: 59.26% (16/27 correct)
  • API Prediction Confidence: 99.58% (on high-quality images)
  • Classes Supported: 28 (A-Z, del, space)
  • Model Size: 210 KB
  • Inference Time: ~50ms per image (CPU)

Test Results

Strong performers (>90% confidence):

  • ✅ B, C, F, G, I, J, L, M, Q, W, X, Y

Detection challenges:

  • Hand detection failures on: A, H, N, O, space
  • Similar sign confusion: P↔Q, R↔G, T↔I, U↔R

🔧 Core Components

Hand Tracker (hand_tracker.py)

  • Uses MediaPipe Hands for real-time hand landmark detection
  • Extracts 21 3D landmarks (x, y, z) per hand
  • Optimized for single-hand detection
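
As a rough illustration of this step, here is a sketch using the public MediaPipe Hands solution API (not the exact hand_tracker.py code; sample_sign.jpg is a placeholder path):

import cv2
import mediapipe as mp

# Read an image and convert BGR -> RGB, as MediaPipe expects RGB input
image = cv2.imread("sample_sign.jpg")
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Detect a single hand and flatten its 21 (x, y, z) landmarks into 63 features
with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    results = hands.process(rgb)

if results.multi_hand_landmarks:
    hand = results.multi_hand_landmarks[0]
    features = [c for lm in hand.landmark for c in (lm.x, lm.y, lm.z)]
    print(len(features))  # 63
else:
    print("No hand detected")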

Word Classifier (word_classifier.py)

  • Simple feedforward neural network (PyTorch)
  • Input: 63 features (21 landmarks × 3 coordinates)
  • Hidden layers: 512 → 256 neurons with dropout
  • Output: 28 classes (A-Z, del, space)
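
A sketch of a network matching the bullets above (the dropout rate and class/layer names are assumptions, not the exact word_classifier.py code):

import torch.nn as nn

class ASLWordClassifier(nn.Module):
    """Feedforward classifier: 63 landmark features -> 512 -> 256 -> 28 classes."""
    def __init__(self, input_dim=63, num_classes=28, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 512), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(512, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.net(x)  # raw logits; apply softmax for probabilities

The 63-dimensional input is simply the flattened landmark vector produced by the hand tracker.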

Dataset Class (dataset.py)

  • PyTorch Dataset for loading preprocessed landmarks
  • Handles label mapping and normalization
  • Efficient loading from compressed .npz files
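
A minimal sketch of such a Dataset, assuming the .npz archive stores arrays under keys like "landmarks" and "labels" (the key names are assumptions; check dataset.py for the real ones):

import numpy as np
import torch
from torch.utils.data import Dataset

class LandmarkDataset(Dataset):
    """Loads preprocessed hand landmarks and labels from a compressed .npz file."""
    def __init__(self, npz_path="data/processed/train/landmarks.npz"):
        data = np.load(npz_path)
        self.landmarks = torch.tensor(data["landmarks"], dtype=torch.float32)
        self.labels = torch.tensor(data["labels"], dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.landmarks[idx], self.labels[idx]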

FastAPI Application (api/app.py)

  • Production-ready REST API
  • File upload and base64 prediction endpoints
  • CORS enabled for Raspberry Pi integration
  • Health checks and monitoring
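
For orientation, a stripped-down sketch of the base64 endpoint shape (illustrative only; the real api/app.py also wires in the hand tracker, the classifier, file uploads, CORS, and health checks):

import base64
import cv2
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Base64Request(BaseModel):
    image_base64: str

@app.post("/predict/base64")
def predict_base64(req: Base64Request):
    # Decode the base64 payload back into an OpenCV image
    img_bytes = base64.b64decode(req.image_base64)
    frame = cv2.imdecode(np.frombuffer(img_bytes, np.uint8), cv2.IMREAD_COLOR)
    if frame is None:
        return {"error": "invalid image data"}
    # ... extract landmarks with the hand tracker and run the classifier here ...
    return {"predicted_sign": "C", "confidence": 0.95}  # placeholder response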

🚀 API Deployment

Local Testing

# Start API server
python -m uvicorn api.app:app --reload --port 8000

# Test API
python api/test_client.py

# View interactive docs
open http://localhost:8000/docs

Docker Deployment

# Build and run with Docker Compose
docker-compose up -d

# View logs
docker-compose logs -f

# Stop service
docker-compose down

Digital Ocean Deployment

See api/README.md for complete deployment instructions.

📡 API Endpoints

  • GET /health - Health check
  • GET /classes - List supported classes
  • POST /predict - Predict from image file upload
  • POST /predict/base64 - Predict from base64 encoded image (Raspberry Pi integration)

Full API documentation: http://localhost:8000/docs
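
For example, calling the upload endpoint from Python (assuming the multipart field is named "file", as is typical for FastAPI UploadFile parameters; see api/test_client.py for the exact field name):

import requests

# Send an image file to the prediction endpoint
with open("data/asl_alphabet_test/C_test.jpg", "rb") as f:
    response = requests.post("http://localhost:8000/predict", files={"file": f})
print(response.json())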

📦 Classes Supported

The model recognizes 28 ASL signs:

  • Letters: A-Z
  • Special: del (delete), space

🎯 SenseAI Project Integration

This Sign Language Detection module is part of the SenseAI accessibility system.

Complete Architecture

┌─────────────────┐
│  Web Camera     │
│  (Raspberry Pi) │
└────────┬────────┘
         │
         ▼
┌─────────────────────────┐
│  Semantic Router        │
│  (Gemini + OpenRouter)  │
└──┬────────┬──────────┬──┘
   │        │          │
   ▼        ▼          ▼
┌─────────┐ ┌─────────┐ ┌──────────────┐
│   TTS   │ │  Face   │ │Sign Language │
│   API   │ │   API   │ │     API      │
└─────────┘ └─────────┘ └──────────────┘
 (Eleven    (Custom)    (This Module)
  Labs)

All APIs deployed on Digital Ocean

Raspberry Pi Integration

import cv2, base64, requests

# Capture a single frame from the camera and encode it as base64 JPEG
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
cap.release()
if not ret:
    raise RuntimeError("Failed to read a frame from the camera")
_, buffer = cv2.imencode('.jpg', frame)
img_b64 = base64.b64encode(buffer).decode()

# Call the base64 prediction endpoint
response = requests.post(
    "http://your-api:8000/predict/base64",
    json={"image_base64": img_b64}
)
print(response.json())

🔮 Future Enhancements

  • Add temporal models (LSTM/Transformer) for sentence detection
  • Add data augmentation for improved robustness
  • Support for continuous sign language (not just fingerspelling)
  • Mobile deployment (TFLite/CoreML conversion)
  • Multi-hand support for two-handed signs

📚 Dataset Information

Source: Kaggle ASL Alphabet Dataset

  • Training: 87,000 images (3,000 per class × 29 classes)
  • Format: 200×200 RGB images
  • Classes: 29 (A-Z + del + nothing + space)
  • License: See Kaggle dataset page

For more details, see docs/DATASET.md.

🛠️ Development

File Organization

  • Keep feature modules independent for easy integration
  • All data preprocessing outputs go to data/processed/
  • Models are saved in models/
  • Scripts are in scripts/

Adding New Features

  1. Create new module in features/sign_language/
  2. Import existing components as needed
  3. Update conversational_asl.py for integration
  4. Document in this README

📄 License

This project is for educational and accessibility purposes.

🙏 Acknowledgments

  • MediaPipe team for excellent hand tracking
  • Kaggle ASL Alphabet dataset contributors
  • PyTorch community

Built with ❤️ for accessibility

I plan to build the overall system around three features:

  • Text-to-Speech and Speech-to-Text using Eleven Labs
  • People recognition (custom model)
  • Sign Language Detection (custom model)

We will create an API endpoint for each feature, Dockerize them, and deploy them on Digital Ocean.

A Semantic Router built on Gemini (through an OpenRouter interface) will inspect the camera frames and text and decide which API to call. The image and audio frames are captured by a web camera, and the semantic router runs on the Raspberry Pi.

I am currently working on the Sign Language feature. The model is built, so the next step is an API endpoint that serves our optimized model and returns predictions.

I have tested it and it works well for letters, but I had an idea: annotate the data so that the letter C means "Coffee" and the letter A means "Apple". Since the goal is social interaction, an image frame showing a C hand gesture would be looked up in a dictionary, and the response would be whatever word or sentence we keep in that dictionary table. Since this is a hackathon and our goal is to demo the device as a social-interaction aid for disabled people, this is a good strategy. Please tell me how you would implement it.
