A modular sign language detection system that uses MediaPipe and OpenCV for hand tracking, and PyTorch for ASL (American Sign Language) word classification. Built to be integrated into larger applications for accessibility.
The goal is a sign language model that can detect full sentences, designed for POV (point-of-view) camera use so that blind people can interact with sign language users.
- Real-time ASL Detection: Landmark-based classification using MediaPipe and PyTorch
- Context-Aware Dictionary: Maps letters to meaningful words (C → "Coffee", H → "Hello")
- REST API: FastAPI endpoints for file upload and base64 image processing
- Raspberry Pi Integration: Optimized for edge devices with semantic routing
- Hackathon Ready: Social interaction features for accessibility demonstrations
```
sign_language_detection/
├── api/
│   ├── app.py                     # FastAPI application with context support
│   ├── test_client.py             # API test client
│   └── README.md                  # API documentation
├── features/sign_language/
│   ├── hand_tracker.py            # MediaPipe hand detection
│   ├── word_classifier.py         # PyTorch classifier
│   ├── context_dictionary.py      # Context-aware word mapping
│   └── dataset.py                 # PyTorch Dataset class
├── scripts/
│   ├── preprocess_fast.py         # Fast preprocessing (200 samples/class)
│   ├── train_word_model.py        # Training script
│   └── evaluate_test.py           # Test set evaluation
├── data/
│   ├── asl_alphabet_train/        # Downloaded training images
│   ├── asl_alphabet_test/         # Downloaded test images
│   └── processed/                 # Preprocessed landmark data
│       └── train/
│           ├── landmarks.npz
│           └── label_mapping.npy
├── models/
│   ├── best_model.pth             # Trained ASL classifier (2MB)
│   └── final_model.pth            # Final checkpoint (673KB)
├── docs/
│   ├── DATASET.md                 # Dataset acquisition guide
│   └── CONTEXT_DICTIONARY.md      # Context dictionary documentation
├── examples/
│   ├── test_context_dictionary.py     # Context demo script
│   └── custom_context_dictionary.json # Example custom dictionary
├── Dockerfile                     # Production container
├── docker-compose.yml             # Docker deployment config
├── requirements.txt               # All dependencies (dev + production)
└── README.md                      # This file
```
Create and activate the conda environment (Python 3.11 required for MediaPipe compatibility):
```bash
conda create -n asl_env python=3.11 -y
conda activate asl_env
pip install -r requirements.txt
```

Download the ASL Alphabet dataset from Kaggle:

```bash
# Install Kaggle CLI and set up credentials first (see docs/DATASET.md)
kaggle datasets download -d grassknoted/asl-alphabet --unzip -p data/
```

Extract MediaPipe hand landmarks from images:

```bash
python scripts/preprocess_fast.py
```

Train the ASL word classifier:

```bash
python scripts/train_word_model.py
```

Evaluate the model on the test dataset:

```bash
python scripts/evaluate_test.py
```

Start the FastAPI server:

```bash
python -m uvicorn api.app:app --host 0.0.0.0 --port 8000
```

Then visit the interactive API docs at http://localhost:8000/docs.

Test the context-aware dictionary feature:

```bash
python examples/test_context_dictionary.py
```

See docs/CONTEXT_DICTIONARY.md for full documentation.
The context dictionary maps ASL letters to meaningful words AND full sentences for natural communication:
Full Sentences:
- C → "I need a coffee", "It's too cold", "Can I have some cake?"
- H → "Hello! How are you?", "Help me please", "I'm hungry"
- T → "Thank you so much", "I'm thirsty", "I'm very tired"
- W → "You're welcome", "I need water", "Please wait"
Quick Words:
- A → "Apple", "Again", "Attention", "Ahead"
- Y → "Yes, I agree", "You are right", "Your turn"
Example API response:

```json
{
  "predicted_sign": "C",
  "confidence": 0.95,
  "contextual_meaning": "I need a coffee",
  "alternative_contexts": [
    "It's too cold",
    "Can I have some cake?",
    "Cheese please"
  ]
}
```

Use Cases:
- Blind users hear "I need a coffee" instead of just "C" via text-to-speech
- Complete sentences provide full context without spelling
- Natural communication in care settings (hospital, home)
- Hackathon demos for realistic social interaction
See docs/CONTEXT_DICTIONARY.md for custom dictionaries and advanced usage.
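As a rough illustration, a custom dictionary could be generated as JSON from Python. The schema below (letters as keys, ordered phrase lists as values, first entry as the default meaning) is an assumption based on the examples above; see examples/custom_context_dictionary.json and docs/CONTEXT_DICTIONARY.md for the actual format.

```python
import json

# Hypothetical custom dictionary: each letter maps to an ordered list of phrases,
# where the first entry is the default contextual meaning and the rest are
# alternatives. The exact schema used by context_dictionary.py may differ.
custom_dictionary = {
    "C": ["I need a coffee", "It's too cold", "Can I have some cake?"],
    "H": ["Hello! How are you?", "Help me please", "I'm hungry"],
    "W": ["You're welcome", "I need water", "Please wait"],
}

with open("my_context_dictionary.json", "w") as f:
    json.dump(custom_dictionary, f, indent=2)
```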
| Metric | Value |
|---|---|
| Training Accuracy | 82.40% |
| Test Accuracy | 59.26% (16/27 correct) |
| API Prediction Confidence | 99.58% (on high-quality images) |
| Classes Supported | 28 (A-Z, del, space) |
| Model Size | 210 KB |
| Inference Time | ~50ms per image (CPU) |
Strong performers (>90% confidence):
- B, C, F, G, I, J, L, M, Q, W, X, Y
Detection challenges:
- Hand detection failures on: A, H, N, O, space
- Similar sign confusion: P↔Q, R↔G, T↔I, U↔R
- Uses MediaPipe Hands for real-time hand landmark detection
- Extracts 21 3D landmarks (x, y, z) per hand
- Optimized for single-hand detection
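A minimal sketch of this landmark extraction step, assuming MediaPipe's Python Hands solution; the repo's hand_tracker.py may structure this differently.

```python
import cv2
import mediapipe as mp
import numpy as np

# Sketch only: one hand, static images, default detection thresholds.
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

def extract_landmarks(image_bgr):
    """Return a flat (63,) array of (x, y, z) for 21 hand landmarks, or None."""
    results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None  # no hand detected (see "Detection challenges" above)
    landmarks = results.multi_hand_landmarks[0].landmark
    return np.array([[lm.x, lm.y, lm.z] for lm in landmarks], dtype=np.float32).flatten()
```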
- Simple feedforward neural network (PyTorch)
- Input: 63 features (21 landmarks × 3 coordinates)
- Hidden layers: 512 → 256 neurons with dropout
- Output: 28 classes (A-Z, del, space)
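A minimal sketch of such a classifier, using the layer sizes described above; the exact architecture and dropout rate in word_classifier.py may differ.

```python
import torch
import torch.nn as nn

class ASLWordClassifier(nn.Module):
    """Feedforward classifier: 63 landmark features -> 28 sign classes."""

    def __init__(self, num_classes: int = 28, dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(63, 512), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(512, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # raw logits; apply softmax to get confidence scores
```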
- PyTorch Dataset for loading preprocessed landmarks
- Handles label mapping and normalization
- Efficient loading from compressed .npz files
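A minimal sketch of such a Dataset, assuming the .npz file stores arrays under the keys "landmarks" and "labels" (the actual key names in dataset.py may differ).

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class LandmarkDataset(Dataset):
    """Loads preprocessed hand landmarks and labels from a compressed .npz file."""

    def __init__(self, npz_path: str):
        data = np.load(npz_path)
        # Key names are assumptions for this sketch.
        self.features = torch.tensor(data["landmarks"], dtype=torch.float32)
        self.labels = torch.tensor(data["labels"], dtype=torch.long)

    def __len__(self) -> int:
        return len(self.labels)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]
```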
- Production-ready REST API
- File upload and base64 prediction endpoints
- CORS enabled for Raspberry Pi integration
- Health checks and monitoring
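A rough sketch of the base64 endpoint's shape, assuming the request body carries an image_base64 field as in the Raspberry Pi example further below; api/app.py is the source of truth.

```python
import base64

import cv2
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Base64Image(BaseModel):
    image_base64: str

@app.post("/predict/base64")
def predict_base64(payload: Base64Image):
    # Decode the base64 JPEG into a BGR image
    img_bytes = base64.b64decode(payload.image_base64)
    frame = cv2.imdecode(np.frombuffer(img_bytes, np.uint8), cv2.IMREAD_COLOR)
    # ... run landmark extraction, the classifier, and the context lookup here ...
    return {"predicted_sign": "C", "confidence": 0.95}  # placeholder values
```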
```bash
# Start API server
python -m uvicorn api.app:app --reload --port 8000

# Test API
python api/test_client.py

# View interactive docs
open http://localhost:8000/docs
```

```bash
# Build and run with Docker Compose
docker-compose up -d

# View logs
docker-compose logs -f

# Stop service
docker-compose down
```

See api/README.md for complete deployment instructions.
- `GET /health` - Health check
- `GET /classes` - List supported classes
- `POST /predict` - Predict from image file upload
- `POST /predict/base64` - Predict from base64-encoded image (Raspberry Pi integration)
Full API documentation: http://localhost:8000/docs
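For a quick smoke test of these endpoints with the requests library (the multipart field name "file" and the image path are assumptions; api/test_client.py shows the exact request format):

```python
import requests

BASE_URL = "http://localhost:8000"

# Health check and supported classes
print(requests.get(f"{BASE_URL}/health").json())
print(requests.get(f"{BASE_URL}/classes").json())

# Predict from an image file upload (field name "file" assumed)
with open("path/to/sign_image.jpg", "rb") as f:  # replace with a real test image
    response = requests.post(f"{BASE_URL}/predict", files={"file": f})
print(response.json())
```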
The model recognizes 28 ASL signs:
- Letters: A-Z
- Special: del (delete), space
This Sign Language Detection module is part of the SenseAI accessibility system.
```
┌─────────────────┐
│   Web Camera    │
│ (Raspberry Pi)  │
└────────┬────────┘
         │
         ▼
┌─────────────────────────┐
│     Semantic Router     │
│  (Gemini + OpenRouter)  │
└───┬─────────┬─────────┬─┘
    │         │         │
    ▼         ▼         ▼
┌─────────┐ ┌─────────┐ ┌──────────────┐
│   TTS   │ │  Face   │ │Sign Language │
│   API   │ │   API   │ │     API      │
└─────────┘ └─────────┘ └──────────────┘
 (Eleven     (Custom)    (This Module)
  Labs)
```

All APIs are deployed on Digital Ocean.
Example client call from the Raspberry Pi:

```python
import base64

import cv2
import requests

# Capture a single frame and encode it as a base64 JPEG
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
cap.release()

_, buffer = cv2.imencode('.jpg', frame)
img_b64 = base64.b64encode(buffer).decode()

# Call the Sign Language API
response = requests.post(
    "http://your-api:8000/predict/base64",
    json={"image_base64": img_b64}
)
print(response.json())
```

Future improvements:
- Add temporal models (LSTM/Transformer) for sentence detection
- Add data augmentation for improved robustness
- Support for continuous sign language (not just fingerspelling)
- Mobile deployment (TFLite/CoreML conversion)
- Multi-hand support for two-handed signs
Source: Kaggle ASL Alphabet Dataset
- Training: 87,000 images (3,000 per class × 29 classes)
- Format: 200×200 RGB images
- Classes: 29 (A-Z + del + nothing + space)
- License: See Kaggle dataset page
For more details, see docs/DATASET.md.
- Keep feature modules independent for easy integration
- All data preprocessing outputs go to `data/processed/`
- Models are saved in `models/`
- Scripts are in `scripts/`

To add a new feature:
- Create a new module in `features/sign_language/`
- Import existing components as needed
- Update `conversational_asl.py` for integration
- Document it in this README
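A purely hypothetical sketch of how a new module might compose the existing components; the imported class and method names are assumptions, not the repo's actual API.

```python
# features/sign_language/my_new_feature.py (hypothetical)
# Class and method names below are illustrative only; check the actual modules.
from features.sign_language.hand_tracker import HandTracker
from features.sign_language.word_classifier import WordClassifier
from features.sign_language.context_dictionary import ContextDictionary


class SignToSpeechPipeline:
    """Example composition: frame -> landmarks -> letter -> contextual sentence."""

    def __init__(self):
        self.tracker = HandTracker()
        self.classifier = WordClassifier()
        self.dictionary = ContextDictionary()

    def process_frame(self, frame):
        landmarks = self.tracker.extract(frame)                   # assumed method
        if landmarks is None:
            return None
        letter, confidence = self.classifier.predict(landmarks)   # assumed method
        return self.dictionary.lookup(letter)                     # assumed method
```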
This project is for educational and accessibility purposes.
- MediaPipe team for excellent hand tracking
- Kaggle ASL Alphabet dataset contributors
- PyTorch community
Built with ❤️ for accessibility
Design notes from the original planning discussion:

The overall system is planned around three features: Text-to-Speech and Speech-to-Text using Eleven Labs, a people-recognition model, and this Sign Language Detection model. Each feature gets its own API endpoint, which is Dockerized and deployed on Digital Ocean.

A Semantic Router built on Gemini, exposed through an OpenRouter interface, inspects the camera frames and text and decides which API to call. The image and audio frames come from a web camera, and the semantic router runs on the Raspberry Pi.

Within the Sign Language feature, the model has been built, so the next step is an API endpoint that serves the optimized model and returns predictions.

Letter-level detection has been tested and works well. To support social interaction, predictions are mapped through a dictionary (for example, the letter C means "Coffee" and the letter A means "Apple"): when an image frame contains the hand gesture for C, the system looks up the dictionary and returns the corresponding word or sentence, depending on what the dictionary table contains. Since this is a hackathon project whose goal is to demo the device as a social-interaction aid for disabled people, this is an effective strategy.