SigniVision is an AI-powered web application that detects Sign Language (SL) signs in real time from a webcam feed and generates audio feedback with a state-of-the-art transformer-based text-to-speech (TTS) model.
This project bridges the communication gap between the hearing-impaired and the general public using deep learning models such as YOLOv5 (for hand sign detection) and VITS (for audio synthesis).
- 🖐️ Real-time detection of Indian Sign Language words
- 🔍 YOLOv5 custom-trained model for sign recognition
- 🔊 Transformer-based VITS model for generating speech
- 🧠 FastAPI backend with easy RESTful endpoint
- 🌐 React frontend with live webcam capture (see /frontend)
- 🧪 CORS-enabled backend to allow frontend integration
| Layer | Technology |
|---|---|
| Frontend | React, Tailwind CSS, Lucide Icons |
| Backend | FastAPI, Python, Torch, OpenCV |
| ML Model | YOLOv5 for detection, VITS for TTS |
| Utilities | PIL, NumPy, transformers, torch.hub |
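The detection side of the stack is a custom-trained YOLOv5 model shipped as model/best.pt (see the project tree below). A minimal sketch of how such custom weights are typically loaded and run via torch.hub is shown here; the repository's actual loading code in model_backend/main.py may differ, and the sample image path is only illustrative.

```python
import torch
from PIL import Image

# Load the custom-trained YOLOv5 weights via torch.hub.
# Assumes the weights file lives at model/best.pt, as in the project tree below.
model = torch.hub.load("ultralytics/yolov5", "custom", path="model/best.pt")
model.conf = 0.5  # confidence threshold for detections

# Run inference on a single frame (a PIL image or NumPy array both work).
img = Image.open("sample_sign.jpg")  # placeholder path
results = model(img)

# Each row is one detection: bounding box, confidence, class id, class name.
detections = results.pandas().xyxy[0]
if len(detections):
    top = detections.sort_values("confidence", ascending=False).iloc[0]
    print(f"Detected sign: {top['name']} ({top['confidence']:.2f})")
else:
    print("No sign detected.")
```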
```
SigniVision/
│
├── model_backend/       # FastAPI backend with YOLOv5 + VITS integration
│   ├── main.py
│   └── ...
│
├── model/               # YOLOv5 model directory
│   ├── best.pt
│   └── signlang.ipynb   # for model fine-tuning
│
├── frontend/            # React frontend with webcam and UI
│
├── f2/                  # training dataset
│
├── assets/              # Assets like images, PDFs, and demos
│   └── presentation.pdf
│
├── README.md
└── requirements.txt
```
```bash
git clone https://github.com/XML-project-2k25/SigniVision.git
cd SigniVision
cd model_backend
python3 -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
pip install -r requirements.txt
uvicorn main:app --reload
```

The API will be live at http://localhost:8000.
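For orientation, a stripped-down sketch of what a /predict/ endpoint in model_backend/main.py could look like is shown below. It is an illustration under assumptions (the weights path, the form-field name, and the response keys such as `detected` and `label` are placeholders), not the project's actual code; the real backend also runs the VITS step described further down.

```python
import io

import torch
from fastapi import FastAPI, File, UploadFile
from fastapi.middleware.cors import CORSMiddleware
from PIL import Image

app = FastAPI()

# CORS so the React dev server (http://localhost:5173) can call the API.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# Custom-trained YOLOv5 weights (path assumed from the project tree).
model = torch.hub.load("ultralytics/yolov5", "custom", path="../model/best.pt")

@app.post("/predict/")
async def predict(file: UploadFile = File(...)):
    # Decode the uploaded frame into a PIL image.
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")

    # Run YOLOv5 detection and keep the highest-confidence sign, if any.
    detections = model(image).pandas().xyxy[0]
    if detections.empty:
        return {"detected": False, "label": None}

    best = detections.sort_values("confidence", ascending=False).iloc[0]
    # In the full app, a Base64 WAV of the spoken label would be added here.
    return {
        "detected": True,
        "label": best["name"],
        "confidence": float(best["confidence"]),
    }
```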
Send a POST request to /predict/ with an image file:
```bash
curl -X POST http://localhost:8000/predict/ -F "[email protected]"
```
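The same request can be made from Python and the returned audio decoded for playback. In the sketch below, the multipart field name (`file`) and the response keys (`label`, `audio`) are illustrative assumptions about the API's contract, not a documented schema.

```python
import base64

import requests

# Post one captured frame to the backend.
# Field name "file" and keys "label"/"audio" are assumptions, not a documented schema.
with open("sample_sign.jpg", "rb") as f:  # placeholder image path
    resp = requests.post(
        "http://localhost:8000/predict/",
        files={"file": ("sample_sign.jpg", f, "image/jpeg")},
    )
resp.raise_for_status()
data = resp.json()

print("Detected sign:", data.get("label"))

# If the response carries Base64-encoded WAV audio, save it for playback.
if data.get("audio"):
    with open("speech.wav", "wb") as out:
        out.write(base64.b64decode(data["audio"]))
```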
```bash
cd frontend
npm install
npm run dev
```

The frontend will run at http://localhost:5173.
- The detected sign class name (e.g., "Thank You") is passed to the VitsTTS class.
- VITS generates speech using the pretrained model kakao-enterprise/vits-ljs (a minimal sketch of this step follows the list).
- The output audio is encoded as Base64 WAV and sent back via the API.
- The frontend decodes and plays the speech audio in the browser.
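A minimal sketch of that TTS step, using the Hugging Face transformers VITS implementation, could look like the following. The project's actual VitsTTS class may wrap this differently; the helper name and the 16-bit PCM conversion here are illustrative choices, not the repository's code.

```python
import base64
import io

import torch
from scipy.io import wavfile
from transformers import AutoTokenizer, VitsModel

# Pretrained English VITS model used by the project.
tokenizer = AutoTokenizer.from_pretrained("kakao-enterprise/vits-ljs")
model = VitsModel.from_pretrained("kakao-enterprise/vits-ljs")

def synthesize_base64_wav(text: str) -> str:
    """Turn a detected sign label (e.g. "Thank You") into a Base64 WAV string.

    Illustrative helper, not the project's VitsTTS class.
    """
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        waveform = model(**inputs).waveform  # shape: (1, num_samples)

    # Write a 16-bit PCM WAV into memory, then Base64-encode it for the API response.
    audio = waveform.squeeze(0).numpy()
    buffer = io.BytesIO()
    wavfile.write(buffer, model.config.sampling_rate, (audio * 32767).astype("int16"))
    return base64.b64encode(buffer.getvalue()).decode("ascii")

print(synthesize_base64_wav("Thank You")[:60], "...")
```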
- Video call feature
- Support for dynamic ISL gestures
- Multilingual audio output
- Mobile version using React Native
- Improved UI/UX with gesture history and chat overlay