SanketBani is an innovative application designed to bridge the communication gap between hearing and speech-impaired communities and those who can hear and speak. It leverages advanced AI and 3D technology to convert text, images, and real-time gestures into Indian Sign Language (ISL) gestures, fostering seamless communication and understanding.
- Converts any input text into 3D animated Indian Sign Language gestures.
- Allows users to type or paste text and view the corresponding ISL gestures in real time.
- Recognizes text from images and converts it into ISL gestures.
- Uses advanced Optical Character Recognition (OCR) and AI models to extract and process text from images.
- Facilitates real-time conversations between individuals who can and cannot hear or speak.
- Provides smooth interaction using gesture recognition and 3D animated ISL responses.
- Converts sign language gestures from videos into readable text.
- Enables understanding of ISL gestures by those who are unfamiliar with sign language.
- Incorporates a structured dataset of ISL gestures, including alphabets, numbers, common sentences, and frequently used words.
- Expands functionality by combining gestures for new or uncommon phrases.
- React Native for the mobile application.
- Three.js for rendering 3D models.
- Expo Go for development and testing.
- Node.js for API handling.
- APIs for OCR, text-to-sign gesture generation, and other AI functionalities.
- OpenCV and MediaPipe for extracting pose and feature keypoints from videos.
- TensorFlow and Python for building and training the neural networks.
The SanketBani project consists of three main folders:
- `frontend`: contains the React Native application.
- Guides for setup: run `cd frontend` and `npm install`.
- For the first build, run `npm run android`.
- For subsequent runs, run `npm run start`.
- `backend`: contains the Node.js server for handling APIs and other backend functionalities.
- Guides for setup: run `cd backend`, `npm install`, then `npm run dev`.
- `Recognition Model`: contains the Python-based model for video gesture recognition.
- Guides for setup: run `cd "Recognition Model/Video Recognition"`, `pip install -r requirements.txt`, then `python app.py`.
To ensure proper functioning, create .env files in both the frontend and backend folders with the following structure (all values shown are placeholders):

    # Your backend URL
    API_URL=https://example.com/api
    PORT=5000
    MONGO_URI=mongodb+srv://user:[email protected]/dbname
    JWT_SECRET=your_jwt_secret
    EMAIL=[email protected]
    # App password used by Nodemailer
    PASSWORD="addf ada sdss wdsd"
    CLOUDINARY_CLOUD_NAME="example"
    CLOUDINARY_API_KEY="example"
    CLOUDINARY_API_SECRET="example"
    GOOGLE_APPLICATION_CREDENTIALS="your json file path"
    GEMINI_API_KEY="example"
    ELEVENLABS_API_KEY="example"
- The user provides input via text, image, or gestures.
- Text is processed to correct grammar and spelling.
- A sequence of ISL gestures is generated if the input matches the dataset.
- For new or unmatched inputs, the AI combines existing gestures or generates a new sequence.
- Processed input is converted into a series of prebuilt 3D animations.
- Animations are displayed in the app’s 3D viewer for users to follow or understand.
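The lookup described above can be sketched as a small Python routine: a full-sentence match takes precedence, known words map to prebuilt clips, and anything unmatched falls back to letter-by-letter fingerspelling. The dataset entries and clip names here are illustrative placeholders, not the app's actual gesture data.

```python
# Sketch of the text-to-gesture lookup. Precedence: whole sentence >
# individual word > fingerspelling. All names below are placeholders.
SENTENCE_CLIPS = {"how are you": ["clip_how_are_you"]}
WORD_CLIPS = {"hello": "clip_hello", "thank": "clip_thank"}
LETTER_CLIPS = {ch: f"clip_letter_{ch}" for ch in "abcdefghijklmnopqrstuvwxyz"}

def to_gesture_sequence(text: str) -> list[str]:
    """Return an ordered list of 3D animation clip names for `text`."""
    normalized = " ".join(text.lower().split())
    if normalized in SENTENCE_CLIPS:           # exact sentence match
        return list(SENTENCE_CLIPS[normalized])
    clips = []
    for word in normalized.split():
        if word in WORD_CLIPS:                 # known word clip
            clips.append(WORD_CLIPS[word])
        else:                                  # fingerspell unknown words
            clips.extend(LETTER_CLIPS[ch] for ch in word if ch in LETTER_CLIPS)
    return clips
```

For example, `to_gesture_sequence("Hello ISL")` yields one word clip followed by three letter clips, while an exact sentence match returns its single prebuilt sequence.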
To convert videos into sentences, we built a repository of 15 sentences, listed below. For each video, we used OpenCV and MediaPipe to extract pose and feature keypoints and saved them into .npy files. These files were then used to train an LSTM neural network as a classification model. Currently the app supports only these 15 sentences, but we are working to add more.
- 'Are you free today'
- 'Can you repeat that please'
- 'Congratulations'
- 'Help me please'
- 'How are you'
- 'I am fine'
- 'I love you'
- 'No'
- 'Please come, Welcome'
- 'Talk slower please'
- 'Thank you'
- 'What are you doing'
- 'What do you do'
- 'What happened'
- 'Yes'
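The per-video preprocessing can be sketched as follows. In the real pipeline the per-frame keypoints come from OpenCV and MediaPipe; here the extractor is a stub so the shape handling can be shown on its own. The sequence length of 30 and the feature count of 1662 (MediaPipe Holistic pose, face, and both hands) are assumed values, not confirmed by this README.

```python
import numpy as np

# Sketch of turning one video into a fixed-shape .npy training sample.
SEQUENCE_LENGTH = 30   # frames kept per video (assumed value)
NUM_FEATURES = 1662    # assumed MediaPipe Holistic landmark value count

def extract_keypoints(frame) -> np.ndarray:
    """Placeholder for the MediaPipe keypoint extraction step."""
    return np.zeros(NUM_FEATURES, dtype=np.float32)

def video_to_npy(frames, out_path: str) -> np.ndarray:
    """Stack per-frame keypoints into (SEQUENCE_LENGTH, NUM_FEATURES)."""
    keypoints = [extract_keypoints(f) for f in frames[:SEQUENCE_LENGTH]]
    # Pad short videos with zero frames so every sample has equal length,
    # which the LSTM classifier requires.
    while len(keypoints) < SEQUENCE_LENGTH:
        keypoints.append(np.zeros(NUM_FEATURES, dtype=np.float32))
    sample = np.stack(keypoints)
    np.save(out_path, sample)   # one .npy file per video
    return sample

sample = video_to_npy([object()] * 12, "sample.npy")
```

Fixing the sample shape up front keeps the .npy files uniform, so they can be batched directly when training the LSTM classifier.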
Currently this is done via an API call to a Large Language Model, with the classification task passed as context. We are working on an embedded, lightweight sentence-classification solution that maps our sentences to animation data.
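One way such a lightweight, embedded mapper could look is sketched below: free-form text is matched to the closest supported sentence by simple word overlap. The Jaccard heuristic is an illustrative assumption, not the shipped implementation.

```python
# Sketch of a lightweight sentence mapper: pick the supported sentence
# sharing the most words with the input. Heuristic is an assumption.
SUPPORTED = [
    "are you free today", "can you repeat that please", "congratulations",
    "help me please", "how are you", "i am fine", "i love you", "no",
    "please come, welcome", "talk slower please", "thank you",
    "what are you doing", "what do you do", "what happened", "yes",
]

def closest_sentence(text: str) -> str:
    """Return the supported sentence with the highest word overlap."""
    words = set(text.lower().replace(",", "").split())
    def overlap(candidate: str) -> float:
        cand = set(candidate.replace(",", "").split())
        return len(words & cand) / len(words | cand)  # Jaccard similarity
    return max(SUPPORTED, key=overlap)
```

Because it needs no network call, a mapper like this could run on-device and feed the matched sentence straight into the animation lookup.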
- Education: Helps students interactively learn Indian Sign Language.
- Accessibility: Enables hearing and speech-impaired individuals to communicate with those who do not know sign language.
- Translation: Acts as a translator for conversations between different communities.
- Awareness: Promotes ISL adoption and awareness in workplaces, schools, and public spaces.
- Communication Barriers: Reduces the gap between communities with and without knowledge of ISL.
- Resource Scarcity: Provides prebuilt animations for 285+ gestures, including alphabets, numbers, and common phrases.
- Ease of Use: User-friendly interface for inputting and viewing ISL gestures in various formats.
- Node.js
- Python (for recognition model)
- Expo Go
- Clone the repository: `git clone https://github.com/Phinix-BI/SanketBani`
- Navigate to the project directory: `cd ISL-Recognition`
- Set up each folder per the Folder Structure and Setup section.
- Support for more languages beyond English.
- Additional gesture datasets and animations.
- Offline functionality for remote areas.
- Integration with voice recognition for enhanced accessibility.
- Real-time gesture conversion during video calls.
We welcome contributions to enhance SanketBani. To contribute:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Submit a pull request with a detailed description of your changes.
SanketBani is licensed under the MIT License. See the LICENSE file for details.
For queries, feature requests, or support:
- Email: [email protected]
- Website: www.sanketbani.com