Project SEAL-See All is a computer vision application that augments the interaction between users and computers through facial detection, hand gesture recognition, and object detection. It runs on macOS with a webcam and lets users control actions or trigger scripts through specific hand gestures, while also offering object detection and text extraction.
Features
Facial Detection and Recognition: Identifies and recognizes faces in real-time video streams.
Hand Gesture Recognition: Detects six hand gestures (closed fist, open palm, pointing up, thumbs up, thumbs down, and victory sign) using MediaPipe.
Object Detection: Uses YOLOv3 with the COCO class list to recognize 80 common object categories.
Optical Character Recognition (OCR): Extracts text from objects detected with high confidence, using PyTesseract.
Installation
Ensure you have Python 3.x installed on your macOS system. Clone the repository and navigate to the project directory:
git clone <repository-url>
cd SEAL-See-All
Dependencies
Install the required Python packages:
pip install -r requirements.txt
Make sure the models and required files are in the following paths:
YOLO configuration and weights: /yolo/yolov3.cfg, /yolo/yolov3.weights
COCO names: /yolo/coco.names
Haar Cascade for face detection: /haar/haarcascade_frontalface_default.xml
Gesture recognizer model: gesture_recognizer/gesture_recognizer.task
Tesseract OCR: Ensure tesseract is installed and correctly set up on your macOS system.
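The file list above can be verified before the first run. The following is a minimal sketch, not part of the project: the helper names find_missing_files and tesseract_on_path are hypothetical, and it assumes the listed paths are relative to the project root (the leading slash is dropped).

```python
import shutil
from pathlib import Path

# Paths copied from the list above. Treating them as relative to the
# project root is an assumption of this sketch.
REQUIRED_FILES = [
    "yolo/yolov3.cfg",
    "yolo/yolov3.weights",
    "yolo/coco.names",
    "haar/haarcascade_frontalface_default.xml",
    "gesture_recognizer/gesture_recognizer.task",
]


def find_missing_files(project_root, required=REQUIRED_FILES):
    """Return the required paths that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in required if not (root / rel).is_file()]


def tesseract_on_path():
    """Return True if a tesseract executable is found on PATH."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    missing = find_missing_files(Path.cwd())
    for rel in missing:
        print(f"Missing: {rel}")
    if not tesseract_on_path():
        print("Missing: tesseract executable (not found on PATH)")
    if not missing and tesseract_on_path():
        print("All required files found.")
```

Run it from the project directory; any path it reports needs to be downloaded or moved into place before starting seal.py.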
Preparing Facial Encodings
Place sample images in the designated folder. Run image_augmentation.py to augment your images for better recognition performance, then run face_encoder.py to generate the facial encoding files:
python image_augmentation.py
python face_encoder.py
Usage
Run the main script to start the application:
python seal.py
Toggle Object Detection: Press the 'o' key to turn object detection on or off.
Extract Text: Press the 'a' key to run OCR on objects detected with over 80% confidence.
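The key bindings and the 80% rule can be sketched as two small functions. This is an illustration only, not the code in seal.py: handle_key, ocr_candidates, and the detection dictionary layout are hypothetical, and the sketch assumes key codes come from OpenCV's cv2.waitKey.

```python
OCR_CONFIDENCE_THRESHOLD = 0.8  # "over 80% confidence", per the text above


def handle_key(key_code, object_detection_on):
    """Return (new_detection_state, run_ocr) for one polled key code.

    key_code is assumed to be the value returned by cv2.waitKey(1);
    masking with 0xFF keeps only the low byte, and the "no key" value
    of -1 becomes 255, which matches neither binding.
    """
    key = key_code & 0xFF
    if key == ord("o"):
        return (not object_detection_on, False)  # toggle detection
    if key == ord("a"):
        return (object_detection_on, True)  # request an OCR pass
    return (object_detection_on, False)


def ocr_candidates(detections, threshold=OCR_CONFIDENCE_THRESHOLD):
    """Keep detections whose confidence is strictly above the threshold."""
    return [d for d in detections if d["confidence"] > threshold]
```

Keeping the key handling free of OpenCV calls makes the toggle logic easy to test without a camera; the main loop would only pass in the polled key code and the current state.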
YOLO object classes (class ID: name):
0: person
1: bicycle
2: car
3: motorcycle
4: airplane
5: bus
6: train
7: truck
8: boat
9: traffic light
10: fire hydrant
11: stop sign
12: parking meter
13: bench
14: bird
15: cat
16: dog
17: horse
18: sheep
19: cow
20: elephant
21: bear
22: zebra
23: giraffe
24: backpack
25: umbrella
26: handbag
27: tie
28: suitcase
29: frisbee
30: skis
31: snowboard
32: sports ball
33: kite
34: baseball bat
35: baseball glove
36: skateboard
37: surfboard
38: tennis racket
39: bottle
40: wine glass
41: cup
42: fork
43: knife
44: spoon
45: bowl
46: banana
47: apple
48: sandwich
49: orange
50: broccoli
51: carrot
52: hot dog
53: pizza
54: donut
55: cake
56: chair
57: couch
58: potted plant
59: bed
60: dining table
61: toilet
62: tv
63: laptop
64: mouse
65: remote
66: keyboard
67: cell phone
68: microwave
69: oven
70: toaster
71: sink
72: refrigerator
73: book
74: clock
75: vase
76: scissors
77: teddy bear
78: hair drier
79: toothbrush
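The class IDs above are what the detector returns; the names come from the COCO names file. A minimal sketch of that lookup follows, assuming the file holds one class name per line with the line index as the class ID. The function names load_class_names and class_name are hypothetical.

```python
from pathlib import Path


def load_class_names(names_path):
    """Read one class name per line; the list index is the class ID.

    Blank lines are skipped, so the file is assumed to have none
    between entries (a trailing newline is fine).
    """
    lines = Path(names_path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def class_name(names, class_id):
    """Look up a label, with a placeholder for out-of-range IDs."""
    if 0 <= class_id < len(names):
        return names[class_id]
    return f"unknown({class_id})"
```

The exact spelling of each label depends on the names file in use, so compare strings against the file rather than against a hard-coded list.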
Adding Gesture Actions
Edit the gesture_recognition_callback function in seal.py to define actions for detected gestures.
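One way to organize those actions is a lookup table from gesture label to function. The sketch below is an assumption about how the callback could be structured, not the code in seal.py: the action functions and GESTURE_ACTIONS are hypothetical, the labels are the ones MediaPipe's default gesture model reports ("Closed_Fist", "Open_Palm", "Pointing_Up", "Thumb_Up", "Thumb_Down", "Victory"), and the result object is assumed to follow the shape of MediaPipe's GestureRecognizerResult (gestures[hand][rank].category_name).

```python
from types import SimpleNamespace


def on_thumb_up():
    # Hypothetical action; replace with a script or command to trigger.
    return "thumb up action"


def on_open_palm():
    # Hypothetical action.
    return "open palm action"


# Map gesture labels to the functions to run. Labels without an entry
# are ignored.
GESTURE_ACTIONS = {
    "Thumb_Up": on_thumb_up,
    "Open_Palm": on_open_palm,
}


def top_gesture_name(result):
    """Return the top-ranked label for the first hand, or None."""
    if not result.gestures or not result.gestures[0]:
        return None
    return result.gestures[0][0].category_name


def gesture_recognition_callback(result, output_image=None, timestamp_ms=0):
    """Run the action mapped to the detected gesture, if any."""
    action = GESTURE_ACTIONS.get(top_gesture_name(result))
    return action() if action is not None else None


if __name__ == "__main__":
    # Stand-in for a recognizer result, used only to exercise the sketch.
    fake = SimpleNamespace(
        gestures=[[SimpleNamespace(category_name="Thumb_Up", score=0.9)]]
    )
    print(gesture_recognition_callback(fake))
```

Adding a new gesture then means writing one function and adding one entry to GESTURE_ACTIONS, without touching the callback itself.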
To-Do:
Image Enhancement Integration:
I plan to integrate an image enhancement model that refines the images captured for object detection before they are passed to OCR. The goal is more accurate text extraction from complex backgrounds and low-quality images.
Voice Command Activation:
To make SEAL-See All more interactive, I am working on live voice recording with immediate transcription. The transcribed text will be fed into ShellGPT, which turns voice commands into code executed in the command line environment. The aim is a hands-free experience in which users can run commands and control the application by voice.
Stay tuned for these updates as I continue to improve the functionality and user experience of Project SEAL-See All.





