A real-time hand gesture recognition system that lets you control your computer's mouse cursor, clicks, and scrolling using only your hand — powered by MediaPipe, OpenCV, and PyAutoGUI.
| Gesture | Action |
|---|---|
| ☝️ Index finger up | Move mouse cursor |
| 🤏 Thumb + Index pinch | Left click |
| ✌️ Two fingers (index + middle) | Scroll up / down |
| 🤟 Three fingers (index + middle + ring) | Right click |
| 🖐 Open palm | Pause tracking |
- Real-time webcam processing at 30 FPS
- Multi-hand detection (up to 2 hands)
- Exponential smoothing for jitter-free cursor movement
- Gesture stabilization to prevent accidental clicks
- Settings GUI to customize sensitivity before launch
- FPS counter and gesture label overlay
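The gesture-stabilization feature above boils down to a hold-then-cooldown counter. A minimal sketch of the idea (class and method names are illustrative, not the actual API in `src/gesture_stabilizer.py`):

```python
CLICK_HOLD_FRAMES = 12      # frames a gesture must persist before it fires
CLICK_COOLDOWN_FRAMES = 20  # frames to ignore input after an action fires

class Stabilizer:
    """Debounce: emit a gesture only after it has been held steadily."""

    def __init__(self):
        self.last = None      # gesture seen on the previous frame
        self.held = 0         # consecutive frames the gesture was held
        self.cooldown = 0     # frames left before the next action may fire

    def update(self, gesture):
        """Feed one frame's gesture; return it once stable, else None."""
        if self.cooldown > 0:
            self.cooldown -= 1
            return None
        # Reset the hold counter whenever the gesture changes.
        self.held = self.held + 1 if gesture == self.last else 1
        self.last = gesture
        if self.held >= CLICK_HOLD_FRAMES:
            self.cooldown = CLICK_COOLDOWN_FRAMES
            self.held = 0
            return gesture
        return None
```

With the defaults above, a pinch must be held for 12 consecutive frames to trigger a click, and the following 20 frames are ignored, which prevents one pinch from firing multiple clicks.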
- Python 3.8+
- A working webcam
```shell
# Clone or navigate to the project
cd Screen-Controller

# Create a virtual environment (recommended)
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Linux/macOS

# Install dependencies
pip install -r requirements.txt
```

The project requires MediaPipe's hand landmarker model (~5 MB):
Windows (PowerShell):

```powershell
New-Item -ItemType Directory -Path models -Force
Invoke-WebRequest -Uri "https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/latest/hand_landmarker.task" -OutFile "models/hand_landmarker.task"
```

Linux/macOS:
```shell
mkdir -p models
curl -L -o models/hand_landmarker.task https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/latest/hand_landmarker.task
```

Run the application:

```shell
python -m src.main
```

Options:
```
--no-gui     Skip the settings GUI, use defaults
--camera N   Override camera index (default: 0)
```

Example:

```shell
python -m src.main --no-gui --camera 1
```

Press `q` to quit the application.
```
Screen-Controller/
├── requirements.txt
├── README.md
├── models/
│   └── hand_landmarker.task    # Downloaded model
└── src/
    ├── __init__.py
    ├── config.py               # All tunable constants
    ├── hand_detector.py        # MediaPipe HandLandmarker wrapper
    ├── gesture_recognizer.py   # Gesture classification logic
    ├── gesture_stabilizer.py   # Debounce & cooldown
    ├── cursor_controller.py    # Mouse control with smoothing
    ├── display_overlay.py      # Visual feedback rendering
    ├── gui_config.py           # Tkinter settings GUI
    └── main.py                 # Entry point & main loop
```
All parameters are in `src/config.py`:

| Parameter | Default | Description |
|---|---|---|
| `SMOOTHING_FACTOR` | 0.35 | Cursor smoothing (0 = smooth, 1 = raw) |
| `PINCH_THRESHOLD` | 0.045 | Thumb–index distance for pinch detection |
| `CLICK_HOLD_FRAMES` | 12 | Frames to hold before a click fires |
| `CLICK_COOLDOWN_FRAMES` | 20 | Cooldown frames between clicks |
| `FRAME_MARGIN` | 0.12 | Camera-frame margin for cursor mapping |
| `MAX_HANDS` | 2 | Maximum hands to detect |
You can also adjust these via the Settings GUI that appears at startup.
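As an illustration of how `FRAME_MARGIN` and `SMOOTHING_FACTOR` interact, here is a hedged sketch of margin-based cursor mapping plus exponential smoothing (function names and the screen size are assumptions, not the project's actual code in `src/cursor_controller.py`):

```python
import numpy as np

FRAME_MARGIN = 0.12       # ignore the outer 12% of the camera frame
SMOOTHING_FACTOR = 0.35   # 0 = maximally smooth, 1 = raw input
SCREEN_W, SCREEN_H = 1920, 1080  # assumed screen resolution

def map_to_screen(x, y):
    """Map normalized camera coords (0..1) to screen pixels, stretching
    the inner region defined by FRAME_MARGIN to the full screen so the
    cursor can reach the edges without the hand leaving the frame."""
    sx = np.interp(x, [FRAME_MARGIN, 1 - FRAME_MARGIN], [0, SCREEN_W])
    sy = np.interp(y, [FRAME_MARGIN, 1 - FRAME_MARGIN], [0, SCREEN_H])
    return float(sx), float(sy)

def smooth(prev, target):
    """Exponential smoothing: move a fraction of the way to the target
    each frame, which filters out hand jitter."""
    a = SMOOTHING_FACTOR
    return (prev[0] + a * (target[0] - prev[0]),
            prev[1] + a * (target[1] - prev[1]))

# A fingertip at the frame center maps to the screen center:
print(map_to_screen(0.5, 0.5))  # → (960.0, 540.0)
```

A lower `SMOOTHING_FACTOR` gives a steadier but laggier cursor; raising it toward 1 makes the cursor track the raw fingertip position.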
- Python 3.8+
- OpenCV — webcam capture & display
- MediaPipe Hands — 21-point hand landmark detection
- PyAutoGUI — cross-platform mouse control
- NumPy — coordinate interpolation & smoothing
- Tkinter — settings GUI (built-in)
- Capture a frame from the webcam and mirror it
- Detect hands using MediaPipe (21 landmarks per hand)
- Classify the gesture based on finger positions:
  - Finger "extended" = fingertip y < PIP joint y
  - Thumb uses an x-axis comparison (accounts for handedness)
  - Pinch = thumb–index tip distance below threshold
- Stabilize — only act when the same gesture is held for N frames
- Execute — move cursor / click / scroll via PyAutoGUI
- Display — overlay bounding boxes, landmarks, and labels
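The classification rules above can be sketched as follows. The landmark indices follow MediaPipe's 21-point hand model (4 = thumb tip, 6 = index PIP, 8 = index tip); the functions themselves are illustrative, not the project's code:

```python
import math

THUMB_TIP, INDEX_PIP, INDEX_TIP = 4, 6, 8
PINCH_THRESHOLD = 0.045

def index_extended(landmarks):
    """Image y grows downward, so an extended finger's tip sits
    above (i.e. has a smaller y than) its PIP joint."""
    return landmarks[INDEX_TIP][1] < landmarks[INDEX_PIP][1]

def is_pinch(landmarks):
    """Pinch = thumb tip and index tip closer than the threshold
    (distances are in normalized 0..1 image coordinates)."""
    tx, ty = landmarks[THUMB_TIP]
    ix, iy = landmarks[INDEX_TIP]
    return math.hypot(tx - ix, ty - iy) < PINCH_THRESHOLD
```

Each `landmarks` value here is assumed to be a list of 21 `(x, y)` pairs in normalized image coordinates, which is how MediaPipe reports hand landmarks.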
This project is open-source. Feel free to use and modify.