Jarvis is a computer vision application that can detect faces, track features, apply image filters, and provide video streaming capabilities. Built with OpenCV, PyQt5, and Python 3.
For macOS:
brew install opencv
brew install portaudio
brew install python-tk # Required for the GUI
For other platforms, install OpenCV, PortAudio, and Python Tkinter using your package manager.
- Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Install the package in development mode:
pip install -e .
This uses setup.py to install the package with the following benefits:
- Installs all dependencies listed in setup.py
- Creates the `jarvis` console command (defined in `entry_points`; see the setup.py sketch after these steps)
- Includes required data files like cascade XMLs
- Allows you to modify the code without reinstalling
- Verify OpenCV is properly installed:
python -c "import cv2; print(cv2.__version__)"
# Make sure your virtual environment is activated
source venv/bin/activate
# Run the main application (either method works)
python run_jarvis.py
# OR using the console entry point after installing with setup.py
jarvis
You can also run the individual utility scripts:
# Make sure you're in the virtual environment first
source venv/bin/activate
# Face detection from webcam
python scripts/detect_face_stream.py
# Web streaming of webcam
python scripts/web_serve_stream.py
# Speech recognition (requires Google Cloud credentials)
# Note: You need to set up a Google Cloud account and enable the Speech-to-Text API first
# Then download service account credentials and set this environment variable:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/credentials.json
python scripts/transcribe_mic_stream.py
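Before running the mic script, you can sanity-check your credentials with a minimal, non-streaming request. This is not what transcribe_mic_stream.py does (it streams from the microphone, which is more involved); the WAV filename below is a placeholder:

```python
from google.cloud import speech

# The client picks up GOOGLE_APPLICATION_CREDENTIALS automatically.
client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

# Any short 16 kHz mono LINEAR16 WAV will do; "sample.wav" is a placeholder.
with open("sample.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```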
# Make sure your virtual environment is activated
source venv/bin/activate
# Set your Google Cloud credentials (required for speech recognition tests)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/credentials.json
# Run the test script
python test/test.py
Note: The test script uses Google's Speech-to-Text API and requires credentials. If you don't have Google Cloud credentials, you won't be able to run the tests. This won't affect the main functionality of the application.
Important: Always make sure your virtual environment is activated before running any scripts or the main application. Otherwise, the required dependencies won't be available.
The main application (run_jarvis.py) provides two web streams:
- Raw camera feed: http://localhost:8000/ - Shows the original camera feed without any processing
- Processed video feed: http://localhost:8888/ - Displays the feed with applied filters and face detection annotations (when debug mode is enabled)
When running the separate web streaming script (scripts/web_serve_stream.py):
- Camera feed: http://localhost:8080/cam.mjpg
You can view these streams in any web browser or embed them in other applications. This dual-stream approach allows you to compare the original and processed videos side-by-side by opening both streams in separate browser windows.
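Beyond the browser, the streams can also be consumed programmatically. Assuming the feeds are MJPEG over HTTP (which OpenCV's FFmpeg backend can usually open directly), a small viewer sketch:

```python
import cv2

# Processed feed from run_jarvis.py; swap in http://localhost:8000/ for the
# raw feed, or http://localhost:8080/cam.mjpg for scripts/web_serve_stream.py.
stream = cv2.VideoCapture("http://localhost:8888/")

while True:
    ok, frame = stream.read()
    if not ok:
        break
    cv2.imshow("jarvis stream", frame)
    if cv2.waitKey(1) == 27:  # Escape closes the viewer
        break

stream.release()
cv2.destroyAllWindows()
```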
When the application window is in focus:
- Space: Take a screenshot (saves as screenshot.png in project root)
- Tab: Start/stop recording a screencast (saves as screencast.avi in project root)
- X: Toggle debug view (shows face detection rectangles)
- Escape: Quit the application
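These shortcuts correspond to standard PyQt5 key handling. The class and helper names below are hypothetical, not the project's actual code; it's only a sketch of how such bindings are typically wired:

```python
from PyQt5.QtCore import Qt
from PyQt5.QtWidgets import QWidget

class MainWindow(QWidget):              # hypothetical class name
    def keyPressEvent(self, event):
        if event.key() == Qt.Key_Space:
            self.take_screenshot()      # hypothetical helper
        elif event.key() == Qt.Key_Tab:
            self.toggle_recording()     # hypothetical helper
        elif event.key() == Qt.Key_X:
            self.toggle_debug_view()    # hypothetical helper
        elif event.key() == Qt.Key_Escape:
            self.close()
        else:
            super().keyPressEvent(event)
```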
The application features a control panel with:
- Filter selector dropdown
- Filter intensity slider
- Toggle for displaying the filtered stream
- Controls for screenshots and video recording
Beyond the control panel, the application provides:
- Face detection and tracking (see the detection sketch after this list), with:
  - Multi-stage detection (DNN + cascade)
  - Feature tracking (eyes, nose, mouth)
  - Temporal smoothing to reduce jitter
  - Adaptive frame skipping for better performance
- Real-time image processing with filters (see the kernel sketch after this list):
  - Edge Detection: Highlights outlines and boundaries in the image
  - Sharpen: Enhances details and makes the image crisper
  - Blur: Smooths out noise and reduces detail
  - Emboss: Creates a 3D relief effect highlighting edges
  - Film Emulation Filters:
    - Cross Process: High contrast with altered colours, mimicking cross-processed film
    - Portra: Warm, natural skin tones inspired by Kodak Portra film
    - Provia: Balanced, natural colours inspired by Fuji Provia film
    - Velvia: Vibrant, saturated colours inspired by Fuji Velvia film
- Dual streaming over HTTP:
  - Raw video stream
  - Processed/filtered stream
- Media creation:
  - Video recording
  - Screenshot capture
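The detection sketch referenced above is not the project's multi-stage detector (the DNN stage needs its own model files); it is a minimal illustration of the frame-skipping and temporal-smoothing ideas using OpenCV's bundled Haar cascade. The smoothing factor is an assumed value:

```python
import cv2

# Haar detector ships with OpenCV; the real pipeline runs a DNN stage first.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

ALPHA = 0.3      # assumed smoothing factor: lower = steadier but laggier
smoothed = None  # last smoothed face rectangle (x, y, w, h)

cap = cv2.VideoCapture(0)
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_idx += 1
    # Adaptive skipping simplified here to "detect on every other frame"
    if frame_idx % 2 == 0:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:
            if smoothed is None:
                smoothed = tuple(faces[0])
            else:
                # Exponential moving average damps frame-to-frame jitter
                smoothed = tuple(int(ALPHA * n + (1 - ALPHA) * o)
                                 for n, o in zip(faces[0], smoothed))
    if smoothed is not None:
        x, y, w, h = smoothed
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) == 27:  # Escape quits
        break
cap.release()
cv2.destroyAllWindows()
```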
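And the kernel sketch: the project's filters.py may implement these differently, but classic sharpen and emboss convolution kernels, blended with the original frame to emulate the intensity slider, look roughly like this:

```python
import cv2
import numpy as np

# Classic 3x3 convolution kernels; the values used by the project may differ.
SHARPEN = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=np.float32)
EMBOSS = np.array([[-2, -1, 0],
                   [-1,  1, 1],
                   [ 0,  1, 2]], dtype=np.float32)

def apply_filter(frame, kernel, intensity=1.0):
    """Blend the filtered frame with the original to emulate an intensity slider."""
    filtered = cv2.filter2D(frame, -1, kernel)
    return cv2.addWeighted(filtered, intensity, frame, 1.0 - intensity, 0)

img = cv2.imread("screenshot.png")  # e.g. a capture taken with Space in the app
if img is not None:
    cv2.imwrite("sharpened.png", apply_filter(img, SHARPEN, intensity=0.5))
```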
Troubleshooting common issues:
- Camera access issues:
  - Ensure your webcam is connected and functioning
  - Check camera permissions for your terminal app/Python
  - If using macOS, you may need to grant permission in System Settings → Privacy & Security → Camera
- `ImportError: No module named X`:
  - Make sure you have installed all requirements and activated your virtual environment
  - Try installing the specific missing package: `pip install X`
- Port already in use:
  - If ports 8000/8888 are already taken, modify the port numbers in the code (a quick port check follows this list)
- Tkinter issues:
  - If you get `ModuleNotFoundError: No module named '_tkinter'`, install the Python Tkinter package for your system:
    - On macOS: `brew install python-tk`
    - On Ubuntu/Debian: `sudo apt-get install python3-tk`
- `error: externally-managed-environment`:
  - Make sure you're using a virtual environment: `python -m venv venv && source venv/bin/activate`
- Google Cloud Speech API errors:
  - The transcribe_mic_stream.py script requires Google Cloud credentials
  - You need to create a Google Cloud account and enable the Speech-to-Text API
  - Create a service account key and download the JSON credentials file
  - Set the environment variable: `export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json`
  - If you don't need speech recognition, you can ignore these errors
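The port check mentioned above: a quick, dependency-free way to see which of the documented ports is already taken before editing the code:

```python
import socket

def port_in_use(port, host="localhost"):
    """connect_ex returns 0 when something is already listening on the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

for port in (8000, 8080, 8888):
    print(port, "in use" if port_in_use(port) else "free")
```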
The project is now organized as a Python package with logical modules:
jarvis/
├── __init__.py # Package initialization and entry point
├── core/ # Core application functionality
│ ├── __init__.py
│ └── app.py # Main Jarvis application class
├── face/ # Face detection functionality
│ ├── __init__.py
│ ├── base.py # Base face detector class
│ ├── cascades/ # Haar cascade XML files
│ ├── detector.py # Main face detector implementation
│ ├── dnn_detector.py # Deep neural network detector
│ ├── face_recognition.py # Face class and recognition functions
│ └── haar_detector.py # Haar cascade detector
├── ui/ # User interface components
│ ├── __init__.py
│ └── display.py # PyQt5 UI components
├── utils/ # Utility functions and helpers
│ ├── __init__.py
│ ├── colours.py # Colour constants
│ ├── filters.py # Image processing filters
│ ├── helpers.py # General helper functions
│ └── rects.py # Rectangle handling utilities
├── video/ # Video handling capabilities
│ ├── __init__.py
│ ├── recorder.py # Video recording functionality
│ └── streams.py # Video stream implementations
└── audio/ # Audio processing (for future voice features)
├── __init__.py
└── microphone.py # Microphone handling
- scripts/ - Utility scripts for face detection, web streaming, and speech recognition
- run_jarvis.py - Simple script to launch the application
- Run the main application:
python run_jarvis.py
- Select a filter from the dropdown menu or Filters menu
- Adjust the intensity using the slider
- Toggle "Show Filtered Stream" to view the filtered video in the main window
- View both streams simultaneously by opening these URLs in a browser:
- Raw stream: http://localhost:8000/
- Filtered stream: http://localhost:8888/
The application supports face recognition using the FaceRecognizer class. To train the system:
- Create a subdirectory for each person under `training/data/`
- Add multiple facial images of each person to their respective directory
- Optionally include a `name.txt` file in each person's directory with their name

The training data is located in the project root under `training/data/` (a minimal training sketch follows).
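The sketch below is illustrative only: the project's FaceRecognizer class has its own implementation, but an equivalent training flow over the same directory layout, using OpenCV's LBPH recogniser from opencv-contrib-python, might look like this:

```python
import os
import cv2
import numpy as np

# Illustrative only: the project's FaceRecognizer wraps its own logic. This
# trains OpenCV's LBPH recogniser on the documented layout:
# training/data/<person>/<images>, plus an optional name.txt per person.
recognizer = cv2.face.LBPHFaceRecognizer_create()
images, labels, names = [], [], {}

root = "training/data"
for label, person in enumerate(sorted(os.listdir(root))):
    person_dir = os.path.join(root, person)
    if not os.path.isdir(person_dir):
        continue
    name_file = os.path.join(person_dir, "name.txt")
    if os.path.exists(name_file):
        with open(name_file) as f:
            names[label] = f.read().strip()
    else:
        names[label] = person
    for fname in os.listdir(person_dir):
        if fname.lower().endswith((".png", ".jpg", ".jpeg")):
            img = cv2.imread(os.path.join(person_dir, fname), cv2.IMREAD_GRAYSCALE)
            if img is not None:
                images.append(cv2.resize(img, (200, 200)))  # uniform size assumed
                labels.append(label)

recognizer.train(images, np.array(labels))
pred_label, confidence = recognizer.predict(images[0])  # sanity check
print(names[pred_label], confidence)
```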
The project has several planned enhancements for future development:
- Facial Identification: Recognise specific individuals beyond just detection
- Emotion Detection: Analyse facial expressions and voice patterns to detect emotions
- Voice Commands: Control the application using speech via mic.py integration
- Advanced Object Detection: Identify and track multiple object types
- Remote Control: Web interface for controlling the application remotely