Making 'Genius Scan' for the web to learn computer vision on video by streaming frames with real-time document tracking (robust border detection and reliable real-time shearing transformations).

vats98754/document-scanner

Document Scanner

A comprehensive document scanner implementation featuring from-scratch computer vision algorithms and real-time camera-based corner detection with colored overlays.

🌟 Features

Real-Time Camera Detection

  • Live Corner Detection: Real-time Harris corner detection with colored circular overlays
  • Custom Computer Vision: 100% from-scratch implementations without external CV libraries
  • Interactive Web Interface: Browser-based camera integration with live processing
  • Document Capture: One-click document capture with processed results

Python Analysis Engine

  • Advanced Document Detection: Detects paper corners within images using multiple edge detection methods
  • Perspective Correction: Transforms quadrilateral documents into rectangular scans
  • Hyperparameter Tuning: Comprehensive hyperparameter optimization with 1,024 combinations
  • Quick Testing: Fast hyperparameter testing with 48 combinations
  • Visualization: Detailed analysis and visualization of results
  • Modular Design: Clean separation of concerns with dedicated modules

From-Scratch Computer Vision Implementations

  • 2D Convolution: Custom convolution operations with kernel support
  • Sobel Edge Detection: Manual implementation of Sobel operators (Gx, Gy)
  • Gaussian Blur: Custom Gaussian kernel generation and application
  • Harris Corner Detection: Complete Harris corner detector with non-maximum suppression
  • Real-Time Processing: Optimized algorithms for live video processing

🔬 From-Scratch Computer Vision Algorithms

The real-time web pipeline implements its computer vision algorithms from scratch, without relying on external libraries like OpenCV (the Python analysis engine does use OpenCV; see Dependencies). Here's how the custom implementations work:

2D Convolution Engine

// Custom 2D convolution with kernel support
static convolve2D(imageData: ImageData, kernel: number[][], stride: number = 1): ImageData {
    // Applies convolution operation using nested loops
    // Supports arbitrary kernel sizes and stride values
    // Handles border conditions with zero-padding
}

Key Features:

  • Pure TypeScript implementation (compiled to JavaScript) for web compatibility
  • Support for arbitrary kernel sizes (3x3, 5x5, etc.)
  • Optimized memory access patterns
  • Real-time performance for live video processing
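
To make the loop structure concrete, here is a minimal sketch of the same idea in Python (the language of the analysis engine), using only the standard library. It is not the project's `convolve2D`: the name `convolve2d`, the list-of-lists image layout, and the choice to leave the kernel unflipped are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix, stride: int = 1) -> Matrix:
    """Slide `kernel` over `image`, treating out-of-bounds pixels as zero.

    Assumption: the kernel is applied without flipping (cross-correlation),
    which gives the same result as true convolution for symmetric kernels.
    """
    height, width = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    pad_y, pad_x = k_h // 2, k_w // 2

    output: Matrix = []
    for y in range(0, height, stride):
        row = []
        for x in range(0, width, stride):
            acc = 0.0
            for ky in range(k_h):
                for kx in range(k_w):
                    iy, ix = y + ky - pad_y, x + kx - pad_x
                    # Zero-padding: skip taps that fall outside the image.
                    if 0 <= iy < height and 0 <= ix < width:
                        acc += image[iy][ix] * kernel[ky][kx]
            row.append(acc)
        output.append(row)
    return output


# A 3x3 box filter over a constant 4x4 image.
box = [[1 / 9] * 3 for _ in range(3)]
flat = [[9.0] * 4 for _ in range(4)]
smoothed = convolve2d(flat, box)
```

Interior pixels of `smoothed` stay at roughly 9, while the corner pixels drop to roughly 4 because five of the nine kernel taps land on the zero padding. A stride of 2 halves the output size in each dimension.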

Sobel Edge Detection

// Sobel kernels for edge detection
static getSobelKernels(): { x: number[][], y: number[][] } {
    return {
        x: [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],  // Gx: horizontal gradient (responds to vertical edges)
        y: [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   // Gy: vertical gradient (responds to horizontal edges)
    };
}

Implementation Details:

  • Separate X and Y gradient computation
  • Edge magnitude calculation: sqrt(Gx² + Gy²)
  • Gradient direction for advanced edge analysis
  • Real-time edge visualization as green overlay dots
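
As a small worked example of the magnitude and direction calculation, the Python sketch below applies both kernels at a single pixel of a synthetic vertical step edge. The helper names `sobel_at` and `magnitude_and_direction` are hypothetical and only illustrate the arithmetic.

```python
import math
from typing import List, Tuple

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient (Gx)
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient (Gy)


def sobel_at(image: List[List[float]], x: int, y: int) -> Tuple[float, float]:
    """Return (Gx, Gy) at an interior pixel (x, y)."""
    gx = gy = 0.0
    for ky in range(3):
        for kx in range(3):
            pixel = image[y + ky - 1][x + kx - 1]
            gx += pixel * SOBEL_X[ky][kx]
            gy += pixel * SOBEL_Y[ky][kx]
    return gx, gy


def magnitude_and_direction(gx: float, gy: float) -> Tuple[float, float]:
    """Edge strength sqrt(Gx^2 + Gy^2) and gradient angle in radians."""
    return math.hypot(gx, gy), math.atan2(gy, gx)


# Vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 255, 255] for _ in range(4)]
gx, gy = sobel_at(step, 1, 1)
mag, theta = magnitude_and_direction(gx, gy)
```

On this image only `gx` is non-zero (1020), so the gradient points along the x axis (`theta` is 0) even though the edge itself runs vertically.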

Gaussian Blur & Kernel Generation

// Dynamic Gaussian kernel generation
static generateGaussianKernel(size: number, sigma: number): number[][] {
    // Mathematical kernel generation: G(x,y) = (1 / (2πσ²)) * e^(-(x² + y²) / (2σ²))
    // Automatic normalization for proper convolution
    // Configurable sigma for blur strength control
}

Features:

  • Mathematical precision in kernel computation
  • Configurable blur strength via sigma parameter
  • Automatic kernel normalization
  • Support for various kernel sizes (3x3, 5x5, 7x7)
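
The kernel formula and the normalization step can be sketched in a few lines of Python. This is an illustrative version, not the project's `generateGaussianKernel`; the input validation is an added assumption.

```python
import math
from typing import List


def generate_gaussian_kernel(size: int, sigma: float) -> List[List[float]]:
    """Build a size x size Gaussian kernel whose weights sum to 1."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    if sigma <= 0:
        raise ValueError("sigma must be positive")

    half = size // 2
    # Unnormalized weights; the constant 1 / (2 * pi * sigma^2) is omitted
    # because it cancels out when dividing by the total below.
    kernel = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


kernel = generate_gaussian_kernel(5, 1.0)
```

Normalizing keeps overall image brightness unchanged after blurring. A larger `sigma` spreads weight away from the centre and therefore blurs more strongly.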

Harris Corner Detection

// Complete Harris corner detector implementation
static harrisCornerDetection(imageData: ImageData, threshold: number = 0.01): Corner[] {
    // 1. Compute image gradients using Sobel operators
    // 2. Calculate structure tensor components (Ixx, Iyy, Ixy)
    // 3. Apply Gaussian weighting to structure tensor
    // 4. Compute Harris response: R = det(M) - k*trace(M)²
    // 5. Apply threshold and non-maximum suppression
}

Algorithm Steps:

  1. Gradient Computation: Custom Sobel operators for Ix, Iy
  2. Structure Tensor: Second-moment matrix calculation
  3. Gaussian Weighting: Spatial weighting of gradients
  4. Harris Response: Mathematical corner strength measure
  5. Non-Maximum Suppression: Remove redundant corner detections
  6. Color Coding: Visual representation with colored overlays
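
Steps 1 to 5 can be sketched end to end in plain Python. The version below is a simplified illustration rather than the project's detector. It assumes `k = 0.04` (a conventional value), a fixed 3x3 binomial window standing in for the Gaussian weighting, and a threshold interpreted relative to the strongest response. None of these choices are confirmed by the source.

```python
from typing import List, Tuple

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))
WINDOW = ((1, 2, 1), (2, 4, 2), (1, 2, 1))  # binomial weights, sum = 16


def harris_corners(image: List[List[float]], k: float = 0.04,
                   threshold: float = 0.01) -> List[Tuple[int, int, float]]:
    """Return (x, y, response) for local maxima of the Harris response."""
    h, w = len(image), len(image[0])

    # Step 1: Sobel gradients at interior pixels (borders stay 0).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            for dy in range(3):
                for dx in range(3):
                    p = image[y + dy - 1][x + dx - 1]
                    ix[y][x] += p * SOBEL_X[dy][dx]
                    iy[y][x] += p * SOBEL_Y[dy][dx]

    # Steps 2-4: weighted structure tensor and R = det(M) - k * trace(M)^2.
    response = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            for dy in range(3):
                for dx in range(3):
                    gx = ix[y + dy - 1][x + dx - 1]
                    gy = iy[y + dy - 1][x + dx - 1]
                    weight = WINDOW[dy][dx] / 16.0
                    sxx += weight * gx * gx
                    syy += weight * gy * gy
                    sxy += weight * gx * gy
            trace = sxx + syy
            response[y][x] = (sxx * syy - sxy * sxy) - k * trace * trace

    # Step 5: relative threshold, then 3x3 non-maximum suppression.
    r_max = max(max(row) for row in response)
    corners: List[Tuple[int, int, float]] = []
    if r_max <= 0:
        return corners
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            r = response[y][x]
            if r < threshold * r_max:
                continue
            neighbours = [response[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            if all(r >= n for n in neighbours):
                corners.append((x, y, r))
    return corners


# Synthetic test image: a bright 6x6 square on a 16x16 dark background.
image = [[1.0 if 5 <= x <= 10 and 5 <= y <= 10 else 0.0 for x in range(16)]
         for y in range(16)]
corners = harris_corners(image)
```

On the synthetic square, responses along the straight edges are negative and flat regions score zero, so the surviving detections cluster around the four corners of the square.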

Real-Time Processing Pipeline

private detectCorners(): void {
    // 1. Capture video frame to hidden canvas
    const { width, height } = this.hiddenCtx.canvas;
    const imageData = this.hiddenCtx.getImageData(0, 0, width, height);
    
    // 2. Convert to grayscale (custom implementation)
    const grayImageData = CVUtils.toGrayscale(imageData);
    
    // 3. Apply Gaussian blur (noise reduction)
    const blurredImageData = CVUtils.gaussianBlur(grayImageData, 5, 1.0);
    
    // 4. Detect corners using Harris detector
    const corners = CVUtils.harrisCornerDetection(blurredImageData, 0.01);
    
    // 5. Draw colored overlays on live video
    this.drawColoredCorners(corners);
}
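
The grayscale step of the pipeline is not shown in the README. A common approach, sketched here in Python, is a weighted sum of the red, green, and blue channels read from a flat RGBA buffer (the layout that browser `ImageData` uses). The ITU-R BT.601 luma weights are an assumption, and `CVUtils.toGrayscale` may use different ones.

```python
from typing import List, Sequence


def to_grayscale(rgba: Sequence[int], width: int, height: int) -> List[List[float]]:
    """Convert a flat RGBA buffer (4 values per pixel) to a 2D luminance grid."""
    if len(rgba) != width * height * 4:
        raise ValueError("buffer length must equal width * height * 4")

    gray: List[List[float]] = []
    for y in range(height):
        row = []
        for x in range(width):
            i = (y * width + x) * 4          # index of this pixel's red channel
            r, g, b = rgba[i], rgba[i + 1], rgba[i + 2]
            # Assumed BT.601 weights; the alpha channel is ignored.
            row.append(0.299 * r + 0.587 * g + 0.114 * b)
        gray.append(row)
    return gray


# A 2x1 image: one white pixel followed by one pure red pixel.
pixels = [255, 255, 255, 255, 255, 0, 0, 255]
gray = to_grayscale(pixels, width=2, height=1)
```

White maps to roughly 255 and pure red to roughly 76, reflecting the lower weight given to the red channel compared with green.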

Performance Optimizations

  • Memory Management: Efficient ImageData manipulation
  • Kernel Caching: Pre-computed Gaussian kernels for common sizes
  • Spatial Optimization: Smart pixel sampling for real-time performance
  • Frame Rate Control: Adaptive processing based on device capabilities

🏗️ Project Structure

document-scanner/
├── src/
│   ├── document_scanner.py      # Core document scanning functions
│   ├── hyperparameter_tuning.py # Hyperparameter optimization
│   ├── analysis.py              # Result analysis and visualization
│   ├── sobel_kernels.py         # Custom Sobel kernel implementations
│   ├── script.ts                # Real-time web-based corner detection
│   └── server.ts                # Development server
├── test_scanner.py              # Test suite and examples
├── computer-vision.ipynb        # Jupyter notebook with experiments
├── index.html                   # Web interface for camera detection
├── styles.css                   # Web styling
├── requirements.txt             # Python dependencies
├── package.json                 # Node.js dependencies
├── tsconfig.json                # TypeScript configuration
├── .gitignore                   # Git ignore rules
└── README.md                    # This file

🚀 Installation & Setup

Python Environment

  1. Clone the repository:
git clone <repository-url>
cd document-scanner
  2. Install Python dependencies:
pip install -r requirements.txt

Web Interface Setup

  1. Install Node.js dependencies:
npm install
  2. Compile TypeScript:
npx tsc
  3. Start local server:
python3 -m http.server 8000
  4. Open browser and navigate to:
http://localhost:8000

📱 Usage

Real-Time Camera Detection

  1. Start the Web Interface:

    • Open index.html in a web browser (or use the local server)
    • Click "📷 Start Camera" to enable webcam access
  2. Live Corner Detection:

    • Position a document or object in front of the camera
    • Observe real-time colored corner detection overlays:
      • 🔴 Red circles: Primary corners
      • 🟢 Green circles: Secondary corners
      • 🔵 Blue circles: Additional feature points
      • 🟡 Yellow circles: Edge intersections
    • Corner response strength shown as circle radius
    • Live edge detection shown as green dots
  3. Capture Documents:

    • Click "📸 Capture Document" to save current frame
    • Images saved with detected features highlighted
    • Download captured documents for further processing

Python Document Analysis

from src.document_scanner import test_scanner

# Test document scanner on an image
image_path = "path/to/your/document.jpg"
original, corners_viz, scanned = test_scanner(image_path)
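
The `scanned` result is the perspective-corrected image. As a rough illustration of the mapping behind such a correction, the sketch below solves for the 3x3 homography that sends four detected corners to the corners of an upright rectangle. It uses only the standard library, and the helper names and sample coordinates are hypothetical. The project lists OpenCV as a dependency and may well use it for this step instead.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Return h0..h7 of the homography (h8 fixed to 1) mapping src to dst."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def warp_point(h: Sequence[float], x: float, y: float) -> Point:
    """Apply the homography to a single point."""
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical document corners (TL, TR, BR, BL) and a 100x140 target page.
src = [(12.0, 8.0), (95.0, 15.0), (88.0, 130.0), (5.0, 120.0)]
dst = [(0.0, 0.0), (100.0, 0.0), (100.0, 140.0), (0.0, 140.0)]
h = homography(src, dst)
```

Producing the scanned image then amounts to looking up a source pixel for every destination pixel, which in practice uses the inverse mapping (swap `src` and `dst`) plus interpolation.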

Hyperparameter Tuning

from src.hyperparameter_tuning import hyperparameter_tuning, quick_hyperparameter_test

# Quick test (48 combinations)
results, best = quick_hyperparameter_test("path/to/document.jpg")

# Full hyperparameter tuning (1,024 combinations)
results, best = hyperparameter_tuning("path/to/document.jpg")

Analysis and Visualization

from src.analysis import analyze_results, visualize_top_results

# Analyze results
sorted_results = analyze_results("hyperparameter_results")

# Visualize top performing combinations
visualize_top_results("hyperparameter_results", top_n=6)

Running the Test Suite

python test_scanner.py

Hyperparameter Tuning

The full search tests every combination of the following five parameters, four values each (4⁵ = 1,024 runs):

  • Blur Kernel: [3, 5, 7, 9] - Gaussian blur kernel sizes
  • Canny Low: [30, 50, 70, 100] - Lower Canny threshold
  • Canny High: [100, 150, 200, 250] - Upper Canny threshold
  • Epsilon Factor: [0.01, 0.02, 0.03, 0.05] - Contour approximation factor
  • Min Area: [500, 1000, 2000, 5000] - Minimum area threshold

Results Organization

Results are saved in organized directory structures:

hyperparameter_results/
├── blur5_canny50-150_eps0.02_area1000/
│   ├── original.jpg
│   ├── edges.jpg
│   ├── contours.jpg
│   ├── blurred.jpg
│   └── results.json
├── hyperparameter_summary.json
├── parameter_effects.png
└── top_results_visualization.png
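
The grid size and the directory naming can be reproduced with `itertools.product`. This sketch is illustrative: the dictionary keys and the `run_name` helper are assumptions, and the name format is inferred from the example directory `blur5_canny50-150_eps0.02_area1000`.

```python
from itertools import product

# Parameter values as listed in the Hyperparameter Tuning section.
PARAM_GRID = {
    "blur_kernel": [3, 5, 7, 9],
    "canny_low": [30, 50, 70, 100],
    "canny_high": [100, 150, 200, 250],
    "epsilon_factor": [0.01, 0.02, 0.03, 0.05],
    "min_area": [500, 1000, 2000, 5000],
}


def run_name(blur: int, low: int, high: int, eps: float, area: int) -> str:
    """Directory name for one combination (format inferred from the example)."""
    return f"blur{blur}_canny{low}-{high}_eps{eps}_area{area}"


combinations = list(product(*PARAM_GRID.values()))
names = [run_name(*combo) for combo in combinations]
```

Five parameters with four values each give 4 × 4 × 4 × 4 × 4 = 1,024 combinations, matching the full tuning run, and each combination maps to a unique directory name.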

Key Functions

Core Document Scanner

  • document_scanner(): Main scanning function with perspective correction
  • find_edges(): Simple edge detection
  • order_corners(): Orders corner points correctly
  • test_scanner(): Test function with visualization
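
Corner ordering matters because perspective correction needs to know which detected point is the top-left, top-right, and so on. A common heuristic, sketched below, uses the sum and difference of each point's coordinates. Whether the project's `order_corners()` works this way is an assumption, and the heuristic can fail for documents rotated by close to 45 degrees.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def order_corners(points: Sequence[Point]) -> List[Point]:
    """Order four points as top-left, top-right, bottom-right, bottom-left.

    Uses image coordinates (y grows downward): the top-left corner has the
    smallest x + y, the bottom-right the largest x + y, the top-right the
    largest x - y, and the bottom-left the smallest x - y.
    """
    if len(points) != 4:
        raise ValueError("expected exactly four corner points")
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    top_right = max(points, key=lambda p: p[0] - p[1])
    bottom_left = min(points, key=lambda p: p[0] - p[1])
    return [top_left, top_right, bottom_right, bottom_left]


# Hypothetical detections in arbitrary order.
detected = [(88, 130), (12, 8), (5, 120), (95, 15)]
ordered = order_corners(detected)
```

For the sample points, `ordered` is `[(12, 8), (95, 15), (88, 130), (5, 120)]`.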

Hyperparameter Optimization

  • hyperparameter_tuning(): Full hyperparameter optimization
  • quick_hyperparameter_test(): Fast testing with subset of parameters
  • document_scanner_with_hyperparams(): Configurable scanner function

Analysis and Visualization

  • analyze_results(): Comprehensive result analysis
  • visualize_top_results(): Visualization of best performing combinations
  • visualize_quick_results(): Quick test result visualization
  • compare_hyperparameter_effects(): Detailed parameter effect analysis

Git Ignore Configuration

The .gitignore file is configured to:

  • Ignore all hyperparameter result directories
  • Keep only summary files: hyperparameter_summary.json, parameter_effects.png, *_visualization.png
  • Standard Python, Jupyter, and IDE ignore patterns

Dependencies

  • OpenCV (cv2) - Computer vision operations
  • NumPy - Numerical computations
  • Matplotlib - Plotting and visualization
  • itertools (standard library) - Parameter combination generation
  • json (standard library) - Result serialization
  • os (standard library) - File system operations

License

This project is open source and available under the MIT License.

  1. Download your scanned documents from the results section

Requirements

  • Node.js (v14 or higher)
  • A modern web browser with webcam support
  • Camera permissions enabled

Technologies Used

  • Frontend: HTML5, CSS3, JavaScript (ES6+)
  • Computer Vision: OpenCV.js for document detection
  • Backend: Node.js with Express.js
  • Camera API: WebRTC getUserMedia API

Browser Compatibility

  • Chrome 60+
  • Firefox 55+
  • Safari 11+
  • Edge 79+

Tips for Best Results

  • Ensure good lighting
  • Use a contrasting background (dark document on light surface or vice versa)
  • Keep the document flat and unfolded
  • Maintain steady hands during capture
  • Position the entire document within the camera view

Development

To run in development mode:

npm run dev

The app will be available at http://localhost:3000
