brucewayne7777/scan_py

📄 Document Scanner with OpenCV

A Python-based document scanner that automatically detects a document's boundaries in a photo and warps the image into a top-down view. This project is an implementation based on Adrian Rosebrock's tutorial from PyImageSearch, created as a learning exercise to understand computer vision concepts using OpenCV.

🎯 Project Purpose

This project serves as a hands-on learning experience for:

  • Computer Vision fundamentals with OpenCV
  • Image processing techniques (edge detection, contour finding, perspective transforms)
  • Document processing automation
  • Python image manipulation

✨ Features

  • Automatic document detection using edge detection and contour analysis
  • Perspective correction to get a top-down view of documents
  • Image enhancement with adaptive thresholding for better readability
  • Real-time visualization of each processing step
  • Support for various image formats (JPG, PNG, etc.)

🛠️ Installation

Prerequisites

  • Python 3.6 or higher
  • pip (Python package installer)

Dependencies

Install the required packages:

pip install opencv-python
pip install scikit-image
pip install imutils
pip install numpy

Or, if the repository provides a requirements.txt file, install all dependencies at once:

pip install -r requirements.txt

📁 Project Structure

scan_py/
├── scan.py          # Main document scanning script
├── transform.py     # Perspective transform utilities
├── images/          # Sample images directory
│   └── receipt.jpg  # Example document image
└── README.md        # This file

🚀 Usage

Basic Usage

Run the scanner with an image file:

python scan.py -i path/to/your/image.jpg

Examples

  1. Scan a receipt in the images folder:

    python scan.py -i images/receipt.jpg
  2. Scan any image from your computer:

    python scan.py -i "C:/Users/YourName/Desktop/document.png"
  3. Scan an image in the current directory:

    python scan.py -i my_document.jpg
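
As a rough illustration of how the `-i` flag might be handled, here is a minimal sketch using `argparse`. The long form `--image` and the function name `parse_args` are assumptions made for this example; the actual `scan.py` may define its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments for a hypothetical scanner script."""
    parser = argparse.ArgumentParser(description="Scan a document from an image.")
    # "-i" matches the usage shown above; "--image" is an assumed long form.
    parser.add_argument("-i", "--image", required=True,
                        help="path to the image to be scanned")
    return parser.parse_args(argv)


# Passing an explicit list makes the sketch easy to try without a shell.
args = parse_args(["-i", "images/receipt.jpg"])
print(args.image)  # images/receipt.jpg
```

Marking the argument as required means a missing path is reported immediately, before any image processing starts.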

🔍 How It Works

The document scanner follows these steps:

Step 1: Edge Detection

  • Converts the image to grayscale
  • Applies Gaussian blur to reduce noise
  • Uses Canny edge detection to find document boundaries

Step 2: Contour Detection

  • Finds all contours in the edge-detected image
  • Sorts contours by area (largest first)
  • Takes the first contour whose polygon approximation has exactly 4 points and treats those points as the document corners

Step 3: Perspective Transform

  • Applies a four-point perspective transform
  • Creates a top-down view of the document
  • Enhances the image with adaptive thresholding

🎓 Learning Objectives

This project demonstrates key computer vision concepts:

  • Image Preprocessing: Grayscale conversion, blurring, edge detection
  • Contour Analysis: Finding and filtering contours based on properties
  • Geometric Transformations: Perspective correction using homography
  • Image Enhancement: Adaptive thresholding for better contrast
  • OpenCV Integration: Working with cv2 library for image processing

⚠️ Troubleshooting

Common Issues

  1. "screenCnt is not defined" error (Python reports it as a NameError):

    • The script couldn't find a document with 4 clear corners
    • Try with a clearer image or better lighting
    • Ensure the document is fully visible in the frame
  2. No contours detected:

    • Check if the image has sufficient contrast
    • Try adjusting the Canny edge detection parameters
    • Ensure the document edges are clearly defined
  3. Poor quality results:

    • Use images with good lighting
    • Ensure the document is flat and not wrinkled
    • Try different angles if the document isn't detected
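
The first issue typically appears when a variable is assigned only inside a loop branch that never runs. The sketch below assumes that is what happens with `screenCnt` and shows one way to fail with a readable message instead; `pick_quadrilateral` and `require_document` are hypothetical helper names, not functions from the script.

```python
import sys


def pick_quadrilateral(approximations):
    """Return the first approximation with 4 points, or None if there is none."""
    screen_cnt = None  # defined up front, so it always exists after the loop
    for approx in approximations:
        if len(approx) == 4:
            screen_cnt = approx
            break
    return screen_cnt


def require_document(approximations):
    """Exit with a clear message instead of failing later with a NameError."""
    screen_cnt = pick_quadrilateral(approximations)
    if screen_cnt is None:
        sys.exit("No 4-corner contour found; try a clearer, fully visible document.")
    return screen_cnt
```

With a guard like this, an unsuitable photo produces a hint about what to change rather than a traceback.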

🔧 Customization

You can modify the script to adjust:

  • Edge detection sensitivity (Canny thresholds)
  • Blur intensity (Gaussian blur kernel size)
  • Contour approximation accuracy
  • Thresholding parameters for final enhancement
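
One way to keep these knobs in a single place is a small settings object, sketched below. `ScanSettings` and all of its field names and default values are hypothetical; they are not read from `scan.py`. The odd-size checks reflect OpenCV's requirement that Gaussian kernel sizes and adaptive-threshold block sizes be odd, while the ordering check on the Canny thresholds is a design choice of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanSettings:
    """Hypothetical bundle of the tunable values listed above."""
    canny_low: int = 75                 # lower Canny hysteresis threshold
    canny_high: int = 200               # upper Canny hysteresis threshold
    blur_ksize: int = 5                 # Gaussian kernel side length (odd)
    approx_epsilon_ratio: float = 0.02  # tolerance as a fraction of perimeter
    threshold_block_size: int = 11      # adaptive threshold window (odd, > 1)
    threshold_offset: int = 10          # subtracted from the local weighted mean

    def __post_init__(self):
        if self.canny_low >= self.canny_high:
            raise ValueError("canny_low must be smaller than canny_high")
        if self.blur_ksize < 1 or self.blur_ksize % 2 == 0:
            raise ValueError("blur_ksize must be a positive odd number")
        if self.threshold_block_size < 3 or self.threshold_block_size % 2 == 0:
            raise ValueError("threshold_block_size must be odd and at least 3")
        if not 0 < self.approx_epsilon_ratio < 1:
            raise ValueError("approx_epsilon_ratio must be between 0 and 1")


# Example: lower thresholds make edge detection more sensitive to faint edges.
sensitive = ScanSettings(canny_low=30, canny_high=100)
```

Validating the values when the object is created surfaces a bad setting immediately, rather than as an OpenCV error in the middle of processing.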

📚 Resources for Learning

🤝 Contributing

This is a learning project, but suggestions and improvements are welcome! Feel free to:

  • Report issues
  • Suggest improvements
  • Share your own implementations

📄 License

This project is open source and available under the MIT License.


Happy Learning! 🎉

🙏 Acknowledgments

  • Adrian Rosebrock - Original tutorial author and creator of PyImageSearch
  • PyImageSearch - For providing excellent computer vision tutorials and resources
