A Python-based document scanner that automatically detects document boundaries and creates a top-down perspective view of documents. This project is an implementation based on Adrian Rosebrock's tutorial from PyImageSearch, created as a learning exercise to understand computer vision concepts using OpenCV.
This project serves as a hands-on learning experience for:
- Computer Vision fundamentals with OpenCV
- Image processing techniques (edge detection, contour finding, perspective transforms)
- Document processing automation
- Python image manipulation
- Automatic document detection using edge detection and contour analysis
- Perspective correction to get a top-down view of documents
- Image enhancement with adaptive thresholding for better readability
- Real-time visualization of each processing step
- Support for various image formats (JPG, PNG, etc.)
- Python 3.6 or higher
- pip (Python package installer)
Install the required packages:
pip install opencv-python
pip install scikit-image
pip install imutils
pip install numpy
Or install all dependencies at once:
pip install -r requirements.txt
scan_py/
├── scan.py # Main document scanning script
├── transform.py # Perspective transform utilities
├── images/ # Sample images directory
│ └── receipt.jpg # Example document image
└── README.md # This file
Run the scanner with an image file:
python scan.py -i path/to/your/image.jpg
- Scan a receipt in the images folder:
python scan.py -i images/receipt.jpg
- Scan any image from your computer:
python scan.py -i "C:/Users/YourName/Desktop/document.png"
- Scan an image in the current directory:
python scan.py -i my_document.jpg
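The `-i` flag used in the commands above could be handled with the standard `argparse` module. This is a minimal sketch, not the contents of `scan.py`: the helper name `parse_args` and the long option `--image` are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the scanner's command line (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Scan a document from an image.")
    # "-i" matches the usage examples; "--image" is an assumed long form.
    parser.add_argument(
        "-i", "--image", required=True, help="path to the image to be scanned"
    )
    # argv=None makes argparse read sys.argv; a list can be passed for testing.
    return parser.parse_args(argv)
```

Calling `parse_args(["-i", "images/receipt.jpg"])` returns a namespace whose `image` attribute holds the path, and omitting `-i` makes `argparse` exit with a usage message.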
The document scanner follows these steps:
- Converts the image to grayscale
- Applies Gaussian blur to reduce noise
- Uses Canny edge detection to find document boundaries
- Finds all contours in the edge-detected image
- Sorts contours by area (largest first)
- Identifies the largest contour whose approximated outline has exactly 4 points (the document corners)
- Applies a four-point perspective transform
- Creates a top-down view of the document
- Enhances the image with adaptive thresholding
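To make the final enhancement step concrete, here is a pure-Python sketch of mean-based adaptive thresholding on a tiny grayscale grid. It is only an illustration of the idea: the function name, `block_size`, and `offset` are assumptions, and a real scanner would rely on a library routine rather than nested loops.

```python
def adaptive_threshold(gray, block_size=11, offset=10):
    """Binarize a 2D list of 0-255 values using a local mean threshold."""
    height, width = len(gray), len(gray[0])
    radius = block_size // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighborhood around (x, y), clipped at the borders.
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_threshold = sum(window) / len(window) - offset
            # Pixels brighter than their surroundings become white "paper".
            row.append(255 if gray[y][x] > local_threshold else 0)
        result.append(row)
    return result
```

Because each pixel is compared with its own neighborhood rather than one global value, unevenly lit paper still turns white while darker ink turns black. The `offset` keeps uniform regions from flipping to black when a pixel equals its local mean.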
This project demonstrates key computer vision concepts:
- Image Preprocessing: Grayscale conversion, blurring, edge detection
- Contour Analysis: Finding and filtering contours based on properties
- Geometric Transformations: Perspective correction using homography
- Image Enhancement: Adaptive thresholding for better contrast
- OpenCV Integration: Working with cv2 library for image processing
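The geometric transformation concept can be illustrated without any image data. The sketch below orders four detected corners and computes the size of the flattened output, which is the information a perspective warp needs. The function names are illustrative and are not taken from `transform.py`; the sum and difference heuristic also assumes the document is not rotated close to 45 degrees.

```python
import math


def order_points(points):
    """Return corners as [top-left, top-right, bottom-right, bottom-left]."""
    # Top-left has the smallest x + y, bottom-right the largest.
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    # Top-right has the smallest y - x, bottom-left the largest.
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    return [top_left, top_right, bottom_right, bottom_left]


def _distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def output_size(points):
    """Width and height of the top-down view, from the longest opposing edges."""
    tl, tr, br, bl = order_points(points)
    width = max(_distance(br, bl), _distance(tr, tl))
    height = max(_distance(tr, br), _distance(tl, bl))
    return int(width), int(height)
```

With the corners ordered, the homography maps them to `(0, 0)`, `(width - 1, 0)`, `(width - 1, height - 1)`, and `(0, height - 1)`. In OpenCV this mapping is typically computed with `cv2.getPerspectiveTransform` and applied with `cv2.warpPerspective`.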
- "screenCnt is not defined" error:
  - The script couldn't find a document with 4 clear corners
  - Try with a clearer image or better lighting
  - Ensure the document is fully visible in the frame
- No contours detected:
  - Check if the image has sufficient contrast
  - Try adjusting the Canny edge detection parameters
  - Ensure the document edges are clearly defined
- Poor quality results:
  - Use images with good lighting
  - Ensure the document is flat and not wrinkled
  - Try different angles if the document isn't detected
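The "screenCnt is not defined" message is most likely a Python `NameError`: if the variable is assigned only inside the loop that finds a four-point contour, it never exists when no contour qualifies. The sketch below shows one way to guard against that, assuming this is how the script is structured. It works on plain lists of `(x, y)` vertices, and all function names are hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for index, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(index + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def find_document_outline(polygons):
    """Return the largest four-vertex polygon, or None if there is none."""
    screen_cnt = None  # defined up front, so it exists even without a match
    for polygon in sorted(polygons, key=polygon_area, reverse=True):
        if len(polygon) == 4:
            screen_cnt = polygon
            break
    return screen_cnt


def require_document_outline(polygons):
    """Fail with a descriptive message instead of a NameError."""
    outline = find_document_outline(polygons)
    if outline is None:
        raise ValueError("No four-corner outline found; try a clearer image.")
    return outline
```

Initializing the result to `None` and checking it before the perspective transform turns a confusing crash into an actionable message.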
You can modify the script to adjust:
- Edge detection sensitivity (Canny thresholds)
- Blur intensity (Gaussian blur kernel size)
- Contour approximation accuracy
- Thresholding parameters for final enhancement
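One way to keep these adjustments manageable is to gather them in a single place and sanity-check them before running the pipeline. The sketch below is hypothetical: the setting names and default values are illustrative assumptions, not values read from `scan.py`.

```python
# Illustrative defaults; tune them for your own images.
DEFAULT_SETTINGS = {
    "canny_low": 75,          # lower Canny hysteresis threshold
    "canny_high": 200,        # upper Canny hysteresis threshold
    "blur_kernel": 5,         # Gaussian kernel width and height, in pixels
    "approx_epsilon": 0.02,   # contour approximation tolerance, as a fraction of perimeter
    "threshold_block": 11,    # neighborhood size for adaptive thresholding
    "threshold_offset": 10,   # constant subtracted from the local threshold
}


def validate_settings(settings):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    if not 0 <= settings["canny_low"] < settings["canny_high"]:
        problems.append("canny_low must be non-negative and below canny_high")
    if settings["blur_kernel"] < 1 or settings["blur_kernel"] % 2 == 0:
        problems.append("blur_kernel must be a positive odd number")
    if not 0 < settings["approx_epsilon"] < 1:
        problems.append("approx_epsilon must be between 0 and 1")
    if settings["threshold_block"] < 3 or settings["threshold_block"] % 2 == 0:
        problems.append("threshold_block must be an odd number of at least 3")
    return problems
```

The odd-size checks reflect that Gaussian blur kernels and local thresholding windows need a center pixel. Lower Canny thresholds keep more, weaker edges, while a larger `approx_epsilon` simplifies contours more aggressively.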
- Original Tutorial by Adrian Rosebrock - The source tutorial this implementation is based on
- OpenCV Documentation
- Computer Vision Tutorials
- Image Processing Concepts
This is a learning project, but suggestions and improvements are welcome! Feel free to:
- Report issues
- Suggest improvements
- Share your own implementations
This project is open source and available under the MIT License.
Happy Learning! 🎉
This project is an implementation based on Adrian Rosebrock's tutorial from PyImageSearch, created to explore computer vision concepts and document processing techniques using OpenCV.
- Adrian Rosebrock - Original tutorial author and creator of PyImageSearch
- PyImageSearch - For providing excellent computer vision tutorials and resources