This project presents a comparative analysis of two image processing operations, grayscale conversion and Gaussian blur, implemented across:
- CPU processing
- CUDA GPU implementation
- OpenCV’s optimized functions
The goal was to evaluate execution times, consistency, and performance scaling using Google Colab’s GPU environment.
Results show that the CUDA GPU implementation ran roughly 810× faster than the CPU implementation and roughly 2.75× faster than OpenCV on the GPU, with more consistent timings across multiple runs.
- Introduction
- Background
- Design
- Evaluation & Results
- Conclusion
- Future Work
- Contribution
- How to Run
- References
Image processing optimization is increasingly critical with the rise of AI and computer vision applications.
With GPUs now widely available in cloud and consumer devices, parallel processing can significantly outperform traditional CPU-based approaches.
This project demonstrates the real-world benefits of GPU acceleration by comparing CPU, GPU, and OpenCV implementations.
- CPU vs GPU: CPUs run a small number of threads and handle the pixels of an image largely one after another, while GPUs spread the same work across thousands of lightweight threads that run in parallel.
- Operations Studied:
  - Grayscale conversion: a simple weighted RGB-to-grayscale transformation.
  - Gaussian blur: a convolution with a Gaussian kernel (computationally intensive).
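The two operations can be sketched in plain Python to show what the serial, pixel-by-pixel version has to compute. This is a minimal illustration, not the project's code: the luma weights (0.299, 0.587, 0.114, the ones OpenCV documents for RGB-to-gray), the 5×5 kernel size, `sigma=1.0`, and the clamped border handling are all assumptions, and the function names are hypothetical.

```python
import math

# Assumed luma weights (ITU-R BT.601); the project's exact weights are not stated.
R_WEIGHT, G_WEIGHT, B_WEIGHT = 0.299, 0.587, 0.114


def to_grayscale(image):
    """Convert rows of (r, g, b) tuples into rows of 0-255 gray values."""
    return [
        [int(round(R_WEIGHT * r + G_WEIGHT * g + B_WEIGHT * b)) for (r, g, b) in row]
        for row in image
    ]


def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized size x size Gaussian kernel (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    # Normalize so the weights sum to 1 and overall brightness is preserved.
    return [[value / total for value in row] for row in kernel]


def gaussian_blur(gray, kernel):
    """Blur a grayscale image serially; borders reuse the nearest edge pixel."""
    height, width = len(gray), len(gray[0])
    half = len(kernel) // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky, kernel_row in enumerate(kernel):
                for kx, weight in enumerate(kernel_row):
                    # Clamp the sample coordinates to stay inside the image.
                    sy = min(max(y + ky - half, 0), height - 1)
                    sx = min(max(x + kx - half, 0), width - 1)
                    acc += weight * gray[sy][sx]
            out[y][x] = int(round(acc))
    return out
```

Each output pixel of the blur reads `size * size` input pixels, which is why the blur costs far more than the grayscale conversion and why it benefits most from parallel execution.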
Three approaches were implemented and compared:
- CPU implementation (serial pixel-by-pixel processing)
- CUDA GPU implementation (16×16 thread blocks, optimized memory access)
- OpenCV implementation (using `cv2.cvtColor` and `cv2.GaussianBlur`)
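To make the 16×16 thread-block layout concrete, the sketch below models in Python how a CUDA launch typically covers an image: the grid is sized by ceiling division, each thread derives its pixel from its block and thread indices, and threads that fall outside the image are masked by a bounds check. The project's actual kernels are not shown in this document, so the indexing pattern and all function names here are assumptions based on the common CUDA idiom.

```python
BLOCK = 16  # 16x16 threads per block, matching the block size stated above


def grid_dims(width, height, block=BLOCK):
    """Blocks needed along x and y so every pixel gets a thread (ceiling division)."""
    return ((width + block - 1) // block, (height + block - 1) // block)


def thread_to_pixel(block_idx, thread_idx, block=BLOCK):
    """Python mirror of x = blockIdx.x * blockDim.x + threadIdx.x (same for y)."""
    bx, by = block_idx
    tx, ty = thread_idx
    return (bx * block + tx, by * block + ty)


def active_threads(width, height, block=BLOCK):
    """Count the threads that pass the bounds check x < width and y < height."""
    gx, gy = grid_dims(width, height, block)
    count = 0
    for by in range(gy):
        for bx in range(gx):
            for ty in range(block):
                for tx in range(block):
                    x, y = thread_to_pixel((bx, by), (tx, ty), block)
                    if x < width and y < height:
                        count += 1
    return count
```

For a 1920×1080 image, `grid_dims(1920, 1080)` gives a 120×68 grid. The last row of blocks extends past the image, which is why a kernel written this way needs the bounds check before it reads or writes a pixel.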
- Tested with various image sizes
- 10 trials per operation
- Used a trimmed mean over the trials to reduce the influence of outlier runs on the reported times
- Focused on runtime, consistency, and scalability
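The benchmarking procedure described above can be sketched as follows. The document does not say how many samples were trimmed, so dropping the single fastest and single slowest run (`trim=1`) is an assumption, and `trimmed_mean` and `benchmark` are hypothetical helper names.

```python
import time


def trimmed_mean(samples, trim=1):
    """Mean after dropping the `trim` smallest and `trim` largest samples."""
    if len(samples) <= 2 * trim:
        raise ValueError("not enough samples to trim")
    kept = sorted(samples)[trim:len(samples) - trim]
    return sum(kept) / len(kept)


def benchmark(func, *args, trials=10, trim=1):
    """Time func(*args) `trials` times; return (trimmed mean, all timings)."""
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return trimmed_mean(timings, trim), timings
```

A call such as `benchmark(blur_fn, image)` returns the trimmed mean in seconds together with the raw timings, so the spread across runs can be inspected as well. When timing GPU code this way, the device generally has to be synchronized before the clock is read, because kernel launches return before the work has finished.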
- CPU: Slow, with noticeable run-to-run variation, likely due to multitasking and shared resource allocation on the host.
- OpenCV (CPU): More stable, but limited by CPU hardware.
- OpenCV (GPU): Improved performance, but less consistent than CUDA.
- CUDA GPU:
  - ~810× faster than CPU
  - ~33× faster than OpenCV-CPU
  - ~2.75× faster than OpenCV-GPU
  - Highest consistency and efficiency
Example: Gaussian Blur runtime:
- CPU ≈ 12.68 seconds
- CUDA GPU ≈ 0.00011 seconds
- GPUs vastly outperform CPUs for parallelizable tasks like Gaussian blur.
- In these tests, the custom CUDA implementation was faster and more consistent than both the CPU and GPU OpenCV variants.
- As GPU hardware advances, the performance gap between CPU and GPU is likely to keep widening for workloads of this kind.
- Explore additional image processing algorithms
- Investigate memory optimization and scalability across larger datasets
- Compare performance across different GPU architectures
- Researched convolution and image processing algorithms
- Implemented CPU, GPU (CUDA), and OpenCV solutions
- Benchmarked performance with multiple trials
- Documented results and analysis
This project was implemented in Google Colab with a GPU runtime.
- Upload a sample image (e.g., `sample.png`) to your Google Drive.
- Mount Google Drive in Colab.
- Run the notebook cells in this order:
  - CPU implementation (cells 3–5)
  - OpenCV implementation (cell 9)
  - Switch to T4 GPU runtime in Colab
  - CUDA GPU implementation (cells 6–8)
  - Rerun OpenCV implementation (cell 9) for GPU comparison
  - Final cell (10) outputs processed images
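Before running the CUDA cells, it can help to confirm that Drive is mounted and that the runtime actually exposes a GPU. The sketch below is one possible way to do that and is not part of the original notebook: `mount_drive` and `gpu_runtime_active` are hypothetical helper names, the mount point `/content/drive` is an assumed default, and the GPU check simply asks `nvidia-smi` for the device name.

```python
import shutil
import subprocess


def gpu_runtime_active():
    """Return True if nvidia-smi is present and reports at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and bool(result.stdout.strip())


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True
```

In a notebook, `mount_drive()` would be called once at the start, and `gpu_runtime_active()` checked again after switching the runtime type, since it returns `False` on a CPU-only runtime.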
- Introduction to OpenCV, 2024.
- NVIDIA Corporation. Separable Convolution. Technical documentation, 2007.
- Gloria Bueno Garcia & Oscar Deniz Suarez. Learning Image Processing with OpenCV. Packt Publishing, 2015.
