This repository contains the code and datasets used for denoising dirty document images, along with a PowerPoint presentation that gives an overview of the process. I did this as a project for my university course.
The dataset used for this project comes from the Kaggle competition Denoising Dirty Documents. Because of GitHub's file size restrictions, the three zip files contained in the main dataset zip file are uploaded individually.
The primary objective of this project is to build a robust convolutional autoencoder model that can effectively denoise images, facilitating better readability and interpretation of the documents.
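To make the idea concrete, here is a minimal sketch of a convolutional autoencoder for grayscale images. It is not taken from the notebook: the use of PyTorch, the class name `ConvAutoencoder`, the helper `train_step`, the layer sizes, and the MSE loss are all assumptions chosen for illustration. The sketch also assumes the image height and width are divisible by 4, so real document images may need cropping or padding first.

```python
import torch
from torch import nn


class ConvAutoencoder(nn.Module):
    """Hypothetical small convolutional autoencoder for 1-channel images."""

    def __init__(self) -> None:
        super().__init__()
        # Encoder: two stride-2 convolutions reduce height and width by 4x.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Decoder: two transposed convolutions restore the input size
        # (exactly, when height and width are divisible by 4).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.Sigmoid(),  # keep predicted pixel intensities in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


def train_step(model, optimizer, dirty, clean) -> float:
    """Run one gradient step that maps dirty images toward clean ones."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(dirty), clean)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = ConvAutoencoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Random tensors stand in for batches of (dirty, clean) image pairs.
    dirty = torch.rand(4, 1, 64, 64)
    clean = torch.rand(4, 1, 64, 64)
    print("loss:", train_step(model, optimizer, dirty, clean))
```

The model is trained on pairs: the dirty image is the input and the cleaned image is the target, so the network learns to reproduce the text while discarding stains and creases.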
/denoising-dirty-documents
│
├── /train.zip # Dirty images used for training
├── /train_cleaned.zip # Cleaned images used for training
├── /test.zip # Test images to make predictions
└── DenoisingCAE.ipynb # Jupyter notebook used for the process
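The two training archives listed above can be paired up before training. The sketch below uses only the Python standard library and rests on two assumptions that are not stated in this README: the images are `.png` files, and a dirty image and its cleaned counterpart share the same base filename. The function names `list_images` and `pair_training_images` are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath


def list_images(zip_path):
    """Map each image's base filename to its member path inside the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return {
            PurePosixPath(name).name: name
            for name in archive.namelist()
            if name.lower().endswith(".png")  # assumption: images are PNGs
        }


def pair_training_images(dirty_zip="train.zip", clean_zip="train_cleaned.zip"):
    """Return (dirty_member, clean_member) pairs present in both archives."""
    dirty = list_images(dirty_zip)
    clean = list_images(clean_zip)
    # Assumption: matching base filenames identify the same document.
    shared = sorted(dirty.keys() & clean.keys())
    return [(dirty[name], clean[name]) for name in shared]


if __name__ == "__main__":
    pairs = pair_training_images()
    print(f"Found {len(pairs)} dirty/clean training pairs")
```

Pairing by filename instead of by position in the archive means that a missing or extra file in one archive does not silently misalign the training targets.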
Special thanks to Kaggle for providing the dataset, and to the open-source community for the resources and libraries that made this project possible.