This project is a Python-based application that identifies and removes duplicate images between two directories (e.g., Google Drive and OneDrive). The program uses a graphical interface to allow users to manually confirm deletion decisions or skip them.
- Multi-Year and Multi-Month Directory Support:
  - Handles directories for multiple years and months dynamically, ensuring a comprehensive comparison process.
- Preprocessing for Performance:
  - Precomputes image pairs and caches resized grayscale versions for faster SSIM calculations.
- Flexible Threshold:
  - Configurable similarity threshold for SSIM-based image comparison, enabling sensitivity adjustments.
- Modernized GUI:
  - Displays both images flagged as duplicates side-by-side with options to keep, delete, or cancel.
  - Enhanced user interface with progress bars and dynamic updates.
- Threaded Preprocessing:
  - Uses multithreading for preprocessing image pairs, ensuring the main application remains responsive during calculations.
- Improved Logging and Error Handling:
  - Logs detailed error messages during image loading, resizing, or SSIM calculation.
- Configurable Deletion Behavior:
  - Allows users to select whether to delete duplicates from Google Drive or OneDrive.
  - Option to enable or disable confirmation prompts for deletion actions.
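
To illustrate the threaded preprocessing and error logging features listed above, here is a minimal sketch using only the standard library. It is not the code from DuplicateDelete.py: `start_preprocessing`, `load_fn`, and the logger name are hypothetical, and `load_fn` stands in for whatever loads and resizes an image to grayscale.

```python
import logging
import queue
import threading

# Hypothetical logger name; the real script may configure logging differently.
logger = logging.getLogger("duplicate_delete")


def start_preprocessing(paths, load_fn):
    """Preprocess every path on a background thread.

    Returns (thread, cache, progress). `cache` maps path -> load_fn(path);
    `progress` receives the number of paths handled so far, so a GUI can
    update a progress bar without blocking.
    """
    cache = {}
    progress = queue.Queue()

    def run():
        for done, path in enumerate(paths, start=1):
            try:
                cache[path] = load_fn(path)
            except Exception:
                # Log the full traceback and keep going with the next image.
                logger.exception("Failed to preprocess %s", path)
            progress.put(done)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread, cache, progress
```

In a Tkinter application, the main thread could poll `progress` with `get_nowait()` from a callback scheduled via `after`, which keeps the window responsive while the worker runs.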
- Python 3.8 or later
- Libraries:
  - os
  - tkinter
  - cv2 (OpenCV)
  - Pillow
  - skimage
  - shutil
  - random
Install required packages:
pip install pillow opencv-python scikit-image

The CreateDirs script creates year/month subdirectories under ./Google and ./OneDrive and populates them with random images from a TestPhotos directory.
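
The behavior described for CreateDirs can be sketched roughly as follows. This is an illustration under assumptions, not the actual script: the function name `populate`, its parameters, the full list of three-letter month names, and the accepted file extensions are all hypothetical.

```python
import os
import random
import shutil

# Assumption: months use three-letter names, as in the Jan/Feb folders.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def populate(source_dir, roots, years, months=MONTHS, per_dir=3, seed=None):
    """Create root/year/month folders and copy random images into each."""
    rng = random.Random(seed)  # a seed makes the random choice repeatable
    images = sorted(
        name for name in os.listdir(source_dir)
        if name.lower().endswith((".jpg", ".jpeg", ".png"))
    )
    for root in roots:
        for year in years:
            for month in months:
                target = os.path.join(root, str(year), month)
                os.makedirs(target, exist_ok=True)
                # Pick a random subset without repeating a file in one folder.
                for name in rng.sample(images, min(per_dir, len(images))):
                    shutil.copy(os.path.join(source_dir, name), target)
```

For example, `populate("TestPhotos", ["Google", "OneDrive"], years=[2024, 2025])` would fill both trees; because each folder gets its own random subset, some images end up on both sides and some do not.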
- The program iterates through the specified year and month directories.
- For each pair of images, SSIM is used to calculate their similarity.
- If the similarity exceeds a threshold (default: 0.95), the images are flagged as duplicates.
- The user is presented with both images in a Tkinter window.
- Options include:
  - Keep one of the images.
  - Delete one of the images.
  - Cancel the deletion process.
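
The threshold check in the steps above can be sketched as below. Note the simplification: this computes SSIM once over the whole image (a single global window) in plain Python so the example has no dependencies, whereas scikit-image's `structural_similarity` averages SSIM over local sliding windows, so the scores will not match exactly. The function names are hypothetical; images are assumed to be equal-length flat sequences of grayscale values in 0..255.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equal-length grayscale pixel sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Standard SSIM stabilizing constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def is_duplicate(x, y, threshold=0.95):
    """Flag a pair when its similarity exceeds the threshold (default 0.95)."""
    return global_ssim(x, y) > threshold
```

Raising the threshold toward 1.0 flags only near-identical images; lowering it catches more re-encoded or lightly edited copies at the cost of more false positives for the user to skip.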
- Setup Your Files:
  - Place the TestPhotos folder with sample images in the root directory.
  - The script will populate the ./Google and ./OneDrive directories with random subsets of these images.
- Run the Program:
  - Start the script by running:
    python DuplicateDelete.py
  - The GUI will appear for flagged duplicate images, allowing you to decide on each case.
- Configuration:
  - Modify the following variables in the script to suit your needs:
    - delete_from_google: Set to True to delete images from Google, False to delete from OneDrive.
    - confirm_delete: Set to True to enable confirmation prompts before deletion.
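
As a rough sketch of how the two settings might interact, the function below picks which copy to remove and optionally asks before deleting. Only `delete_from_google` and `confirm_delete` come from this README; the function `delete_duplicate` and its `confirm` callback are hypothetical, with the callback standing in for a GUI prompt.

```python
import os


def delete_duplicate(google_path, onedrive_path,
                     delete_from_google=True, confirm_delete=True,
                     confirm=None):
    """Delete one copy of a duplicate pair and return the removed path.

    Returns None when confirmation is required but not given.
    """
    target = google_path if delete_from_google else onedrive_path
    if confirm_delete:
        # Without a callback there is no way to ask, so nothing is deleted.
        if confirm is None or not confirm(target):
            return None
    os.remove(target)
    return target
```

Treating a missing callback as "do not delete" is a deliberately cautious choice for this sketch, since a deletion cannot be undone.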
.
├── CreateDirs.py # Script to create and populate directories
├── DuplicateDelete.py # Main script for detecting and deleting duplicate images
├── Google/ # Google Drive simulation folder
├── OneDrive/ # OneDrive simulation folder
├── TestPhotos/ # Source folder containing sample images
└── README.md # Project documentation (this file)
- Folder Structure: After running CreateDirs.py, the directories are organized as:
  Google/
  ├── 2025/
  │   ├── Jan/
  │   ├── Feb/
  │   └── ...
  ├── 2024/
  └── ...
  OneDrive/
  ├── 2025/
  │   ├── Jan/
  │   ├── Feb/
  │   └── ...
  ├── 2024/
  └── ...
- GUI Window:
  - Displays both images flagged as duplicates.
  - Options to keep, delete, or cancel.
- Progress Bar:
  - Real-time progress updates during image preprocessing and SSIM calculations.
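
Given the year/month folder layout described above, one way to walk both trees is to pair up only the year and month folders that exist on both sides. The helper below is a hypothetical sketch of that idea, not the traversal used in DuplicateDelete.py.

```python
import os


def matching_month_dirs(google_root, onedrive_root):
    """Yield (year, month, google_dir, onedrive_dir) for folders on both sides."""
    for year in sorted(os.listdir(google_root)):
        g_year = os.path.join(google_root, year)
        o_year = os.path.join(onedrive_root, year)
        if not (os.path.isdir(g_year) and os.path.isdir(o_year)):
            continue  # this year is missing from one of the trees
        for month in sorted(os.listdir(g_year)):
            g_month = os.path.join(g_year, month)
            o_month = os.path.join(o_year, month)
            if os.path.isdir(g_month) and os.path.isdir(o_month):
                yield year, month, g_month, o_month
```

The number of pairs found this way could also serve as the total for a progress bar, since each pair is one unit of comparison work.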
- Performance:
  - Comparing large image sets can be slow because every image pair is scored with SSIM. To speed up a run, reduce the number of images or directories being compared.
- Error Handling:
  - Provides detailed error logs for missing files or calculation issues.
Feel free to modify and adapt the code to your specific use case!