MattPicDev/image-dupe-cleaner

image-dupe-cleaner

Command-line tool to find and remove duplicate images (JPEG/PNG). It computes perceptual hashes and removes duplicates that meet or exceed a configurable similarity threshold.

Build

In PowerShell:

cd d:\repos\image-dupe-cleaner; go build -o image-dupe-cleaner.exe

Usage

.\image-dupe-cleaner.exe -path "C:\path\to\images" -threshold 99 -dry-run

Flags

  • -path: directory to scan (default: .)
  • -threshold: similarity threshold in percent, 0-100 (default: 99)
  • -dry-run: list files that would be deleted without deleting them
  • -workers: number of concurrent workers for hashing (default: number of CPUs)
  • -recursive: whether to scan subdirectories (default: true)

Notes

  • The tool uses a 64-bit perceptual hash (pHash). Similarity is computed from the Hamming distance between two hashes: the fewer bits differ, the higher the similarity. The default threshold is strict (99%); lower -threshold to match less similar images as well.
  • The tool prefers to keep the image with the larger resolution and/or file size when removing duplicates.
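
The two notes above can be sketched roughly as follows. This is not the tool's actual code: the formula (share of matching bits out of 64), the imageInfo type, and the tie-breaking order in keeper (pixel area first, then file size) are assumptions chosen to illustrate the idea.

```go
package main

import (
	"fmt"
	"math/bits"
)

// similarity returns the percentage of matching bits between two 64-bit hashes.
// Assumed formula: (1 - hammingDistance/64) * 100.
func similarity(a, b uint64) float64 {
	dist := bits.OnesCount64(a ^ b) // Hamming distance: number of differing bits
	return 100 * (1 - float64(dist)/64)
}

// imageInfo is a hypothetical record of the metadata used to pick a keeper.
type imageInfo struct {
	Path          string
	Width, Height int
	Size          int64 // file size in bytes
}

// keeper returns the image to keep: the larger pixel area wins, and file
// size breaks ties. This ordering is an assumption, not the tool's rule.
func keeper(a, b imageInfo) imageInfo {
	areaA, areaB := a.Width*a.Height, b.Width*b.Height
	if areaA != areaB {
		if areaA > areaB {
			return a
		}
		return b
	}
	if a.Size >= b.Size {
		return a
	}
	return b
}

func main() {
	var h uint64 = 0xF0F0F0F0F0F0F0F0
	fmt.Printf("%.2f\n", similarity(h, h))     // 100.00
	fmt.Printf("%.2f\n", similarity(h, h^0x1)) // 98.44 (one bit differs)
	fmt.Printf("%.2f\n", similarity(h, h^0x7)) // 95.31 (three bits differ)

	small := imageInfo{Path: "thumb.jpg", Width: 640, Height: 480, Size: 90000}
	large := imageInfo{Path: "full.png", Width: 1920, Height: 1080, Size: 2400000}
	fmt.Println(keeper(small, large).Path) // full.png
}
```

Under this assumed formula, a single differing bit already drops similarity to about 98.44%, so a 99% threshold would accept only identical hashes. If the tool's formula works the same way, that is a reason to experiment with lower -threshold values in combination with -dry-run.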

Generated with AI

This project was scaffolded and implemented with the help of an AI assistant (GitHub Copilot) from user-provided prompts. The assistant created the initial repository layout, the Go implementation, and the basic CLI behavior described in this README.

Original user instructions:

Let's start a new project, with git backing. This will be a command-line tool, that compiles to an executable, but the choice of programming language is yours. The tool will examine a single directory and compare all image files, supporting .jpg and .png formats. After comparison, it will remove duplicate files. It should be able to handle duplicates that have different resolutions. It should only consider a duplicate at over a 99% likelihood match, though that percentage should be configurable. It should have an option to output the list of files that would be deleted, without actually deleting them.

Follow-up changes requested by the user during development included:

  • Print usage instructions when the executable is run without any arguments.
  • When reporting duplicates, list both the duplicate and the original in the format: duplicate_filename duplicates original_filename.
