feat: add bulk image preparation utility (rename + resize) for issue #213 #267
kousthubha-sky wants to merge 2 commits into
…s; update README and contributors
Review Summary by Qodo
Add bulk image preparation utility for filename cleanup and optimization
Walkthroughs
Description
• Add standalone bulk image preparation utility with rename and resize operations
• Rename subcommand removes non-UTF/ASCII characters and special characters from filenames
• Resize subcommand downscales images exceeding file size threshold while preserving aspect ratio
• Both operations support recursive directory traversal, dry-run preview mode, and detailed logging
• Update README with usage examples and typical workflow for image preparation
Diagram
flowchart LR
A["Image Files"] --> B["Rename Operation"]
A --> C["Resize Operation"]
B --> D["Clean Filenames"]
C --> E["Optimized Images"]
D --> F["Ready for OMRChecker"]
E --> F
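The resize behaviour summarised above (downscale only when a file exceeds the size threshold, keeping the aspect ratio) could be sketched as follows. The function name `scaled_dimensions` and the square-root scaling rule are assumptions for illustration, not taken from the PR; the rule assumes file size grows roughly in proportion to pixel count, which is only approximately true for compressed formats.

```python
import math
from typing import Tuple


def scaled_dimensions(
    width: int, height: int, file_size: int, max_size_bytes: int
) -> Tuple[int, int]:
    """Return target (width, height) that keeps the aspect ratio.

    Hypothetical helper: assumes file size is roughly proportional to
    pixel count, so both sides are scaled by sqrt(max_size / size).
    """
    if file_size <= max_size_bytes:
        # Already within the threshold: leave the dimensions alone.
        return width, height
    scale = math.sqrt(max_size_bytes / file_size)
    # Apply the same factor to both sides so the aspect ratio is kept.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 4000x3000 image that is four times over the limit would be planned at half its width and height.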
File Changes
1. scripts/bulk_operations/__init__.py
Code Review by Qodo
1. Swallowed exceptions return None
def get_image_dimensions(file_path: Path) -> Optional[Tuple[int, int]]:
    """Get image dimensions (width, height) using OpenCV.

    Returns None if file cannot be read.

    Args:
        file_path: Path to image file

    Returns:
        Tuple of (width, height) or None if unreadable
    """
    try:
        image = cv2.imread(str(file_path))
        if image is None:
            return None
        height, width = image.shape[:2]
        return width, height
    except Exception:
        return None


def read_and_get_dimensions(
    file_path: Path,
) -> Optional[Tuple]:
    """Read image and return (image_data, width, height).

    Returns None if file cannot be read.

    Args:
        file_path: Path to image file

    Returns:
        Tuple of (image, width, height) or None if unreadable
    """
    try:
        image = cv2.imread(str(file_path))
        if image is None:
            return None
        height, width = image.shape[:2]
        return image, width, height
    except Exception:
        return None
1. Swallowed exceptions return none 📘 Rule violation ⛯ Reliability
Image read failures are handled by catching broad exceptions and returning None without logging the underlying error context, making failures hard to diagnose. This violates robust error handling expectations for actionable error context.
Agent Prompt
## Issue description
The image-reading helper functions swallow exceptions (`except Exception: return None`) without logging context, which makes debugging failures difficult and violates robust error handling requirements.
## Issue Context
Callers only see `None` and a generic "Unable to read image file" message; the original error cause (permissions, corrupt file, codec issues) is lost.
## Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[238-257]
- scripts/bulk_operations/prepare_images.py[259-280]
- scripts/bulk_operations/prepare_images.py[375-383]
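One possible shape for the fix, as a minimal sketch rather than the PR's actual code: log the reason for the failure before returning `None`. The `read_shape` parameter is a hypothetical stand-in for the decoder call (for example a small wrapper around `cv2.imread` that returns `image.shape` or `None`), used here so the sketch does not depend on OpenCV; the logger name is also an assumption.

```python
import logging
from pathlib import Path
from typing import Callable, Optional, Tuple

logger = logging.getLogger("prepare_images")  # assumed logger name


def get_dimensions_logged(
    file_path: Path,
    read_shape: Callable[[str], Optional[Tuple[int, ...]]],
) -> Optional[Tuple[int, int]]:
    """Return (width, height), logging why a read failed instead of hiding it.

    `read_shape` is a hypothetical decoder wrapper that returns the image
    shape as (height, width, ...) or None when nothing could be decoded.
    """
    try:
        shape = read_shape(str(file_path))
    except (OSError, ValueError) as exc:
        # Keep the original cause (permissions, bad path, ...) in the log.
        logger.error("Cannot read %s: %s", file_path, exc, exc_info=True)
        return None
    if shape is None:
        logger.error("No image data decoded from %s (corrupt or unsupported?)", file_path)
        return None
    height, width = shape[:2]
    return width, height
```

Callers still receive `None` for unreadable files, so the skip-and-continue behaviour is unchanged; only the diagnostic context is added.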
# Check file size
file_size = os.path.getsize(file_path)
if file_size <= max_size_bytes:
2. Unguarded getsize crash 🐞 Bug ⛯ Reliability
resize_files() calls os.path.getsize() outside any per-file error handling, so a missing/unreadable file can crash the entire run instead of being logged and skipped. This contradicts the tool’s stated behavior of skipping problematic files with error messages.
Agent Prompt
### Issue description
`resize_files()` can crash the whole operation when `os.path.getsize()` raises (deleted file between discovery and processing, permission issues, etc.), because it’s executed outside the per-file try/except.
### Issue Context
The script’s docstring states it should skip problematic files with error messages, but the current structure only catches exceptions during the resize/write stage.
### Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[358-374]
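A minimal sketch of the per-file guard described above, assuming a hypothetical helper `files_over_limit` that is not part of the PR. The `[SKIP]` and `[ERROR]` prefixes follow the log tags listed in the PR description; the message wording is illustrative.

```python
import os
from pathlib import Path
from typing import Iterable, List, Tuple


def files_over_limit(
    paths: Iterable[Path], max_size_bytes: int
) -> Tuple[List[Path], List[str]]:
    """Split paths into files needing a resize and log messages.

    The size check sits inside the per-file try/except, so a file that
    vanished or is unreadable is reported and skipped instead of
    aborting the whole run.
    """
    to_resize: List[Path] = []
    messages: List[str] = []
    for path in paths:
        try:
            size = os.path.getsize(path)
        except OSError as exc:
            messages.append(f"[ERROR] {path}: {exc}")
            continue
        if size <= max_size_bytes:
            messages.append(f"[SKIP] {path}: {size} bytes is within the limit")
            continue
        to_resize.append(path)
    return to_resize, messages
```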
# Create output filename with _resized suffix
name, ext = os.path.splitext(original_name)
resized_name = f"{name}_resized{ext}"
resized_path = file_path.parent / resized_name

original_kb = file_size / 1024

if not dry_run:
    # Write resized image
    cv2.imwrite(str(resized_path), resized_image)
    new_file_size = os.path.getsize(resized_path)
3. Overwrites existing _resized 🐞 Bug ✓ Correctness
The resize command always writes to <name>_resized<ext> without checking whether that output already exists. Re-running the tool (or having a pre-existing file with that name) can silently overwrite data.
Agent Prompt
### Issue description
`resize_files()` writes to a deterministic `<name>_resized<ext>` target without checking for pre-existence, enabling silent overwrites.
### Issue Context
This is a bulk operation script; silent overwrites are high-risk and surprising, especially on re-runs.
### Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[408-420]
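One way to avoid the silent overwrite is to pick a free name before writing. The sketch below assumes a hypothetical helper `next_free_path` and a numeric-suffix scheme; skipping the file with a logged message would be an equally reasonable policy.

```python
from pathlib import Path


def next_free_path(target: Path) -> Path:
    """Return `target` if unused, else the first `<stem>_<n><suffix>` that is free."""
    if not target.exists():
        return target
    counter = 1
    while True:
        candidate = target.with_name(f"{target.stem}_{counter}{target.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1
```

Note that the check and the later write are not atomic, so this only protects against pre-existing files, not against another process creating the same name concurrently.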
Udayraj123 left a comment
thanks for the contribution, added comments.
# Replace spaces and problematic characters
cleaned = ascii_name.replace(" ", "_")

# Remove any remaining special characters except underscores
cleaned = "".join(c if c.isalnum() or c == "_" else "" for c in cleaned)

# Remove consecutive underscores
while "__" in cleaned:
    cleaned = cleaned.replace("__", "_")

# Strip leading/trailing underscores
cleaned = cleaned.strip("_")
I think underscores should be kept in the filename as their presence doesn't cause any file/IO errors
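A cleaning step that leaves existing underscores untouched might look like the sketch below. The function name `clean_stem` is hypothetical, and the way `ascii_name` is derived here (NFD normalisation followed by an ASCII encode with errors ignored) is an assumption, since that part of the PR's code is not shown in this thread.

```python
import re
import unicodedata


def clean_stem(stem: str) -> str:
    """Clean a filename stem while keeping every underscore as-is."""
    # Decompose accented characters so the base letter survives ASCII encoding.
    decomposed = unicodedata.normalize("NFD", stem)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    # Spaces become underscores; no collapsing or stripping afterwards.
    ascii_name = ascii_name.replace(" ", "_")
    # Drop anything that is not an ASCII letter, digit, or underscore.
    return re.sub(r"[^A-Za-z0-9_]", "", ascii_name)
```

With this version, a stem such as `_draft (2)` keeps its leading underscore and only loses the parentheses.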
if not cleaned:
    cleaned = "unnamed_file"
If multiple filenames become `unnamed_file`, won't this lead to overwriting and losing user files?
new_path = file_path.parent / cleaned_name

# Handle duplicate filenames (in case cleaned name already exists)
if new_path.exists() and new_path != file_path:
Since the renaming is sequential, a dry run may not show duplicate filenames that only arise after earlier renames.
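One way to make the dry run and the real run agree is to compute the whole rename plan in memory first, tracking names that earlier renames will occupy, and only then either print or execute it. The sketch below is an illustration under that assumption; `plan_renames` and its numeric-suffix scheme are hypothetical and not part of the PR. It also covers the case where several files fall back to the same name such as `unnamed_file`.

```python
import os
from typing import Callable, Iterable, List, Tuple


def plan_renames(
    names: Iterable[str], clean: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Return (old, new) pairs for one directory, with collisions resolved.

    `taken` starts as the current directory listing and is updated after
    every planned rename, so later entries see the post-rename state.
    """
    names = list(names)
    taken = set(names)
    plan: List[Tuple[str, str]] = []
    for old in names:
        new = clean(old)
        if new == old:
            continue  # nothing to do for this file
        stem, ext = os.path.splitext(new)
        counter = 1
        while new in taken:
            # Name is used by an existing file or an earlier planned rename.
            new = f"{stem}_{counter}{ext}"
            counter += 1
        taken.discard(old)
        taken.add(new)
        plan.append((old, new))
    return plan
```

A dry run would print the returned pairs, while a real run would apply them in order, so both modes report the same target names.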
# ============================================================================


def get_image_dimensions(file_path: Path) -> Optional[Tuple[int, int]]:
# Create output filename with _resized suffix
name, ext = os.path.splitext(original_name)
resized_name = f"{name}_resized{ext}"
resized_path = file_path.parent / resized_name
If we're writing to the same directory as the input, re-running the script will lead to:
- potential double resizing of the images
- potential file_name_resized_resized.jpg
- User will need to manually delete original images one by one?
Let's move resized images into a dedicated folder, which should be ignored when running this script multiple times.
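The dedicated-folder suggestion could be sketched roughly as follows. The folder name `resized`, the extension set, and both helper names are assumptions made for illustration; the PR may choose different ones.

```python
from pathlib import Path
from typing import Iterator

RESIZED_DIR_NAME = "resized"  # hypothetical output folder name
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set of inputs


def iter_source_images(root: Path, recursive: bool = False) -> Iterator[Path]:
    """Yield input images under `root`, ignoring any output folder."""
    pattern = "**/*" if recursive else "*"
    for path in sorted(root.glob(pattern)):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        # Skip anything that lives inside a previous run's output folder.
        if RESIZED_DIR_NAME in path.relative_to(root).parts[:-1]:
            continue
        yield path


def resized_target(source: Path) -> Path:
    """Place the resized copy in a sibling output folder, keeping the name."""
    return source.parent / RESIZED_DIR_NAME / source.name
```

Because discovery skips the output folder and the output keeps the original filename, a second run sees the same inputs as the first and cannot produce `_resized_resized` names.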
Summary
Closes #213
Adds a standalone bulk image preparation utility at `scripts/bulk_operations/prepare_images.py` to help users clean and optimise their input images before running OMRChecker.

What's included
Bulk Rename (`rename` subcommand)
- … `unicodedata.normalize("NFD")`
- … (`unnamed_file`)

Bulk Resize (`resize` subcommand)
- … `_resized` copies; originals are never overwritten
- `cv2.INTER_AREA` for high-quality downscaling

Both subcommands support:
- `--recursive` flag for subdirectory traversal
- `--dry-run` flag to preview all changes without touching any files
- … (`[RENAMED]`, `[RESIZED]`, `[SKIP]`, `[ERROR]`)

Usage
Notes