Skip to content

feat: add bulk image preparation utility (rename + resize) for issue #213#267

Open
kousthubha-sky wants to merge 2 commits into
Udayraj123:masterfrom
kousthubha-sky:issue213
Open

feat: add bulk image preparation utility (rename + resize) for issue #213#267
kousthubha-sky wants to merge 2 commits into
Udayraj123:masterfrom
kousthubha-sky:issue213

Conversation

@kousthubha-sky
Copy link
Copy Markdown

Summary

Closes #213

Adds a standalone bulk image preparation utility at scripts/bulk_operations/prepare_images.py to help users clean and optimise their input images before running OMRChecker.

What's included

Bulk Rename (rename subcommand)

  • Removes non-UTF/non-ASCII characters from filenames using unicodedata.normalize("NFD")
  • Replaces spaces and special characters with underscores
  • Handles edge case where cleaning removes all characters (fallback to unnamed_file)
  • Safely handles duplicate filename conflicts after cleaning

Bulk Resize (resize subcommand)

  • Only resizes images that exceed a user-provided file size threshold (bytes)
  • Preserves aspect ratio, never upscales
  • Creates _resized copies — originals are never overwritten
  • Uses cv2.INTER_AREA for high-quality downscaling

Both subcommands support:

  • --recursive flag for subdirectory traversal
  • --dry-run flag to preview all changes without touching any files
  • Per-file status logging ([RENAMED], [RESIZED], [SKIP], [ERROR])
  • Summary statistics at the end

Usage

# Preview renames without making changes
python scripts/bulk_operations/prepare_images.py rename --path ./inputs --dry-run

# Rename for real
python scripts/bulk_operations/prepare_images.py rename --path ./inputs --recursive

# Resize images over 500KB to max 1920x1440 (dry-run first)
python scripts/bulk_operations/prepare_images.py resize --path ./inputs \
  --max-size 500000 --max-width 1920 --max-height 1440 --dry-run

Notes

  • Fully standalone — no imports from OMRChecker core
  • PDF files are not in the supported formats list; only PNG, JPG, JPEG, TIFF are processed
  • Single I/O read per file during resize (image loaded once, reused for dimensions + resizing)

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Review Summary by Qodo

Add bulk image preparation utility for filename cleanup and optimization

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Add standalone bulk image preparation utility with rename and resize operations
• Rename subcommand removes non-UTF/ASCII characters and special characters from filenames
• Resize subcommand downscales images exceeding file size threshold while preserving aspect ratio
• Both operations support recursive directory traversal, dry-run preview mode, and detailed logging
• Update README with usage examples and typical workflow for image preparation
Diagram
flowchart LR
  A["Image Files"] --> B["Rename Operation"]
  A --> C["Resize Operation"]
  B --> D["Clean Filenames"]
  C --> E["Optimized Images"]
  D --> F["Ready for OMRChecker"]
  E --> F
Loading

Grey Divider

File Changes

1. scripts/bulk_operations/__init__.py ✨ Enhancement +1/-0

Initialize bulk operations package

• Create new package initialization file for bulk operations module

scripts/bulk_operations/init.py


2. scripts/bulk_operations/prepare_images.py ✨ Enhancement +585/-0

Create bulk image preparation utility script

• Implement clean_filename() function using Unicode NFD normalization to remove non-ASCII
 characters and special characters
• Implement rename_files() function with recursive directory traversal and dry-run support
• Implement get_image_dimensions() and read_and_get_dimensions() for efficient image I/O
• Implement calculate_resized_dimensions() to preserve aspect ratio during downscaling
• Implement resize_files() function with file size threshold filtering and _resized suffix naming
• Add comprehensive CLI interface with argparse supporting rename and resize subcommands
• Include detailed logging with per-file status messages and summary statistics
• Support for PNG, JPG, JPEG, TIFF formats with error handling and edge cases

scripts/bulk_operations/prepare_images.py


3. Contributors.md 📝 Documentation +1/-0

Add contributor to list

• Add new contributor kousthubha-sky to contributors list

Contributors.md


View more (1)
4. README.md 📝 Documentation +55/-0

Document image preparation utility in README

• Add new "Utility Scripts" section documenting image preparation tool
• Include usage examples for rename and resize subcommands with various options
• Provide typical workflow showing rename, resize, and OMRChecker execution sequence
• Link to script's docstring for detailed help information

README.md


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (5) 📘 Rule violations (4) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Swallowed exceptions return None 📘 Rule violation ⛯ Reliability
Description
Image read failures are handled by catching broad exceptions and returning None without logging
the underlying error context, making failures hard to diagnose. This violates robust error handling
expectations for actionable error context.
Code

scripts/bulk_operations/prepare_images.py[R238-280]

+def get_image_dimensions(file_path: Path) -> Optional[Tuple[int, int]]:
+    """Get image dimensions (width, height) using OpenCV.
+
+    Returns None if file cannot be read.
+
+    Args:
+        file_path: Path to image file
+
+    Returns:
+        Tuple of (width, height) or None if unreadable
+    """
+    try:
+        image = cv2.imread(str(file_path))
+        if image is None:
+            return None
+        height, width = image.shape[:2]
+        return width, height
+    except Exception:
+        return None
+
+
+def read_and_get_dimensions(
+    file_path: Path,
+) -> Optional[Tuple]:
+    """Read image and return (image_data, width, height).
+
+    Returns None if file cannot be read.
+
+    Args:
+        file_path: Path to image file
+
+    Returns:
+        Tuple of (image, width, height) or None if unreadable
+    """
+    try:
+        image = cv2.imread(str(file_path))
+        if image is None:
+            return None
+        height, width = image.shape[:2]
+        return image, width, height
+    except Exception:
+        return None
+
Evidence
The compliance checklist requires meaningful context and avoiding swallowed exceptions; these
helpers catch Exception and return None silently, losing the reason for failures.

Rule 3: Generic: Robust Error Handling and Edge Case Management
scripts/bulk_operations/prepare_images.py[238-257]
scripts/bulk_operations/prepare_images.py[259-280]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The image-reading helper functions swallow exceptions (`except Exception: return None`) without logging context, which makes debugging failures difficult and violates robust error handling requirements.

## Issue Context
Callers only see `None` and a generic "Unable to read image file" message; the original error cause (permissions, corrupt file, codec issues) is lost.

## Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[238-257]
- scripts/bulk_operations/prepare_images.py[259-280]
- scripts/bulk_operations/prepare_images.py[375-383]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Unguarded getsize crash 🐞 Bug ⛯ Reliability
Description
resize_files() calls os.path.getsize() outside any per-file error handling, so a
missing/unreadable file can crash the entire run instead of being logged and skipped. This
contradicts the tool’s stated behavior of skipping problematic files with error messages.
Code

scripts/bulk_operations/prepare_images.py[R362-364]

+        # Check file size
+        file_size = os.path.getsize(file_path)
+        if file_size <= max_size_bytes:
Evidence
The resize loop performs os.path.getsize(file_path) before entering the try: that wraps
resizing/writing, so exceptions like FileNotFoundError/PermissionError will bubble up and abort
the operation. The module docstring claims error handling should skip problematic files, which this
breaks.

scripts/bulk_operations/prepare_images.py[36-42]
scripts/bulk_operations/prepare_images.py[358-365]
scripts/bulk_operations/prepare_images.py[402-442]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`resize_files()` can crash the whole operation when `os.path.getsize()` raises (deleted file between discovery and processing, permission issues, etc.), because it’s executed outside the per-file try/except.

### Issue Context
The script’s docstring states it should skip problematic files with error messages, but the current structure only catches exceptions during the resize/write stage.

### Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[358-374]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Overwrites existing _resized 🐞 Bug ✓ Correctness
Description
The resize command always writes to <name>_resized<ext> without checking whether that output
already exists. Re-running the tool (or having a pre-existing file with that name) can silently
overwrite data.
Code

scripts/bulk_operations/prepare_images.py[R408-418]

+            # Create output filename with _resized suffix
+            name, ext = os.path.splitext(original_name)
+            resized_name = f"{name}_resized{ext}"
+            resized_path = file_path.parent / resized_name
+
+            original_kb = file_size / 1024
+
+            if not dry_run:
+                # Write resized image
+                cv2.imwrite(str(resized_path), resized_image)
+                new_file_size = os.path.getsize(resized_path)
Evidence
The resized output path is deterministically chosen and written via cv2.imwrite() with no
exists()/unique-name logic. This allows silent overwrites of previously generated resized files or
user-created files matching the naming pattern.

scripts/bulk_operations/prepare_images.py[408-418]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`resize_files()` writes to a deterministic `&lt;name&gt;_resized&lt;ext&gt;` target without checking for pre-existence, enabling silent overwrites.

### Issue Context
This is a bulk operation script; silent overwrites are high-risk and surprising, especially on re-runs.

### Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[408-420]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

4. Logs lack timestamp and user 📘 Rule violation ✧ Quality
Description
The new bulk-operation script logs rename/resize actions without a timestamp or user identifier,
making it hard to reconstruct events for auditing. This violates the audit-trail requirement for
critical file write operations.
Code

scripts/bulk_operations/prepare_images.py[R58-80]

+def log_status(status: str, message: str) -> None:
+    """Print formatted status message to stdout.
+
+    Args:
+        status: Status type (RENAMED, SKIP, ERROR, RESIZED, INFO)
+        message: Message content
+    """
+    print(f"[{status}] {message}")
+
+
+def log_section(title: str) -> None:
+    """Print a section header."""
+    print(f"\n=== {title} ===")
+
+
+def log_summary(processed: int, success: int, skipped: int, errors: int) -> None:
+    """Print operation summary statistics."""
+    log_section("Summary")
+    print(f"Processed: {processed} files")
+    print(f"Success: {success} files")
+    print(f"Skipped: {skipped} files")
+    print(f"Errors: {errors} files")
+
Evidence
Compliance requires audit logs to include at least a timestamp and user ID, but the script prints
plain status lines and summaries without either.

Rule 1: Generic: Comprehensive Audit Trails
scripts/bulk_operations/prepare_images.py[58-66]
scripts/bulk_operations/prepare_images.py[73-80]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The bulk-operation utility prints status lines without timestamp/user context, which does not meet the audit-trail requirement for reconstructing file-change events.

## Issue Context
This script performs filesystem writes (renames and new resized file writes). Audit trails should include a timestamp and (where possible) an operator identifier/context.

## Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[58-80]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


5. Unstructured logs include filenames 📘 Rule violation ⛨ Security
Description
The script emits unstructured stdout logs and includes raw filenames/paths in messages, which can
inadvertently expose sensitive information (e.g., PII embedded in filenames) and is harder to audit.
This violates the secure logging requirement for structured logs and avoiding sensitive data
exposure.
Code

scripts/bulk_operations/prepare_images.py[R179-228]

+    log_section("Bulk Rename Operation")
+    print(f"Path: {directory}")
+    print(f"Recursive: {recursive}")
+    if dry_run:
+        print("Mode: DRY-RUN (preview only, no changes made)")
+
+    image_files = find_image_files(directory, recursive)
+
+    if not image_files:
+        log_status("INFO", "No image files found in specified directory")
+        return
+
+    processed = 0
+    renamed = 0
+    skipped = 0
+    errors = 0
+
+    for file_path in image_files:
+        processed += 1
+        original_name = file_path.name
+        cleaned_name = clean_filename(original_name)
+
+        # Skip if no changes needed
+        if original_name == cleaned_name:
+            log_status("SKIP", f"{original_name} (no changes needed)")
+            skipped += 1
+            continue
+
+        try:
+            new_path = file_path.parent / cleaned_name
+
+            # Handle duplicate filenames (in case cleaned name already exists)
+            if new_path.exists() and new_path != file_path:
+                log_status(
+                    "ERROR",
+                    f"{original_name}: Target filename already exists "
+                    f"({cleaned_name})",
+                )
+                errors += 1
+                continue
+
+            if not dry_run:
+                file_path.rename(new_path)
+
+            log_status("RENAMED", f"{original_name} → {cleaned_name}")
+            renamed += 1
+
+        except OSError as e:
+            log_status("ERROR", f"{original_name}: {str(e)}")
+            errors += 1
Evidence
Compliance requires structured logging and no sensitive data in logs; the script prints free-form
strings including directory and original_name directly to stdout.

Rule 5: Generic: Secure Logging Practices
scripts/bulk_operations/prepare_images.py[179-184]
scripts/bulk_operations/prepare_images.py[198-205]
scripts/bulk_operations/prepare_images.py[226-228]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The script prints unstructured logs and includes raw filenames/paths in output, which may leak sensitive info and is difficult to parse for auditing.

## Issue Context
Secure logging requires structured log format and avoiding sensitive data exposure (filenames often contain personal identifiers).

## Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[58-66]
- scripts/bulk_operations/prepare_images.py[179-228]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


6. Raw exception shown to user 📘 Rule violation ⛨ Security
Description
The CLI prints raw exception messages directly to the user, which can leak internal/system details
(paths, OS-specific error strings) rather than using a generic user-facing error plus protected
detailed logs. This violates secure error handling guidance.
Code

scripts/bulk_operations/prepare_images.py[R579-581]

+    except Exception as e:
+        log_status("ERROR", f"Unexpected error: {str(e)}")
+        return 1
Evidence
Compliance requires user-facing errors to avoid exposing internal details; the script formats errors
with str(e) and prints them to stdout.

Rule 4: Generic: Secure Error Handling
scripts/bulk_operations/prepare_images.py[579-581]
scripts/bulk_operations/prepare_images.py[439-441]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The CLI surfaces raw exception messages via `str(e)` to the end user, which may leak internal/system details.

## Issue Context
Secure error handling expects generic user-facing errors, while detailed diagnostics are logged securely (or behind an explicit debug/verbose mode).

## Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[226-228]
- scripts/bulk_operations/prepare_images.py[439-441]
- scripts/bulk_operations/prepare_images.py[579-581]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (2)
7. Dry-run rename optimistic 🐞 Bug ✧ Quality
Description
In rename --dry-run, the script doesn’t simulate prior renames, so it can report multiple files as
[RENAMED] to the same cleaned target name even though the real run would error on later
collisions. This makes the preview misleading for conflict-heavy directories.
Code

scripts/bulk_operations/prepare_images.py[R210-223]

+            # Handle duplicate filenames (in case cleaned name already exists)
+            if new_path.exists() and new_path != file_path:
+                log_status(
+                    "ERROR",
+                    f"{original_name}: Target filename already exists "
+                    f"({cleaned_name})",
+                )
+                errors += 1
+                continue
+
+            if not dry_run:
+                file_path.rename(new_path)
+
+            log_status("RENAMED", f"{original_name} → {cleaned_name}")
Evidence
Conflict detection is based on the current filesystem state (new_path.exists()). In dry-run mode,
no rename occurs, so exists() never reflects earlier planned renames. Meanwhile the README
promises dry-run previews rename behavior.

scripts/bulk_operations/prepare_images.py[210-224]
README.md[268-270]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Dry-run does not mutate the filesystem, so collision detection (`new_path.exists()`) can miss conflicts that would occur during the real rename sequence.

### Issue Context
Users rely on `--dry-run` for a safe preview of what will happen.

### Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[196-229]
- README.md[264-270]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


8. 0px resize edge-case 🐞 Bug ✓ Correctness
Description
calculate_resized_dimensions() truncates with int() without clamping to at least 1px, so extreme
aspect ratios combined with small max dimensions can produce (new_width|new_height) == 0, causing
cv2.resize to raise and the file to error.
Code

scripts/bulk_operations/prepare_images.py[R305-315]

+    # Calculate scale factors
+    width_scale = max_width / original_width
+    height_scale = max_height / original_height
+
+    # Use the smaller scale to preserve aspect ratio
+    scale = min(width_scale, height_scale)
+
+    new_width = int(original_width * scale)
+    new_height = int(original_height * scale)
+
+    return new_width, new_height
Evidence
The new dimensions are computed by multiplying by a scale factor and truncating; when one original
dimension is very small and the scale is very small (due to the other dimension), the truncated
result can become 0. That zero is then passed directly to cv2.resize.

scripts/bulk_operations/prepare_images.py[305-315]
scripts/bulk_operations/prepare_images.py[404-406]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`calculate_resized_dimensions()` may produce 0 for one dimension in edge cases due to `int()` truncation, which then breaks `cv2.resize()`.

### Issue Context
This shows up with extreme aspect ratios and small max dimension inputs.

### Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[282-316]
- scripts/bulk_operations/prepare_images.py[402-407]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

9. PDF skip message mismatch 🐞 Bug ✧ Quality
Description
The script docstring claims PDF files are “skipped with a message,” but PDFs are never
discovered/iterated (only image extensions are globbed), so no per-file PDF skip message can occur.
This can confuse users expecting explicit PDF output.
Code

scripts/bulk_operations/prepare_images.py[R25-30]

+SUPPORTED FORMATS:
+  - PNG (.png)
+  - JPEG (.jpg, .jpeg)
+  - TIFF (.tiff, .tif)
+  - PDF files are skipped with a message
+
Evidence
The docstring advertises PDF skip logging, but find_image_files() filters strictly to a tuple of
image extensions and never yields PDFs to be logged as skipped.

scripts/bulk_operations/prepare_images.py[25-30]
scripts/bulk_operations/prepare_images.py[87-89]
scripts/bulk_operations/prepare_images.py[113-116]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Docstring claims PDFs are skipped with a message, but the implementation never discovers PDFs so no message is emitted.

### Issue Context
This is a user-facing utility; accurate help text reduces confusion.

### Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[25-30]
- scripts/bulk_operations/prepare_images.py[87-116]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

Comment on lines +238 to +280
def get_image_dimensions(file_path: Path) -> Optional[Tuple[int, int]]:
"""Get image dimensions (width, height) using OpenCV.

Returns None if file cannot be read.

Args:
file_path: Path to image file

Returns:
Tuple of (width, height) or None if unreadable
"""
try:
image = cv2.imread(str(file_path))
if image is None:
return None
height, width = image.shape[:2]
return width, height
except Exception:
return None


def read_and_get_dimensions(
file_path: Path,
) -> Optional[Tuple]:
"""Read image and return (image_data, width, height).

Returns None if file cannot be read.

Args:
file_path: Path to image file

Returns:
Tuple of (image, width, height) or None if unreadable
"""
try:
image = cv2.imread(str(file_path))
if image is None:
return None
height, width = image.shape[:2]
return image, width, height
except Exception:
return None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Swallowed exceptions return none 📘 Rule violation ⛯ Reliability

Image read failures are handled by catching broad exceptions and returning None without logging
the underlying error context, making failures hard to diagnose. This violates robust error handling
expectations for actionable error context.
Agent Prompt
## Issue description
The image-reading helper functions swallow exceptions (`except Exception: return None`) without logging context, which makes debugging failures difficult and violates robust error handling requirements.

## Issue Context
Callers only see `None` and a generic "Unable to read image file" message; the original error cause (permissions, corrupt file, codec issues) is lost.

## Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[238-257]
- scripts/bulk_operations/prepare_images.py[259-280]
- scripts/bulk_operations/prepare_images.py[375-383]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +362 to +364
# Check file size
file_size = os.path.getsize(file_path)
if file_size <= max_size_bytes:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

2. Unguarded getsize crash 🐞 Bug ⛯ Reliability

resize_files() calls os.path.getsize() outside any per-file error handling, so a
missing/unreadable file can crash the entire run instead of being logged and skipped. This
contradicts the tool’s stated behavior of skipping problematic files with error messages.
Agent Prompt
### Issue description
`resize_files()` can crash the whole operation when `os.path.getsize()` raises (deleted file between discovery and processing, permission issues, etc.), because it’s executed outside the per-file try/except.

### Issue Context
The script’s docstring states it should skip problematic files with error messages, but the current structure only catches exceptions during the resize/write stage.

### Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[358-374]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +408 to +418
# Create output filename with _resized suffix
name, ext = os.path.splitext(original_name)
resized_name = f"{name}_resized{ext}"
resized_path = file_path.parent / resized_name

original_kb = file_size / 1024

if not dry_run:
# Write resized image
cv2.imwrite(str(resized_path), resized_image)
new_file_size = os.path.getsize(resized_path)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

3. Overwrites existing _resized 🐞 Bug ✓ Correctness

The resize command always writes to <name>_resized<ext> without checking whether that output
already exists. Re-running the tool (or having a pre-existing file with that name) can silently
overwrite data.
Agent Prompt
### Issue description
`resize_files()` writes to a deterministic `<name>_resized<ext>` target without checking for pre-existence, enabling silent overwrites.

### Issue Context
This is a bulk operation script; silent overwrites are high-risk and surprising, especially on re-runs.

### Fix Focus Areas
- scripts/bulk_operations/prepare_images.py[408-420]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Copy link
Copy Markdown
Owner

@Udayraj123 Udayraj123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the contribution, added comments.

Comment on lines +149 to +160
# Replace spaces and problematic characters
cleaned = ascii_name.replace(" ", "_")

# Remove any remaining special characters except underscores
cleaned = "".join(c if c.isalnum() or c == "_" else "" for c in cleaned)

# Remove consecutive underscores
while "__" in cleaned:
cleaned = cleaned.replace("__", "_")

# Strip leading/trailing underscores
cleaned = cleaned.strip("_")
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think underscores should be kept in the filename as their presence doesn't cause any file/IO errors

Comment on lines +163 to +164
if not cleaned:
cleaned = "unnamed_file"
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if multiple filenames become unnamed_file - this leaves a case of overwriting and losing user files?

new_path = file_path.parent / cleaned_name

# Handle duplicate filenames (in case cleaned name already exists)
if new_path.exists() and new_path != file_path:
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's a sequential renaming, in dry run we may not see the case of post-rename duplicate filenames

# ============================================================================


def get_image_dimensions(file_path: Path) -> Optional[Tuple[int, int]]:
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused function

# Create output filename with _resized suffix
name, ext = os.path.splitext(original_name)
resized_name = f"{name}_resized{ext}"
resized_path = file_path.parent / resized_name
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're writing in the same directory as input, re-running the script will lead to

  • potential double resizing of the images
  • potential file_name_resized_resized.jpg
  • User will need to manually delete original images one by one?

Let's move resized images into a dedicated folder which should be ignored when running this script multiple times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enhacement][DevX] Add bulk operation scripts for optimising outputs

3 participants