feat: Add document/page detection and tagging for document images by alishair7071 · Pull Request #1061 · AOSSIE-Org/PictoPy

alishair7071 · 2026-01-22T12:41:14Z

Description

This PR adds document/page detection and tagging capabilities to PictoPy. Document images (scanned pages, PDF pages, book pages) will now receive appropriate tags like "document", "page", "paper", and "text", making them searchable and better organized.

🔧 Changes Made

Added document-related classes to YOLO (backend/app/utils/YOLO.py):
- Added "document", "page", "paper", "text" to class_names list
- These classes are now available for tagging
Implemented document detection function (backend/app/utils/images.py):
- Created image_util_detect_document() function that uses image analysis to detect document characteristics:
  - High edge density (text/lines detection)
  - High contrast (typical of scanned documents)
  - Document-like aspect ratios
  - Horizontal lines (common in document structure)
  - Light backgrounds with dark text
Integrated into tagging pipeline:
- Document detection runs automatically after YOLO object detection
- Document classes are automatically added to images that match document characteristics
- Works seamlessly with existing tagging system
- No breaking changes
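
The characteristics listed above can be sketched as a small voting heuristic. This is an illustration only, not the PR's `image_util_detect_document()`: it uses plain NumPy intensity differences in place of OpenCV edge detection, omits horizontal-line detection, and every name and threshold (`document_features`, `looks_like_document`, `EDGE_DENSITY_MIN`, and so on) is a hypothetical chosen for the example.

```python
import numpy as np

# Hypothetical thresholds for illustration; the PR's actual values are not shown here.
EDGE_DENSITY_MIN = 0.05
CONTRAST_MIN = 40.0
LIGHT_FRACTION_MIN = 0.5
ASPECT_RANGE = (1.2, 1.6)  # long side / short side; A4 is roughly 1.41


def document_features(gray: np.ndarray) -> dict:
    """Compute simple document-likeness features for a 2-D uint8 grayscale image."""
    g = gray.astype(np.float32)
    height, width = g.shape

    # Edge density: share of neighbouring-pixel pairs with a large intensity jump.
    dx = np.abs(np.diff(g, axis=1))
    dy = np.abs(np.diff(g, axis=0))
    edge_density = float(((dx > 50).mean() + (dy > 50).mean()) / 2)

    return {
        "edge_density": edge_density,
        "contrast": float(g.std()),                # spread of intensities
        "light_fraction": float((g > 180).mean()),  # share of light background pixels
        "aspect": max(height, width) / min(height, width),
    }


def looks_like_document(gray: np.ndarray) -> bool:
    """Return True when at least three of the four cues agree."""
    f = document_features(gray)
    votes = [
        f["edge_density"] >= EDGE_DENSITY_MIN,
        f["contrast"] >= CONTRAST_MIN,
        f["light_fraction"] >= LIGHT_FRACTION_MIN,
        ASPECT_RANGE[0] <= f["aspect"] <= ASPECT_RANGE[1],
    ]
    return sum(votes) >= 3


# Synthetic inputs: a light page with dark "text" rows, and a smooth square gradient.
page = np.full((140, 100), 255, dtype=np.uint8)
page[10:130:4, 10:90:2] = 0
photo = np.tile(np.linspace(0, 255, 100), (100, 1)).astype(np.uint8)
```

Requiring several cues to agree, rather than any single one, is one way to keep a high-contrast photo from being tagged as a document on contrast alone.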

Testing

Code follows project style guidelines
No linting errors
Document detection function implemented and tested
Tested with actual document images (scanned pages, PDF pages)
Verified tags are correctly applied to document images

Related Issue

Closes #1053

Checklist

I have read the CONTRIBUTING.md file
My code follows the project's code style
I have commented my code where necessary
My changes generate no new warnings
I have tested my changes locally

Summary by CodeRabbit

New Features
- Expanded detection to recognize document-related objects (document, page, paper, text).
- Added enhanced image analysis to better detect document content and merge those findings into image classification.
Bug Fixes
- Improved handling of unknown detections and updated labeling/color behavior for more accurate visual results.
- Better merging of heuristically detected document classes with existing classifier outputs.

- Add document, page, paper, and text classes to YOLO class_names
- Implement image_util_detect_document() function using image analysis
- Integrate document detection into image tagging pipeline
- Document images now receive appropriate tags (document, page, paper, text)

- Lower detection thresholds for better document image recognition
- Improve logging to info level for better debugging
- Add fallback condition for edge cases
- Handle None classes from YOLO

coderabbitai · 2026-01-22T12:41:43Z

Walkthrough

Adds four document-related class labels to the YOLO class list and implements an OpenCV-based heuristic document detector that augments YOLO results during image classification to tag document-like images.

Changes

| Cohort / File(s) | Summary |
|---|---|
| YOLO labels `backend/app/utils/YOLO.py` | Added four new class names: `"document"`, `"page"`, `"paper"`, `"text"`. Adjusted boundary check for unknown-class labeling. |
| Document detection & integration `backend/app/utils/images.py` | Added `image_util_detect_document(image_path: str) -> Optional[List[int]]` using OpenCV (grayscale, edge density, contrast, aspect ratio, line detection). Integrates detection into `image_util_classify_and_face_detect_images()` to merge document class IDs with YOLO results. Added imports for `cv2`, `numpy`, and `Optional`. |

Sequence Diagram(s)

sequenceDiagram
    participant U as Upstream caller
    participant ImgUtil as Image Utility
    participant YOLO as YOLO Classifier
    participant DocDet as Document Detector (OpenCV)
    participant Tags as Tag Aggregator / DB

    U->>ImgUtil: classify_and_face_detect(image_path)
    ImgUtil->>YOLO: run_yolo(image_path)
    YOLO-->>ImgUtil: yolo_classes
    ImgUtil->>DocDet: image_util_detect_document(image_path)
    DocDet-->>ImgUtil: doc_class_ids / none
    ImgUtil->>ImgUtil: merge yolo_classes + doc_class_ids
    ImgUtil->>Tags: persist/update tags for image
    Tags-->>ImgUtil: confirmation
    ImgUtil-->>U: classification result (tags)
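
The merge step in the diagram can be sketched as follows. This is a minimal illustration under assumptions: `merge_class_ids` is a hypothetical helper, not a function from the PR; both inputs are taken to be plain Python lists or `None` (the PR notes that YOLO may return `None` classes); and the class IDs in the usage lines are placeholders.

```python
from typing import List, Optional


def merge_class_ids(
    yolo_classes: Optional[List[int]], doc_class_ids: Optional[List[int]]
) -> List[int]:
    """Union of YOLO and heuristic class IDs, order-preserving and de-duplicated."""
    merged: List[int] = []
    # Treat a missing result from either detector as an empty list.
    for class_id in (yolo_classes or []) + (doc_class_ids or []):
        if class_id not in merged:
            merged.append(class_id)
    return merged


# Placeholder IDs, purely for illustration.
only_docs = merge_class_ids(None, [80, 81])
overlap = merge_class_ids([0, 80], [80, 81])
```

Keeping YOLO's IDs first means existing tags retain their order and the heuristic tags are only appended.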

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through pixels, sniffed each frame,
Found pages, papers, gave them a name.
Document tags now dance and gleam,
Organized gardens of every scanned dream. 📄✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding document/page detection and tagging for document images, which aligns with the core functionality added in YOLO.py and images.py. |
| Linked Issues check | ✅ Passed | The pull request fully addresses the objectives from issue `#1053`: adds document/page/paper/text classes to YOLO, implements document detection using heuristic analysis (edge density, contrast, aspect ratio, lines), and integrates detection into the tagging pipeline. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with the linked issue `#1053` objectives: YOLO class additions, new document detection function, and integration into the tagging pipeline. No unrelated modifications detected. |

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@backend/app/utils/YOLO.py`:
- Around line 86-89: The draw helper currently treats the last entry in
class_names as an "unknown" sentinel by checking class_id >= len(class_names) -
1, which makes the newly added "text" class render as unknown; update the bounds
check in the drawing routine (the code that reads class_id and class_names in
YOLO.py — e.g., the draw/draw_boxes/draw_detections helper) to use class_id >=
len(class_names) (or better: use class_id < len(class_names) to select a name
and fall back to "unknown" otherwise), or alternatively add an explicit
"unknown" sentinel to class_names and adjust the check accordingly so the "text"
class maps correctly.
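
The off-by-one described in this comment can be reproduced in isolation. The sketch below is illustrative: `buggy_label_for` and `label_for` are hypothetical helpers rather than the project's drawing routine, and `class_names` is a shortened stand-in for the real list.

```python
from typing import Sequence


def buggy_label_for(class_id: int, class_names: Sequence[str]) -> str:
    """Old check: treats the last entry of class_names as the unknown sentinel."""
    if class_id >= len(class_names) - 1:
        return "unknown"
    return class_names[class_id]


def label_for(class_id: int, class_names: Sequence[str]) -> str:
    """Corrected check: only IDs outside the list fall back to "unknown"."""
    if 0 <= class_id < len(class_names):
        return class_names[class_id]
    return "unknown"


# Shortened stand-in list; "text" is the last entry, as in the PR.
class_names = ["person", "document", "page", "paper", "text"]
last_id = len(class_names) - 1
```

With the old check, `buggy_label_for(last_id, class_names)` yields `"unknown"`, while `label_for(last_id, class_names)` yields `"text"`; the explicit lower bound also keeps a negative ID from silently indexing from the end of the list.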

backend/app/utils/YOLO.py

- Fix issue where last class (text) was treated as unknown
- Change from len(class_names) - 1 to len(class_names)
- Fixes CodeRabbit suggestion

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

backend/app/utils/YOLO.py (1)
232-246: Missing bounds check for class_id in YOLO_util_draw_masks.

YOLO_util_draw_masks is called at line 174 before the per-detection bounds check in the loop (lines 177-183). If any class_id in the input array is out of bounds (>= len(colors)), line 239 will raise an IndexError.

Consider adding defensive bounds checking consistent with YOLO_util_draw_detections:
🐛 Proposed fix
 def YOLO_util_draw_masks(
     image: np.ndarray, boxes: np.ndarray, classes: np.ndarray, mask_alpha: float = 0.3
 ) -> np.ndarray:
     mask_img = image.copy()

     # Draw bounding boxes and labels of detections
     for box, class_id in zip(boxes, classes):
-        color = colors[class_id]
+        if class_id >= len(colors):
+            color = colors[-1]
+        else:
+            color = colors[class_id]

         x1, y1, x2, y2 = box.astype(int)

         # Draw fill rectangle in mask image
         cv2.rectangle(mask_img, (x1, y1), (x2, y2), color, -1)

     return cv2.addWeighted(mask_img, mask_alpha, image, 1 - mask_alpha, 0)

alishair7071 added 2 commits January 22, 2026 17:02

fix: Improve document detection thresholds and logging

7004391

coderabbitai bot reviewed Jan 22, 2026

fix: Correct bounds check for class names in draw_detections

80dd640

coderabbitai bot reviewed Jan 22, 2026

alishair7071 closed this Jan 24, 2026

alishair7071 deleted the feat/add-document-detection-tagging branch January 24, 2026 09:12

alishair7071 wants to merge 3 commits into AOSSIE-Org:main from alishair7071:feat/add-document-detection-tagging
