feat: Add document/page detection and tagging for document images #1061

Closed
alishair7071 wants to merge 3 commits into AOSSIE-Org:main from alishair7071:feat/add-document-detection-tagging

@alishair7071 alishair7071 commented Jan 22, 2026

Description

This PR adds document/page detection and tagging capabilities to PictoPy. Document images (scanned pages, PDF pages, book pages) will now receive appropriate tags like "document", "page", "paper", and "text", making them searchable and better organized.

🔧 Changes Made

  1. Added document-related classes to YOLO (backend/app/utils/YOLO.py):

    • Added "document", "page", "paper", "text" to class_names list
    • These classes are now available for tagging
  2. Implemented document detection function (backend/app/utils/images.py):

    • Created image_util_detect_document() function that uses image analysis to detect document characteristics:
      • High edge density (text/lines detection)
      • High contrast (typical of scanned documents)
      • Document-like aspect ratios
      • Horizontal lines (common in document structure)
      • Light backgrounds with dark text
  3. Integrated into tagging pipeline:

    • Document detection runs automatically after YOLO object detection
    • Document classes are automatically added to images that match document characteristics
    • Works seamlessly with existing tagging system
    • No breaking changes
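The heuristics listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the PR reports using OpenCV and returning class IDs from `image_util_detect_document()`, whereas this sketch uses NumPy only and returns a boolean. The function name `looks_like_document` and every threshold value are assumptions chosen for the example, and the horizontal-line check is omitted.

```python
import numpy as np


def looks_like_document(
    gray: np.ndarray,
    edge_thresh: float = 40.0,         # assumed gradient-magnitude cutoff
    min_edge_density: float = 0.05,    # assumed minimum share of edge pixels
    min_contrast: float = 50.0,        # assumed minimum intensity std-dev
    min_bright_fraction: float = 0.5,  # assumed minimum share of light pixels
) -> bool:
    """Return True if a 2-D grayscale image has document-like statistics."""
    g = gray.astype(np.float32)
    height, width = g.shape

    # Edge density: share of pixels with a strong intensity gradient.
    grad_y, grad_x = np.gradient(g)
    edge_density = float(np.mean(np.hypot(grad_x, grad_y) > edge_thresh))

    # Contrast: standard deviation of pixel intensities.
    contrast = float(g.std())

    # Light background: share of pixels brighter than an assumed cutoff of 180.
    bright_fraction = float(np.mean(g > 180))

    # Aspect ratio: long side over short side, in an assumed page-like band.
    aspect = max(height, width) / min(height, width)
    page_like = 1.2 <= aspect <= 1.6

    return bool(
        edge_density >= min_edge_density
        and contrast >= min_contrast
        and bright_fraction >= min_bright_fraction
        and page_like
    )
```

A synthetic white page with dark rows of "text" passes all four checks, while a blank white image fails on contrast and edge density. Real scans and photos would need thresholds tuned against actual data.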

Testing

  • Code follows project style guidelines
  • No linting errors
  • Document detection function implemented and tested
  • Tested with actual document images (scanned pages, PDF pages)
  • Verified tags are correctly applied to document images

Related Issue

Closes #1053

Checklist

  • I have read the CONTRIBUTING.md file
  • My code follows the project's code style
  • I have commented my code where necessary
  • My changes generate no new warnings
  • I have tested my changes locally

Summary by CodeRabbit

  • New Features

    • Expanded detection to recognize document-related objects (document, page, paper, text).
    • Added enhanced image analysis to better detect document content and merge those findings into image classification.
  • Bug Fixes

    • Improved handling of unknown detections and updated labeling/color behavior for more accurate visual results.
    • Better merging of heuristically detected document classes with existing classifier outputs.

- Add document, page, paper, and text classes to YOLO class_names
- Implement image_util_detect_document() function using image analysis
- Integrate document detection into image tagging pipeline
- Document images now receive appropriate tags (document, page, paper, text)
- Lower detection thresholds for better document image recognition
- Improve logging to info level for better debugging
- Add fallback condition for edge cases
- Handle None classes from YOLO

coderabbitai bot commented Jan 22, 2026

Walkthrough

Adds four document-related class labels to the YOLO class list and implements an OpenCV-based heuristic document detector that augments YOLO results during image classification to tag document-like images.

Changes

| Cohort / File(s) | Summary |
|---|---|
| YOLO labels: backend/app/utils/YOLO.py | Added four new class names: "document", "page", "paper", "text". Adjusted boundary check for unknown-class labeling. |
| Document detection & integration: backend/app/utils/images.py | Added image_util_detect_document(image_path: str) -> Optional[List[int]] using OpenCV (grayscale, edge density, contrast, aspect ratio, line detection). Integrates detection into image_util_classify_and_face_detect_images() to merge document class IDs with YOLO results. Added imports for cv2, numpy, and Optional. |

Sequence Diagram(s)

sequenceDiagram
    participant U as Upstream caller
    participant ImgUtil as Image Utility
    participant YOLO as YOLO Classifier
    participant DocDet as Document Detector (OpenCV)
    participant Tags as Tag Aggregator / DB

    U->>ImgUtil: classify_and_face_detect(image_path)
    ImgUtil->>YOLO: run_yolo(image_path)
    YOLO-->>ImgUtil: yolo_classes
    ImgUtil->>DocDet: image_util_detect_document(image_path)
    DocDet-->>ImgUtil: doc_class_ids / none
    ImgUtil->>ImgUtil: merge yolo_classes + doc_class_ids
    ImgUtil->>Tags: persist/update tags for image
    Tags-->>ImgUtil: confirmation
    ImgUtil-->>U: classification result (tags)
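
The merge step in the diagram can be sketched as below. This is a hedged illustration only: `merge_class_ids` is a hypothetical helper, both inputs are assumed to be plain Python lists (or `None`, since the commits mention handling `None` classes from YOLO), and the class IDs in the usage note are made up.

```python
from typing import List, Optional


def merge_class_ids(
    yolo_classes: Optional[List[int]],
    doc_class_ids: Optional[List[int]],
) -> List[int]:
    """Union of both ID lists, keeping first-seen order and dropping duplicates."""
    merged: List[int] = []
    seen = set()
    # Treat a missing result from either detector as an empty list.
    for class_id in (yolo_classes or []) + (doc_class_ids or []):
        if class_id not in seen:
            seen.add(class_id)
            merged.append(class_id)
    return merged
```

For example, `merge_class_ids([0, 73], [80, 81])` yields `[0, 73, 80, 81]`, and `merge_class_ids(None, [80])` yields `[80]`. If the YOLO step returns NumPy arrays instead of lists, the `or []` fallback would need an explicit `is None` check.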

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through pixels, sniffed each frame,
Found pages, papers, gave them a name.
Document tags now dance and gleam,
Organized gardens of every scanned dream. 📄✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding document/page detection and tagging for document images, which aligns with the core functionality added in YOLO.py and images.py. |
| Linked Issues check | ✅ Passed | The pull request fully addresses the objectives from issue #1053: adds document/page/paper/text classes to YOLO, implements document detection using heuristic analysis (edge density, contrast, aspect ratio, lines), and integrates detection into the tagging pipeline. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with the linked issue #1053 objectives: YOLO class additions, new document detection function, and integration into the tagging pipeline. No unrelated modifications detected. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@backend/app/utils/YOLO.py`:
- Around line 86-89: The draw helper currently treats the last entry in
class_names as an "unknown" sentinel by checking class_id >= len(class_names) -
1, which makes the newly added "text" class render as unknown; update the bounds
check in the drawing routine (the code that reads class_id and class_names in
YOLO.py — e.g., the draw/draw_boxes/draw_detections helper) to use class_id >=
len(class_names) (or better: use class_id < len(class_names) to select a name
and fall back to "unknown" otherwise), or alternatively add an explicit
"unknown" sentinel to class_names and adjust the check accordingly so the "text"
class maps correctly.
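
The off-by-one described in this comment can be reproduced in isolation. The sketch below is illustrative: `label_for_old` and `label_for` are hypothetical stand-ins for the lookup inside the drawing helper, and the class list is shortened to five entries for the example.

```python
from typing import Sequence


def label_for_old(class_id: int, class_names: Sequence[str]) -> str:
    """Old check: treats the final entry of class_names as an 'unknown' slot."""
    if class_id >= len(class_names) - 1:
        return "unknown"
    return class_names[class_id]


def label_for(class_id: int, class_names: Sequence[str]) -> str:
    """Corrected check: every valid index maps to its own name."""
    if 0 <= class_id < len(class_names):
        return class_names[class_id]
    return "unknown"


# Shortened, illustrative class list ending with the newly added "text" class.
names = ["person", "document", "page", "paper", "text"]
```

With this list, `label_for_old(4, names)` returns `"unknown"` even though index 4 is `"text"`, while `label_for(4, names)` returns `"text"` and only out-of-range IDs such as 5 or -1 fall back to `"unknown"`.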

- Fix issue where last class (text) was treated as unknown
- Change from len(class_names) - 1 to len(class_names)
- Fixes CodeRabbit suggestion

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/utils/YOLO.py (1)

232-246: Missing bounds check for class_id in YOLO_util_draw_masks.

YOLO_util_draw_masks is called at line 174 before the per-detection bounds check in the loop (lines 177-183). If any class_id in the input array is out of bounds (>= len(colors)), line 239 will raise an IndexError.

Consider adding defensive bounds checking consistent with YOLO_util_draw_detections:

🐛 Proposed fix
 def YOLO_util_draw_masks(
     image: np.ndarray, boxes: np.ndarray, classes: np.ndarray, mask_alpha: float = 0.3
 ) -> np.ndarray:
     mask_img = image.copy()

     # Draw bounding boxes and labels of detections
     for box, class_id in zip(boxes, classes):
-        color = colors[class_id]
+        if class_id >= len(colors):
+            color = colors[-1]
+        else:
+            color = colors[class_id]

         x1, y1, x2, y2 = box.astype(int)

         # Draw fill rectangle in mask image
         cv2.rectangle(mask_img, (x1, y1), (x2, y2), color, -1)

     return cv2.addWeighted(mask_img, mask_alpha, image, 1 - mask_alpha, 0)

@alishair7071 alishair7071 deleted the feat/add-document-detection-tagging branch January 24, 2026 09:12