feat: Add document/page detection and tagging for document images #1061
alishair7071 wants to merge 3 commits into AOSSIE-Org:main
Conversation
- Add document, page, paper, and text classes to YOLO class_names
- Implement image_util_detect_document() function using image analysis
- Integrate document detection into image tagging pipeline
- Document images now receive appropriate tags (document, page, paper, text)

- Lower detection thresholds for better document image recognition
- Improve logging to info level for better debugging
- Add fallback condition for edge cases
- Handle None classes from YOLO
📝 Walkthrough
Adds four document-related class labels to the YOLO class list and implements an OpenCV-based heuristic document detector that augments YOLO results during image classification to tag document-like images.
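To make the idea of a heuristic document detector concrete, here is a minimal sketch. It is not the code from this PR: the function name `looks_like_document`, the two signals (share of bright pixels and average color spread) and all three thresholds are assumptions chosen for illustration, and it uses plain NumPy rather than OpenCV so that it stays self-contained.

```python
import numpy as np

# Hypothetical thresholds; the values used in the PR are not shown in this thread.
BRIGHT_LEVEL = 200          # gray level above which a pixel counts as "paper"
MIN_BRIGHT_FRACTION = 0.6   # minimum share of paper-like pixels
MAX_MEAN_CHROMA = 25.0      # documents tend to be nearly colorless


def looks_like_document(image_bgr: np.ndarray) -> bool:
    """Return True when an HxWx3 uint8 image is mostly bright and nearly gray."""
    pixels = image_bgr.astype(np.float32)

    # Average the three channels as a simple grayscale approximation.
    gray = pixels.mean(axis=2)
    bright_fraction = float((gray > BRIGHT_LEVEL).mean())

    # Chroma proxy: spread between the largest and smallest channel per pixel.
    chroma = pixels.max(axis=2) - pixels.min(axis=2)
    mean_chroma = float(chroma.mean())

    return bright_fraction >= MIN_BRIGHT_FRACTION and mean_chroma <= MAX_MEAN_CHROMA
```

A detector along these lines would presumably sit next to YOLO rather than replace it, since a bright, colorless photo (snow, a white wall) would also pass both checks.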
Sequence Diagram(s)
sequenceDiagram
participant U as Upstream caller
participant ImgUtil as Image Utility
participant YOLO as YOLO Classifier
participant DocDet as Document Detector (OpenCV)
participant Tags as Tag Aggregator / DB
U->>ImgUtil: classify_and_face_detect(image_path)
ImgUtil->>YOLO: run_yolo(image_path)
YOLO-->>ImgUtil: yolo_classes
ImgUtil->>DocDet: image_util_detect_document(image_path)
DocDet-->>ImgUtil: doc_class_ids / none
ImgUtil->>ImgUtil: merge yolo_classes + doc_class_ids
ImgUtil->>Tags: persist/update tags for image
Tags-->>ImgUtil: confirmation
ImgUtil-->>U: classification result (tags)
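The merge step in the diagram, combined with the "Handle None classes from YOLO" commit note, could look roughly like the sketch below. The helper name `merge_class_ids` and its signature are hypothetical; the PR's own merge code is not shown in this thread.

```python
from typing import Iterable, List, Optional


def merge_class_ids(
    yolo_classes: Optional[Iterable[int]],
    doc_class_ids: Optional[Iterable[int]],
) -> List[int]:
    """Combine both detectors' class ids, dropping duplicates, keeping first-seen order."""
    merged: List[int] = []
    seen = set()
    for source in (yolo_classes, doc_class_ids):
        if source is None:  # either detector may report nothing
            continue
        for class_id in source:
            class_id = int(class_id)  # also normalizes NumPy integer scalars
            if class_id not in seen:
                seen.add(class_id)
                merged.append(class_id)
    return merged
```

Keeping first-seen order means the YOLO results stay ahead of the document tags, which may matter if a later step treats the first class as the primary one.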
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/app/utils/YOLO.py`:
- Around line 86-89: The draw helper currently treats the last entry in
class_names as an "unknown" sentinel by checking class_id >= len(class_names) -
1, which makes the newly added "text" class render as unknown; update the bounds
check in the drawing routine (the code that reads class_id and class_names in
YOLO.py — e.g., the draw/draw_boxes/draw_detections helper) to use class_id >=
len(class_names) (or better: use class_id < len(class_names) to select a name
and fall back to "unknown" otherwise), or alternatively add an explicit
"unknown" sentinel to class_names and adjust the check accordingly so the "text"
class maps correctly.
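As a sketch of the second option described above (select a name only when the index is in range, otherwise fall back to "unknown"), assuming a hypothetical helper `label_for` that is not part of YOLO.py:

```python
from typing import Sequence


def label_for(class_id: int, class_names: Sequence[str]) -> str:
    """Return the class name for class_id, or "unknown" when it is out of range."""
    if 0 <= class_id < len(class_names):
        return class_names[class_id]
    return "unknown"
```

With this form the last entry of `class_names` is an ordinary class, so "text" resolves to its own name, and a negative id cannot silently wrap around to the end of the list.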
- Fix issue where last class (text) was treated as unknown
- Change from len(class_names) - 1 to len(class_names)
- Fixes CodeRabbit suggestion
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/utils/YOLO.py (1)
232-246: Missing bounds check for `class_id` in `YOLO_util_draw_masks`.
`YOLO_util_draw_masks` is called at line 174 before the per-detection bounds check in the loop (lines 177-183). If any `class_id` in the input array is out of bounds (>= `len(colors)`), line 239 will raise an `IndexError`.
Consider adding defensive bounds checking consistent with `YOLO_util_draw_detections`:
🐛 Proposed fix
 def YOLO_util_draw_masks(
     image: np.ndarray, boxes: np.ndarray, classes: np.ndarray, mask_alpha: float = 0.3
 ) -> np.ndarray:
     mask_img = image.copy()
     # Draw bounding boxes and labels of detections
     for box, class_id in zip(boxes, classes):
-        color = colors[class_id]
+        if class_id >= len(colors):
+            color = colors[-1]
+        else:
+            color = colors[class_id]
         x1, y1, x2, y2 = box.astype(int)
         # Draw fill rectangle in mask image
         cv2.rectangle(mask_img, (x1, y1), (x2, y2), color, -1)
     return cv2.addWeighted(mask_img, mask_alpha, image, 1 - mask_alpha, 0)
Description
This PR adds document/page detection and tagging capabilities to PictoPy. Document images (scanned pages, PDF pages, book pages) will now receive appropriate tags like "document", "page", "paper", and "text", making them searchable and better organized.
🔧 Changes Made
Added document-related classes to YOLO (`backend/app/utils/YOLO.py`): `class_names` list
Implemented document detection function (`backend/app/utils/images.py`): `image_util_detect_document()` function that uses image analysis to detect document characteristics:
Integrated into tagging pipeline:
Testing
Related Issue
Closes #1053
Checklist
Summary by CodeRabbit
New Features
Bug Fixes