Open
Labels
bug: Something isn't working
Description
Describe the bug
When YOLOX is used as the hi-res model to load and annotate images, complex images can produce Image elements that have bbox dimensions but no extracted text, so the element's text is set to None. Printing the Image element or converting it to a string (the __str__ method) then fails, because __str__ must return a string rather than None.
To Reproduce
from langchain_community.document_loaders.image import UnstructuredImageLoader
from unstructured_inference.models.base import DEFAULT_MODEL

img_loader = UnstructuredImageLoader(
    "5.jpg",  # rename to match one of the attached images
    hi_res_model_name=DEFAULT_MODEL,
)
data = img_loader.load()
for doc in data:
    print(doc)

OR
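from unstructured.partition.image import partition_image
from unstructured_inference.models.base import DEFAULT_MODEL

elements = partition_image("5.jpg", hi_res_model_name=DEFAULT_MODEL)
print(elements)
for el in elements:
    print(el)  # fails for an element whose text is None

Expected behavior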
Even if an image region is detected with a bbox but no text, the text should be set to an empty string instead of None. A None value causes an exception when the Image element is printed (via its __str__ method).
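The failure mode and the requested behavior can be illustrated without running any model. The sketch below is only an illustration: both classes are hypothetical stand-ins, not the real unstructured element classes, and it assumes that __str__ returns the stored text directly. Python raises a TypeError whenever __str__ returns a non-string, which is why a None text breaks print().

```python
class UnnormalizedElement:
    """Hypothetical stand-in that keeps text=None, mirroring the reported state."""

    def __init__(self, text=None):
        self.text = text

    def __str__(self):
        # Returns None when no text was extracted, which Python rejects.
        return self.text


class NormalizedElement:
    """Hypothetical stand-in that applies the requested behavior."""

    def __init__(self, text=None):
        # Assumed fix: coerce a missing OCR result to an empty string.
        self.text = text if text is not None else ""

    def __str__(self):
        return self.text


def try_str(element):
    """Return (True, string) on success, or (False, error message) on TypeError."""
    try:
        return True, str(element)
    except TypeError as exc:
        return False, str(exc)


if __name__ == "__main__":
    print(try_str(UnnormalizedElement()))  # conversion fails with a TypeError
    print(try_str(NormalizedElement()))    # conversion succeeds with ""
```

Until the library normalizes the value itself, a caller-side guard along the lines of `el.text or ""` avoids the exception when only the text is needed.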
Screenshots
Test Images
Environment Info
unstructured 0.18.13
unstructured-client 0.42.3
unstructured-inference 1.0.5
unstructured-pytesseract 0.3.15
detectron2 0.6
torch 2.8.0
torchvision 0.23.0
Additional context
This issue is also related to this issue.