Releases · Unstructured-IO/unstructured-inference · GitHub

24 Jul 19:10

rbiseck3

0.5.6

Update the annotate and _get_image_array methods of PageLayout to get the image from the image_path property if the image property is None.
Add functionality to store pdf images for later use.
Add image_metadata property to PageLayout & set page.image to None to reduce memory usage.
Update DocumentLayout.from_file to open only one image.
Update load_pdf to return either Image objects or Image paths.
Warns users that Chipper is a beta model.
Exposed control over dpi when converting PDF to an image.
Updated detectron2 version to avoid errors related to a deprecated PIL reference.
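
The DPI control mentioned above trades image detail against memory. As a rough illustration (not the library's code), the sketch below computes the pixel size and uncompressed RGB footprint of a page rasterized at a given DPI. The helper names `rendered_size` and `rgb_bytes` are hypothetical, and real rasterizers may round slightly differently.

```python
POINTS_PER_INCH = 72  # PDF page sizes are expressed in points (1/72 inch)


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Return (width, height) in pixels for a page rasterized at `dpi`.

    Hypothetical helper for illustration only.
    """
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


def rgb_bytes(width_px: int, height_px: int) -> int:
    """Uncompressed size of an 8-bit RGB image (3 bytes per pixel)."""
    return width_px * height_px * 3


# US Letter is 612 x 792 points.
size = rendered_size(612, 792, dpi=200)
print(size)              # (1700, 2200)
print(rgb_bytes(*size))  # 11220000
```

Doubling the DPI quadruples the pixel count, which is why keeping only one page image open at a time matters for large PDFs.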

07 Jul 13:20

benjats07

0.5.5

Rename large model to chipper
Reduced memory usage when working on PDFs
Fix issue with table processing
Added execution providers for CUDA and TensorRT
Warning suppression for ONNX inference on empty pages.
Updates

Library	From	To
ruff	0.0.270	0.0.276
mypy	1.3.0	1.4.1
onnxruntime	1.15.0	1.15.1
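
For readers unfamiliar with ONNX Runtime execution providers (the CUDA and TensorRT support listed in this release), here is a minimal sketch of how a provider list is typically chosen and passed to a session. The helpers `pick_providers` and `make_session` are hypothetical and do not reflect how unstructured-inference wires this up internally; only `onnxruntime.get_available_providers` and the `providers` argument of `onnxruntime.InferenceSession` are actual ONNX Runtime API.

```python
# Preference order: fastest accelerator first, CPU as the final fallback.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available, preferred=PREFERRED):
    """Keep the preferred providers that are actually available, in order.

    Hypothetical helper; falls back to CPU if nothing matches.
    """
    chosen = [name for name in preferred if name in available]
    return chosen or ["CPUExecutionProvider"]


def make_session(model_path):
    """Create an inference session using the best available providers."""
    import onnxruntime as ort  # imported lazily so the helper above stays dependency-free

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)


print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

ONNX Runtime tries the providers in the order given, so listing CPU last lets the same code run on machines without a GPU.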

29 Jun 22:20

qued

0.5.4

Tweak to element ordering to make it more deterministic

29 Jun 17:32

qued

0.5.3

Refactor for large model

21 Jun 22:21

qued

0.5.2

Combine inferred elements with extracted elements
Add ruff to keep code consistent with unstructured

30 May 21:15

qued

0.5.1

Add annotation for pages
Store page numbers when processing PDFs
Hotfix to handle inference of blank pages using ONNX detectron2
Revert ordering change to investigate examples of misordering

18 May 19:56

MthwRobinson

0.5.0

Preserve image format in PIL.Image.Image when loading
Added an ONNX version of Detectron2 and made it the default model
Remove API code; we no longer serve this as a standalone API
Update ordering logic to account for multicolumn documents.
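
To illustrate why multicolumn documents need special ordering, the sketch below contrasts a naive top-to-bottom sort with a simple column-aware sort. This is an assumed, simplified approach for illustration and not the library's actual ordering logic; `Box` and `reading_order` are hypothetical names, and coordinates are assumed to grow downward as in image space.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float
    text: str


def reading_order(boxes):
    """Group boxes into columns by horizontal overlap, then read each column top to bottom."""
    columns = []  # created left to right because boxes are visited in x1 order
    for box in sorted(boxes, key=lambda b: b.x1):
        for col in columns:
            if box.x1 < max(b.x2 for b in col):  # starts inside this column's extent
                col.append(box)
                break
        else:
            columns.append([box])
    ordered = []
    for col in columns:
        ordered.extend(sorted(col, key=lambda b: b.y1))
    return ordered


boxes = [
    Box(310, 100, 550, 300, "B"),  # right column, top
    Box(50, 220, 290, 400, "C"),   # left column, bottom
    Box(50, 100, 290, 200, "A"),   # left column, top
    Box(310, 320, 550, 400, "D"),  # right column, bottom
]
naive = [b.text for b in sorted(boxes, key=lambda b: (b.y1, b.x1))]
print(naive)                                   # ['A', 'B', 'C', 'D']
print([b.text for b in reading_order(boxes)])  # ['A', 'C', 'B', 'D']
```

A full-width element such as a page title would pull every box into one column under this simple rule, so a practical implementation needs additional handling for elements that span columns.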

05 May 01:56

qued

0.4.4

Fixed patches not being a package.

04 May 20:26

qued

0.4.3

Patch pdfminer.six to fix parsing bug

21 Apr 04:05

qued

0.4.2

Output of table extraction is now stored in text_as_html property rather than text property
