Releases: Unstructured-IO/unstructured-inference

0.5.6

24 Jul 19:10
06c0057

  • Update the annotate and _get_image_array methods of PageLayout to get the image from the image_path property if the image property is None.
  • Add functionality to store pdf images for later use.
  • Add image_metadata property to PageLayout & set page.image to None to reduce memory usage.
  • Update DocumentLayout.from_file to open only one image.
  • Update load_pdf to return either Image objects or Image paths.
  • Warn users that Chipper is a beta model.
  • Expose control over DPI when converting a PDF to an image.
  • Update the detectron2 version to avoid errors related to a deprecated PIL reference.
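
The memory-saving pattern behind the `image` / `image_path` changes in this release can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's code: `PageSketch`, `get_image`, `release_image`, and the injected `loader` are hypothetical names, and only `image` and `image_path` come from the notes above. The loader is passed in so the sketch runs without an imaging library; in practice it might be something like `PIL.Image.open`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class PageSketch:
    """Hypothetical stand-in for a page that holds an image, its path, or both."""

    image: Optional[Any] = None
    image_path: Optional[str] = None

    def get_image(self, loader: Callable[[str], Any]) -> Any:
        # Prefer the in-memory image while it is still attached.
        if self.image is not None:
            return self.image
        # Otherwise reload on demand from the stored path.
        if self.image_path is None:
            raise ValueError("page has neither an image nor an image_path")
        return loader(self.image_path)

    def release_image(self) -> None:
        # Drop the in-memory copy; the path remains for later reloads.
        self.image = None
```

Releasing the image after processing keeps only a short path string per page in memory, at the cost of one extra read from disk if the image is needed again.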

0.5.5

07 Jul 13:20
41cb7a7

  • Rename large model to chipper
  • Reduced memory usage when working on PDFs
  • Fix issue with table processing
  • Added execution providers for CUDA and TensorRT
  • Warning suppression for ONNX inference on empty pages.
  • Dependency updates:

    Library      From     To
    ruff         0.0.270  0.0.276
    mypy         1.3.0    1.4.1
    onnxruntime  1.15.0   1.15.1
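
As an illustration of the execution-provider item in this release, the sketch below picks providers in a preference order from whatever is available. The provider strings are ONNX Runtime's own names; the `choose_providers` helper and the preference order are assumptions made for this example and may differ from what the library does.

```python
from typing import List, Sequence

# Provider names as used by ONNX Runtime; the preference order is an assumption.
PREFERRED_PROVIDERS = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def choose_providers(available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    chosen = [name for name in PREFERRED_PROVIDERS if name in available]
    # Fall back to CPU so the returned list is never empty.
    return chosen or ["CPUExecutionProvider"]
```

The `available` argument would typically come from `onnxruntime.get_available_providers()`, and the result can be passed as the `providers` argument of `onnxruntime.InferenceSession`.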

0.5.4

29 Jun 22:20
99c8196

  • Tweak to element ordering to make it more deterministic

0.5.3

29 Jun 17:32
bd61292

  • Refactor for large model

0.5.2

21 Jun 22:21
a8fefc2

  • Combine inferred elements with extracted elements
  • Add ruff to keep code consistent with unstructured

0.5.1

30 May 21:15
6494128

  • Add annotation for pages
  • Store page numbers when processing PDFs
  • Hotfix to handle inference of blank pages using ONNX detectron2
  • Revert ordering change to investigate examples of misordering

0.5.0

18 May 19:56
56ce657

  • Preserve image format in PIL.Image.Image when loading
  • Add an ONNX version of Detectron2 and make it the default model
  • Remove API code; this is no longer served as a standalone API
  • Update ordering logic to account for multicolumn documents.
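
To show why multicolumn pages need special ordering, here is a simplified sketch that groups bounding boxes into columns by horizontal overlap and then reads each column top to bottom. It is not the library's ordering logic; the `Box` tuple layout and the `order_boxes` helper are assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows downward


def order_boxes(boxes: List[Box]) -> List[Box]:
    """Order boxes column by column (left to right), top to bottom within each."""
    columns: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: (b[0], b[1])):
        for column in columns:
            # Same column if the box overlaps horizontally with the column's first box.
            first = column[0]
            if box[0] < first[2] and first[0] < box[2]:
                column.append(box)
                break
        else:
            columns.append([box])
    ordered: List[Box] = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda b: (b[1], b[0])))
    return ordered
```

A plain top-to-bottom sort would interleave the two columns line by line. This sketch avoids that, but it does not handle full-width elements such as titles, which would be absorbed into the leftmost column.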

0.4.4

05 May 01:56
60390c8

  • Fixed patches not being a package.

0.4.3

04 May 20:26
09e0397

  • Patch pdfminer.six to fix parsing bug

0.4.2

21 Apr 04:05
86bbb37

  • Output of table extraction is now stored in text_as_html property rather than text property
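
For code that consumes table output, the change above means reading `text_as_html` instead of `text`. The sketch below shows one way to turn such HTML into rows of cell text using only the standard library. `TableElementSketch` and `table_rows` are hypothetical names for illustration; only the `text` and `text_as_html` attribute names come from the note above, and the exact HTML the library emits is not assumed beyond basic `tr`/`td`/`th` tags.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional


@dataclass
class TableElementSketch:
    """Hypothetical element; only the two attribute names come from the note."""

    text: str = ""
    text_as_html: Optional[str] = None


class _RowCollector(HTMLParser):
    """Collect the text of each td/th cell, grouped by tr row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def table_rows(element: TableElementSketch) -> List[List[str]]:
    """Parse the element's HTML table into rows of stripped cell text."""
    if not element.text_as_html:
        return []
    parser = _RowCollector()
    parser.feed(element.text_as_html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows]
```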