0.7.6

@yuming-long yuming-long released this 16 Jun 15:09
· 1257 commits to main since this release
a611532

Enhancements

  • Convert fast strategy to ocr_only for images
  • Adds support for page numbers in .docx and .doc when user- or renderer-created
    page breaks are present.
  • Adds retry logic for the unstructured-ingest Biomed connector
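
As a rough illustration of what retry logic for a flaky network source can look like, here is a minimal sketch using exponential backoff. It is not the connector's actual code: `call_with_retry`, its parameters, and the choice of `ConnectionError` as the retryable error are all assumptions made for the example.

```python
import time


def call_with_retry(fetch, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached.

    Hypothetical helper: retries only on ConnectionError and doubles the
    delay after each failed attempt (base_delay, 2 * base_delay, ...).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.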

Features

  • Provides users with the ability to extract additional metadata via regex.
  • Updates partition_docx to include headers and footers in the output.
  • Create partition_tsv and associated tests. Make additional changes to detect_filetype.
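
To make the idea of regex-based metadata extraction concrete, the sketch below uses only the standard library. It is an illustration of the concept, not the library's implementation: the function name `extract_regex_metadata` and the shape of the returned dictionary are assumptions for this example.

```python
import re


def extract_regex_metadata(text, patterns):
    """Return {field: [{"text", "start", "end"}, ...]} for each named pattern.

    Hypothetical helper: `patterns` maps a metadata field name to a regex,
    and every match in `text` is recorded with its character offsets.
    """
    results = {}
    for field, pattern in patterns.items():
        results[field] = [
            {"text": m.group(), "start": m.start(), "end": m.end()}
            for m in re.finditer(pattern, text)
        ]
    return results


transcript = "SPEAKER 1: Hello. SPEAKER 2: Hi."
metadata = extract_regex_metadata(transcript, {"speaker": r"SPEAKER \d"})
```

Here `metadata["speaker"]` holds one entry per speaker label found in the text.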

Fixes

  • Removes the fake API key in the partition_via_api test, since valid or empty API keys are now required.
  • Page number defaults to None instead of 1 when page number is not present in the metadata.
    A page number of None indicates that page numbers are not being tracked for the document
    or that page numbers do not apply to the element in question.
  • Fixes an issue with some pptx files. Assumes pptx shapes are in the top-left position of the slide
    when the shape.top and shape.left attributes are None.
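
The pptx fallback can be pictured with a small sketch. One place a missing position matters is when shapes are sorted by their coordinates, since comparing None with an integer raises a TypeError in Python. The `Shape` class and `order_shapes` function below are hypothetical stand-ins, not python-pptx or library code, and sorting is only an assumed use of the default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shape:
    """Hypothetical stand-in for a slide shape with optional coordinates."""
    name: str
    top: Optional[int]
    left: Optional[int]


def order_shapes(shapes):
    """Sort shapes top-to-bottom, then left-to-right.

    A missing top or left value is treated as 0, i.e. the shape is assumed
    to sit in the top-left corner of the slide.
    """
    def position(shape):
        top = 0 if shape.top is None else shape.top
        left = 0 if shape.left is None else shape.left
        return (top, left)

    return sorted(shapes, key=position)
```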