0.7.6
Enhancements
Convert fast strategy to ocr_only for images
Adds support for page numbers in .docx and .doc when user- or renderer-created
page breaks are present.
Adds retry logic for the unstructured-ingest Biomed connector
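The retry entry above does not describe how the connector retries, so the following is only a minimal sketch of one common approach, exponential backoff, and not the connector's actual code. The helper name `retry_with_backoff` and all of its parameters are hypothetical.

```python
import time


def retry_with_backoff(func, max_attempts=3, base_delay=0.5,
                       exceptions=(Exception,), sleep=time.sleep):
    """Call func() and retry on failure with exponentially growing delays.

    Hypothetical helper for illustration; not part of unstructured-ingest.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except exceptions:
            # Give up and re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; a download call would be wrapped as `retry_with_backoff(lambda: fetch(url))`, where `fetch` stands in for whatever request function is in use.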
Features
Provides users with the ability to extract additional metadata via regex.
Updates partition_docx to include headers and footers in the output.
Create partition_tsv and associated tests. Make additional changes to detect_filetype.
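To illustrate the regex metadata feature listed above, here is a rough standalone sketch of the idea using only the standard `re` module. It does not call the library; the function name `extract_regex_metadata` and the shape of the returned dictionary are assumptions made for this example.

```python
import re


def extract_regex_metadata(text, patterns):
    """Return every match of each named pattern found in text.

    Hypothetical helper: `patterns` maps a label to a regex string, and the
    result maps each label to a list of match records.
    """
    results = {}
    for name, pattern in patterns.items():
        results[name] = [
            {"text": match.group(), "start": match.start(), "end": match.end()}
            for match in re.finditer(pattern, text)
        ]
    return results


# Example usage with an illustrative pattern.
matches = extract_regex_metadata(
    "Contact: jane@example.com",
    {"email": r"[\w.]+@[\w.]+"},
)
```

Recording the start and end offsets alongside the matched text lets a caller locate each match within the original element text.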
Fixes
Remove fake API key in the partition_via_api test since we now require valid/empty API keys
Page number defaults to None instead of 1 when page number is not present in the metadata.
A page number of None indicates that page numbers are not being tracked for the document
or that page numbers do not apply to the element in question.
Fixes an issue with some pptx files. Assumes pptx shapes are at the top-left position of the slide
when the shape.top and shape.left attributes are None.
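The pptx fix above can be pictured with a small sketch. It assumes, purely for illustration, that shapes are ordered by their position on the slide, so a missing coordinate would otherwise break the comparison. The `Shape` class and `order_shapes` function are hypothetical stand-ins, not the library's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shape:
    """Hypothetical stand-in for a slide shape with optional coordinates."""
    name: str
    top: Optional[int] = None
    left: Optional[int] = None


def order_shapes(shapes):
    """Sort shapes top-to-bottom, then left-to-right.

    A coordinate of None is treated as 0, i.e. the top-left of the slide.
    """
    def position(shape):
        top = shape.top if shape.top is not None else 0
        left = shape.left if shape.left is not None else 0
        return (top, left)

    return sorted(shapes, key=position)


# Example usage: the shape without coordinates sorts as if at the top left.
ordered = order_shapes([
    Shape("body", top=200, left=50),
    Shape("untitled"),
    Shape("title", top=10, left=50),
])
```

Substituting 0 for None keeps the sort key comparable; comparing None with an integer would raise a TypeError in Python 3.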