0.9.1

@ryannikolaidis ryannikolaidis released this 09 Aug 05:56
2a9fb05

Enhancements

  • Adds --partition-pdf-infer-table-structure to unstructured-ingest.
  • Enable partition_html to skip headers and footers with the skip_headers_and_footers flag.
  • Update partition_doc and partition_docx to track emphasized text in the output
  • Adds post processing function filter_element_types
  • Set the default strategy for partitioning images to hi_res
  • Add page break parameter section in API documentation to sync with change in Prod API
  • Update partition_html to track emphasized text in the output
  • Update XMLDocument._read_xml to create <p> tag element for the text enclosed in the <pre> tag
  • Add parameter include_tail_text to _construct_text to include or skip tail text
  • Add Notion connector
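
To make the filter_element_types entry above more concrete, here is a minimal, self-contained sketch of what filtering partitioned elements by type can look like. The Element classes and the filter_by_type function below are hypothetical stand-ins written for illustration; they are not the library's classes, and the parameter names are assumptions rather than the real signature.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Type


@dataclass
class Element:
    """Hypothetical stand-in for a partitioned document element."""
    text: str


class Title(Element):
    """Hypothetical element type for headings."""


class NarrativeText(Element):
    """Hypothetical element type for body text."""


class Footer(Element):
    """Hypothetical element type for page footers."""


def filter_by_type(
    elements: Sequence[Element],
    include: Optional[Sequence[Type[Element]]] = None,
    exclude: Optional[Sequence[Type[Element]]] = None,
) -> List[Element]:
    """Keep only the included types, or drop the excluded types.

    Assumption: passing both arguments is treated as an error here.
    """
    if include is not None and exclude is not None:
        raise ValueError("Pass either include or exclude, not both.")
    if include is not None:
        return [el for el in elements if type(el) in include]
    if exclude is not None:
        return [el for el in elements if type(el) not in exclude]
    return list(elements)


elements = [Title("Release 0.9.1"), NarrativeText("Body"), Footer("Page 1")]
without_footers = filter_by_type(elements, exclude=[Footer])
```

A filter like this is typically applied after partitioning, for example to drop headers and footers before passing text downstream.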

Features

Fixes

  • Remove unused _partition_via_api function
  • Fix emoji bug in partition_xlsx
  • Pass file_filename metadata when partitioning file object
  • Skip ingest test on missing Slack token
  • Add Dropbox variables to CI environments
  • Remove default encoding for ingest
  • Adds new element type EmailAddress for recognizing email addresses in the text
  • Simplifies min_partition logic; makes partitions that fall below min_partition
    less likely.
  • Fix bug where ingest test check for number of files fails in smoke test
  • Fix unstructured-ingest entrypoint failure
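
As a rough illustration of the EmailAddress entry above, the sketch below shows one way email addresses might be detected in text. The regular expression and the find_email_addresses helper are assumptions made for this example; the pattern is deliberately simple and is not the one the library uses.

```python
import re
from typing import List

# Simplified pattern (an assumption for illustration): local part, "@",
# domain labels, and a top-level domain of at least two letters.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email_addresses(text: str) -> List[str]:
    """Return every substring of text that looks like an email address."""
    return EMAIL_PATTERN.findall(text)


def is_email_address(text: str) -> bool:
    """Return True when the whole string, ignoring surrounding whitespace,
    matches the simplified pattern."""
    return EMAIL_PATTERN.fullmatch(text.strip()) is not None
```

A whole-string check such as is_email_address is the kind of test that could decide whether a piece of text is labeled as an email address element rather than as ordinary text.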