0.5.8

@MthwRobinson MthwRobinson released this 30 Mar 20:57
4148834

Enhancements

  • elements_to_json now returns a JSON string when filename is not specified
  • elements_from_json now accepts a JSON string through the text kwarg as an alternative to a filename
  • detect_filetype now falls back to the file extension as a last resort.
  • Empty tags are now skipped during the depth check for HTML processing.
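
The extension fallback noted for detect_filetype can be pictured with a minimal, standard-library-only sketch. It is not the library's implementation: the function names, the tiny extension table, and the content signatures below are all hypothetical and chosen only to show the ordering (content-based detection first, file extension last).

```python
import os
from typing import Optional

# Hypothetical extension table for illustration; a real mapping would be much larger.
EXTENSION_TO_FILETYPE = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".json": "application/json",
    ".msg": "application/vnd.ms-outlook",
}


def sniff_content(data: bytes) -> Optional[str]:
    """Stand-in for content-based detection: recognizes a few obvious signatures."""
    head = data.lstrip()[:64].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    if head.startswith(b"%pdf-"):
        return "application/pdf"
    return None


def detect_filetype_sketch(filename: str, data: bytes) -> Optional[str]:
    """Try the content first, then fall back to the file extension."""
    detected = sniff_content(data)
    if detected is not None:
        return detected
    # Final fallback: look up the (case-insensitive) extension.
    _, ext = os.path.splitext(filename)
    return EXTENSION_TO_FILETYPE.get(ext.lower())
```

In this sketch a file whose content is not recognized still gets a type if its extension is known, and None otherwise.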

Features

  • Add local file system to unstructured-ingest
  • Add --max-docs parameter to unstructured-ingest
  • Add partition_msg for processing Microsoft Outlook .msg files.

Fixes

  • convert_file_to_text now passes through the source_format and target_format kwargs.
    Previously they were hard coded.
  • Partitioning functions that accept a text kwarg no longer raise an error if an empty
    string is passed (an empty list of elements is returned instead).
  • partition_json no longer fails if the input is an empty list.
  • Fixed a bug in chunk_by_attention_window that caused the last word in segments to be cut off
    in some cases.
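
For readers unfamiliar with window-based chunking, here is a rough sketch of the property the last fix restores: every word of the input ends up in exactly one segment. It is an illustration only, not the code of chunk_by_attention_window. The function name is hypothetical, and a whitespace word count stands in for a real tokenizer.

```python
from typing import Callable, List


def chunk_by_window_sketch(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily pack whole words into segments of at most max_tokens tokens.

    count_tokens is a stand-in for a tokenizer; by default it counts words.
    A single word that exceeds the limit is kept as its own segment.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")

    segments: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = current + [word]
        if current and count_tokens(" ".join(candidate)) > max_tokens:
            # The window is full: close the segment and start a new one
            # with the word that did not fit, so it is never dropped.
            segments.append(" ".join(current))
            current = [word]
        else:
            current = candidate
    if current:
        # Flush the final, possibly short, segment.
        segments.append(" ".join(current))
    return segments
```

An empty input yields an empty list here, which matches the empty-string behavior described for the partitioning functions above.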

BREAKING CHANGES

  • stage_for_transformers now returns a list of elements, making it consistent with other
    staging bricks