0.5.8
Enhancements
elements_to_json now returns a JSON string when filename is not specified.
elements_from_json can take a JSON string through the text kwarg instead of a filename.
detect_filetype now falls back to the file extension as a final step.
Empty tags are now skipped during the depth check for HTML processing.
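The string-or-file behavior of elements_to_json and the text kwarg of elements_from_json can be pictured with the standard json module. The sketch below is not the library's implementation: to_json_sketch and from_json_sketch are hypothetical stand-ins, and elements are assumed to be plain dicts rather than unstructured Element objects.

```python
import json
from typing import List, Optional


def to_json_sketch(elements: List[dict], filename: Optional[str] = None,
                   indent: int = 4) -> Optional[str]:
    """Hypothetical: return the JSON string when no filename is given, else write the file."""
    payload = json.dumps(elements, indent=indent)
    if filename is None:
        return payload
    with open(filename, "w", encoding="utf-8") as f:
        f.write(payload)
    return None


def from_json_sketch(filename: str = "", text: str = "") -> List[dict]:
    """Hypothetical: load elements from a file path, or from a JSON string via `text`."""
    if filename and text:
        raise ValueError("Pass either filename or text, not both.")
    if filename:
        with open(filename, encoding="utf-8") as f:
            return json.load(f)
    if not text:
        raise ValueError("One of filename or text is required.")
    return json.loads(text)


# Round trip entirely in memory: no file is written or read.
elements = [{"type": "Title", "text": "Hello"},
            {"type": "NarrativeText", "text": "World"}]
serialized = to_json_sketch(elements)
restored = from_json_sketch(text=serialized)
```

Returning the string when no destination is given keeps in-memory pipelines free of temporary files while leaving the file-writing path unchanged.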
Features
Added local file system support to unstructured-ingest
Added a --max-docs parameter to unstructured-ingest
Added partition_msg for processing Microsoft Outlook .msg files.
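A document cap such as --max-docs over a local directory can be illustrated with a lazy directory walk that stops after N files. This is only a sketch under assumptions: iter_local_docs is a hypothetical helper, and the real unstructured-ingest local connector may order, filter, or count documents differently.

```python
import os
import tempfile
from itertools import islice
from typing import Iterator, Optional


def iter_local_docs(root: str, max_docs: Optional[int] = None) -> Iterator[str]:
    """Hypothetical: yield file paths under `root`, stopping after `max_docs` files if set."""
    def walk() -> Iterator[str]:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # sort subdirectories in place for a deterministic order
            for name in sorted(filenames):
                yield os.path.join(dirpath, name)

    paths = walk()
    # islice stops the lazy walk early, so files past the cap are never visited.
    return paths if max_docs is None else islice(paths, max_docs)


# Usage: five files on disk, but only the first two are returned with a cap of 2.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        with open(os.path.join(tmp, f"doc{i}.txt"), "w", encoding="utf-8") as f:
            f.write("x")
    limited = list(iter_local_docs(tmp, max_docs=2))
    everything = list(iter_local_docs(tmp))
```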
Fixes
convert_file_to_text now passes through the source_format and target_format kwargs.
Previously they were hard-coded.
Partitioning functions that accept a text kwarg no longer raise an error if an empty
string is passed; an empty list of elements is returned instead.
partition_json no longer fails if the input is an empty list.
Fixed a bug in chunk_by_attention_window that caused the last word in segments to be cut off
in some cases.
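The notes do not describe the exact cause of the chunk_by_attention_window bug, but a common source of lost trailing words in window-based chunking is forgetting to flush the final partial window. The sketch below shows that pattern; chunk_words is hypothetical and treats whitespace-separated words as tokens, whereas the real function works with a tokenizer and its attention window.

```python
from typing import List


def chunk_words(text: str, max_tokens: int) -> List[str]:
    """Hypothetical: pack words into chunks of at most `max_tokens` words.

    Words stand in for tokenizer tokens (an assumption for illustration).
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        if len(current) == max_tokens:  # window full: emit it and start a new one
            chunks.append(" ".join(current))
            current = []
        current.append(word)
    # Flush the trailing partial window; skipping this step drops the final words.
    if current:
        chunks.append(" ".join(current))
    return chunks


chunks = chunk_words("the quick brown fox jumps over the lazy dog", max_tokens=4)
```

Here the last chunk holds only "dog", and empty input yields an empty list rather than an error.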
BREAKING CHANGES
stage_for_transformers now returns a list of elements, making it consistent with the other
staging bricks.