Releases: Unstructured-IO/unstructured

0.5.2

02 Mar 19:04
a5da3de

Enhancements

  • unstructured-ingest now uses a default --download-dir of $HOME/.cache/unstructured/ingest
    rather than a "tmp-ingest-" dir in the working directory.
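
As a rough illustration of the new default, the location can be derived from the user's home directory. The helper name default_download_dir is hypothetical and is not taken from unstructured-ingest itself.

```python
from pathlib import Path


def default_download_dir() -> Path:
    # Hypothetical helper: mirrors the documented default location,
    # $HOME/.cache/unstructured/ingest.
    return Path.home() / ".cache" / "unstructured" / "ingest"


print(default_download_dir())
```

A cache directory under the home folder survives between runs, which is what lets repeated invocations reuse earlier downloads.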

Features

Fixes

  • setup_ubuntu.sh no longer fails in some contexts by interpreting
    DEBIAN_FRONTEND=noninteractive as a command
  • unstructured-ingest no longer re-downloads files when --preserve-downloads
    is used without --download-dir.
  • Fixed an issue that was causing text to be skipped in some HTML documents.

0.5.1

01 Mar 00:17
a6f8256

Enhancements

Features

Fixes

  • Fixes an error that sometimes caused JavaScript to appear in the output of partition_html.
  • Fixes several issues with the requires_dependencies decorator, including its error message
    and how it was applied, which had caused an error for unstructured-ingest --github-url ....

0.5.0

28 Feb 15:45
6966178

Enhancements

  • Add requires_dependencies Python decorator to check dependencies are installed before
    instantiating a class or running a function
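
A minimal sketch of how such a decorator could work, assuming it takes one or more top-level module names and raises ImportError when any of them cannot be found. The signature and error message here are assumptions for illustration, not the library's actual implementation.

```python
import functools
import importlib.util
from typing import Callable, List, Union


def requires_dependencies(dependencies: Union[str, List[str]]) -> Callable:
    """Assumed sketch: fail early if a required top-level module is missing."""
    if isinstance(dependencies, str):
        dependencies = [dependencies]

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # find_spec returns None when a top-level module is not installed.
            missing = [d for d in dependencies if importlib.util.find_spec(d) is None]
            if missing:
                raise ImportError(
                    f"Missing dependencies for {func.__name__}: {', '.join(missing)}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@requires_dependencies("json")
def dump(data: dict) -> str:
    import json

    return json.dumps(data)


@requires_dependencies(["json", "surely_not_installed_pkg_xyz"])
def needs_missing() -> str:
    return "unreachable"
```

Checking at call time rather than at import time means a module with optional extras can still be imported when those extras are absent.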

Features

  • Added Wikipedia connector for ingest CLI.

Fixes

  • Fix process_document file cleaning on failure
  • Fixes an error introduced in the metadata tracking commit that caused NarrativeText
    and FigureCaption elements to be represented as Text in HTML documents.

0.4.16

28 Feb 04:50
5eaf449

Enhancements

  • Fall back to using file extensions for filetype detection if libmagic is not present
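
The fallback can be sketched roughly as follows, assuming python-magic is the libmagic binding and using the standard-library mimetypes module for the extension lookup. The function names are hypothetical, and the library's own detection logic may differ.

```python
import mimetypes
from typing import Optional

try:
    import magic  # python-magic; import fails if libmagic is unavailable

    LIBMAGIC_AVAILABLE = True
except ImportError:
    LIBMAGIC_AVAILABLE = False


def guess_from_extension(filename: str) -> Optional[str]:
    # Extension-only lookup; never reads the file contents.
    mime_type, _ = mimetypes.guess_type(filename)
    return mime_type


def guess_mime_type(filename: str) -> Optional[str]:
    # Prefer content sniffing when libmagic is present, else use the extension.
    if LIBMAGIC_AVAILABLE:
        return magic.from_file(filename, mime=True)
    return guess_from_extension(filename)
```

Extension-based detection is less reliable than content sniffing, since a misnamed file is classified by its name alone.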

Features

  • Added setup script for Ubuntu
  • Added GitHub connector for ingest CLI.
  • Added partition_md partitioner.
  • Added Reddit connector for ingest CLI.

Fixes

  • Initializes connector properly in ingest.main::MainProcess
  • Restricts version of unstructured-inference to avoid multithreading issue

0.4.15

23 Feb 21:59
0d229f0

Enhancements

  • Added elements_to_json and elements_from_json for easier serialization/deserialization
  • convert_to_dict, dict_to_elements and convert_to_csv are now aliases for functions
    that use the ISD terminology.
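
To show the kind of round trip these helpers are meant to make easier, here is a self-contained sketch using a stand-in Element dataclass and hypothetical to_json / from_json functions. None of these names or signatures are the library's API.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Element:
    # Stand-in for a document element; not the library's class.
    type: str
    text: str


def to_json(elements: List[Element]) -> str:
    # Serialize each element to a plain dict, then to a JSON array.
    return json.dumps([asdict(element) for element in elements], indent=2)


def from_json(payload: str) -> List[Element]:
    # Rebuild elements from the JSON array produced by to_json.
    return [Element(**item) for item in json.loads(payload)]


elements = [
    Element("Title", "Release notes"),
    Element("NarrativeText", "All elements survive the round trip."),
]
restored = from_json(to_json(elements))
```

Comparing restored against elements is a quick check that nothing is dropped during serialization and deserialization.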

Fixes

  • Update to ensure all elements are preserved during serialization/deserialization

0.4.14

23 Feb 17:25
354eff1

  • Automatically install NLTK models in the tokenize module.

0.4.13

23 Feb 05:33
83f0454

  • Fixes unstructured-ingest CLI.

0.4.12

23 Feb 03:54
80c0fab

  • Adds console_entrypoint for unstructured-ingest, along with other structure and
    documentation updates related to ingest.
  • Adds a parser parameter to partition_html.

0.4.11

17 Feb 17:12
601f250

  • Adds partition_doc for partitioning Word documents in .doc format. Requires LibreOffice.
  • Adds partition_ppt for partitioning PowerPoint documents in .ppt format. Requires LibreOffice.
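
The LibreOffice requirement suggests that legacy files are converted to a newer format before partitioning. A minimal sketch of such a conversion, assuming the soffice executable is on PATH, is shown below. The helper names are hypothetical, and whether the partitioners work exactly this way is an assumption.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: Path, target_format: str, out_dir: Path) -> List[str]:
    # Hypothetical helper: assembles a headless LibreOffice conversion command.
    return [
        "soffice",
        "--headless",
        "--convert-to",
        target_format,
        "--outdir",
        str(out_dir),
        str(source),
    ]


def convert(source: Path, target_format: str, out_dir: Path) -> None:
    # Fail with a clear message if LibreOffice is not installed.
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice (LibreOffice) was not found on PATH")
    subprocess.run(build_convert_command(source, target_format, out_dir), check=True)
```

For example, convert(Path("report.doc"), "docx", Path("converted")) would ask LibreOffice to write a .docx copy into the converted directory.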

0.4.10

16 Feb 17:26
f5ff140

  • Fixes ElementMetadata so that it's JSON serializable when the filename is a Path object.
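
The underlying problem can be reproduced with the standard library alone: json.dumps raises TypeError for pathlib.Path values. The sketch below uses a plain dict as a stand-in for the metadata and converts Path values to strings, which is one possible fix and not necessarily the one the library applies.

```python
import json
from pathlib import Path

# Stand-in for element metadata whose filename is a Path object.
metadata = {"filename": Path("docs") / "example.html"}

try:
    json.dumps(metadata)
    failed = False
except TypeError as error:
    failed = True
    print(f"Not serializable: {error}")

# Convert Path values to strings before serializing.
serializable = {
    key: str(value) if isinstance(value, Path) else value
    for key, value in metadata.items()
}
print(json.dumps(serializable))
```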