Releases: Unstructured-IO/unstructured

0.18.15

17 Sep 14:27
2d44d73

What's Changed

Full Changelog: 0.18.14...0.18.15

0.18.14

26 Aug 13:25
fed8942

Enhancements

  • Speed up function sentence_count by 59% (codeflash)

  • Speed up function check_for_nltk_package by 111% (codeflash)

  • Speed up function under_non_alpha_ratio by 76% (codeflash)

0.18.13

13 Aug 23:41
0d20f6a

Fixes

  • Parse a wider variety of date formats in email headers: The partition_email function is now more robust to non-standard date formats, including ISO-8601 dates with a "Z" suffix. This prevents ValueError exceptions when partitioning emails that use these date formats.
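The fallback idea behind this fix can be sketched with the standard library alone. This is a minimal illustration, not the code used by partition_email; the helper name parse_header_date is hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_header_date(value: str) -> Optional[datetime]:
    """Parse a Date header value, trying RFC 2822 first and ISO-8601 second."""
    value = value.strip()
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        pass  # not an RFC 2822 date; fall through to ISO-8601
    if value.endswith("Z"):
        # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
        # so rewrite it as an explicit UTC offset.
        value = value[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None  # unparseable: report "no date" instead of raising
```

Returning None for an unparseable value is one possible design choice; a caller that needs to distinguish a missing header from a malformed one would handle that separately.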

0.18.12

28 Jul 19:02
b8c14a7

What's Changed

  • Prevent large file content in encoding exceptions: Replace UnicodeDecodeError with UnprocessableEntityError in encoding detection to avoid storing the entire file content in exception objects, which can cause problems in logging and error-reporting systems when processing large files.
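Why this matters: a UnicodeDecodeError keeps the complete input bytes in its object attribute, so anything that logs or serializes the exception can end up carrying the whole file. The sketch below shows one way to replace it with a small exception. The class defined here is a local stand-in for illustration only, and decode_or_raise is a hypothetical helper, not the library's code.

```python
class UnprocessableEntityError(Exception):
    """Local stand-in for illustration; not imported from unstructured."""


def decode_or_raise(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, raising a compact error that holds no file content."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.object references the full input, so keep only a short summary.
        message = f"could not decode as {exc.encoding} at byte {exc.start}: {exc.reason}"
    # Raised outside the except block so the new exception has no
    # __context__ link back to the original UnicodeDecodeError.
    raise UnprocessableEntityError(message)
```

Raising after the except block has ended is deliberate: "raise ... from None" inside the handler would hide the original from tracebacks but still keep a reference to it.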

Full Changelog: 0.18.11...0.18.12

0.18.11

23 Jul 13:32
591729c

What's Changed

Full Changelog: 0.18.10...0.18.11

0.18.10

18 Jul 17:31
a040483

Features

  • Add OCR_AGENT_CACHE_SIZE environment variable: Added a configurable cache size for OCR agents to control memory usage.
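As a rough illustration of how an environment variable can bound such a cache, the sketch below sizes a functools.lru_cache from OCR_AGENT_CACHE_SIZE. The factory name make_agent_cache, the default of 1, and the placeholder agent object are assumptions made for this example; the release note does not describe the library's actual wiring.

```python
import functools
import os


def make_agent_cache(default_size: int = 1):
    """Build a cached agent loader whose capacity comes from the environment."""
    # default_size is a hypothetical fallback, not a documented default.
    size = int(os.environ.get("OCR_AGENT_CACHE_SIZE", str(default_size)))

    @functools.lru_cache(maxsize=size)
    def get_agent(language: str) -> object:
        # Placeholder for an expensive OCR agent construction.
        return object()

    return get_agent
```

A smaller cache keeps fewer agents alive at once at the cost of rebuilding them more often; the variable has to be set before the cache is created.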

0.18.9

16 Jul 22:48
909716f

Features

  • Convert elements to markdown for output: Added a function to convert elements to markdown format for easy viewing.
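The general idea of such a conversion can be sketched on plain dictionaries. Everything here is an assumption for illustration: the function name, the dict layout with "type" and "text" keys, and the mapping of element types to markdown are not taken from the library.

```python
def elements_to_markdown(elements: list) -> str:
    """Render a list of {"type": ..., "text": ...} dicts as markdown text."""
    blocks = []
    for element in elements:
        text = element.get("text", "")
        kind = element.get("type")
        if kind == "Title":
            blocks.append(f"# {text}")   # titles become headings
        elif kind == "ListItem":
            blocks.append(f"- {text}")   # list items become bullets
        else:
            blocks.append(text)          # everything else stays a paragraph
    return "\n\n".join(blocks)
```

A real converter would also need rules for tables, images, and nested headings, which this sketch leaves out.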

Fixes

  • Language detection nit: Handle empty text.
  • Properly handle password-protected XLSX: detect password protection on XLSX files and raise appropriate
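One common way to spot a password-protected XLSX is to look at its container: a normal XLSX is a ZIP archive, while an encrypted Office Open XML package is wrapped in an OLE2 compound file. The sketch below checks the leading bytes only. It is a heuristic under that assumption, the function name is hypothetical, and it does not show how the library performs its detection or which error it raises.

```python
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP local file header
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")     # OLE2 compound file header


def classify_xlsx_container(head: bytes) -> str:
    """Classify a file expected to be XLSX from its first bytes."""
    if head.startswith(ZIP_MAGIC):
        return "zip"       # ordinary, unencrypted XLSX package
    if head.startswith(OLE2_MAGIC):
        return "ole2"      # likely encrypted (or a legacy .xls file)
    return "unknown"
```

Because legacy .xls workbooks are also OLE2 files, the "ole2" result only suggests encryption when the file was expected to be an XLSX in the first place.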

0.18.7

15 Jul 20:59
344202f

Features

  • Add language detection for PDFs: Add document-level and element-level language detection to PDFs.

0.18.6

15 Jul 19:08
2ffaf6f

Fixes

  • Improved EPUB partition errors: EPUB partition now raises a new type of error on unprocessable files.
  • Fix type for serialized TableChunks: Use TableChunk, rather than Table, as the string value of the type field when serializing elements of type TableChunk.
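The TableChunk fix can be pictured with two simplified stand-in classes. These are not the library's element classes; the category attribute and to_dict method are assumptions chosen to show how a subclass can end up serialized under its parent's type label unless it declares its own.

```python
class Table:
    """Simplified stand-in for a table element."""

    category = "Table"

    def __init__(self, text: str) -> None:
        self.text = text

    def to_dict(self) -> dict:
        # The serialized "type" comes from the class-level label, so a
        # subclass must override it to avoid being written out as "Table".
        return {"type": self.category, "text": self.text}


class TableChunk(Table):
    """Simplified stand-in for a chunk of a split table."""

    category = "TableChunk"
```

With the override in place, consumers that dispatch on the type field can tell a whole table from a chunk of one.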

0.18.4

08 Jul 08:13
f078cd9

What's Changed

Full Changelog: 0.18.3...0.18.4