Releases: Unstructured-IO/unstructured

0.18.20

15 Nov 00:14
7c4d0b9

Enhancement

  • Improve the VoyageAI integration
  • Add voyage-context-3 support
  • Flag extracted elements as such in the metadata for downstream use

Features

Fixes

0.18.18

07 Nov 01:05
b01d35b

Fixes

  • Prevent path traversal in email MSG attachment filenames: Fixed a security vulnerability (GHSA-gm8q-m8mv-jj5m) where malicious attachment filenames containing path traversal sequences could write files outside the intended directory. The fix normalizes both Unix and Windows path separators before sanitizing filenames, preventing cross-platform path traversal attacks in partition_msg functions.
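
The normalize-then-sanitize idea described in this fix can be sketched as follows. This is an illustration only, not the library's actual code: the function name `sanitize_attachment_filename` and the `default` fallback are hypothetical.

```python
import posixpath


def sanitize_attachment_filename(filename: str, default: str = "unknown") -> str:
    """Reduce an untrusted attachment name to a bare file name (hypothetical helper)."""
    # Treat both "\" and "/" as separators, regardless of the host OS.
    normalized = filename.replace("\\", "/")
    # Keep only the final path component, discarding any directory parts.
    base = posixpath.basename(normalized)
    # Drop NUL bytes and refuse names that are empty or refer to a directory.
    base = base.replace("\x00", "")
    if base in ("", ".", ".."):
        return default
    return base
```

Normalizing the separators first matters because a name such as `..\..\evil.dll` contains no `/` and would pass through a POSIX-only basename check unchanged.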

0.18.17

Enhancement

Features

Fixes

0.18.16

Enhancement

  • Speed up function _assign_hash_ids by 34% (codeflash)

Features

Fixes

0.18.15

17 Sep 14:27
2d44d73

What's Changed

New Contributors

Full Changelog: 0.18.14...0.18.15

0.18.14

26 Aug 13:25
fed8942

Enhancements

  • Speed up function sentence_count by 59% (codeflash)

  • Speed up function check_for_nltk_package by 111% (codeflash)

  • Speed up function under_non_alpha_ratio by 76% (codeflash)

Features

Fixes

0.18.13

13 Aug 23:41
0d20f6a

Fixes

  • Parse a wider variety of date formats in email headers: The partition_email function is now more robust to non-standard date formats, including ISO-8601 dates with "Z" suffixes. This prevents ValueError exceptions when partitioning emails with these date formats.
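
One way to get this kind of tolerance is sketched below, using only the standard library. It is an assumption about the general approach, not the code in partition_email, and `parse_email_date` is a hypothetical name.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_email_date(value: str) -> Optional[datetime]:
    """Parse a Date header, falling back to ISO-8601 (hypothetical helper)."""
    value = value.strip()
    try:
        # Standard RFC 5322 form, e.g. "Wed, 13 Aug 2025 23:41:00 +0000".
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError here, newer ones ValueError.
        pass
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        # Give up quietly instead of propagating ValueError to the caller.
        return None
```

Returning `None` for unparseable input is a design choice of this sketch; the caller can then leave the date metadata unset.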

0.18.12

28 Jul 19:02
b8c14a7

What's Changed

  • Prevent large file content in encoding exceptions: Replace UnicodeDecodeError with UnprocessableEntityError in encoding detection to avoid storing entire file content in exception objects, which can cause issues in logging and error reporting systems when processing large files.
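
The underlying issue is that Python's UnicodeDecodeError keeps the full input bytes in its `object` attribute. A minimal sketch of the replacement pattern follows. The `UnprocessableEntityError` class here is a local stand-in defined for the example, and `decode_or_raise` is a hypothetical name; neither is taken from the library.

```python
class UnprocessableEntityError(Exception):
    """Local stand-in for the library's exception, defined so the sketch runs alone."""


def decode_or_raise(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, raising a small exception that does not carry the input."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # Copy only short scalar details; exc.object would reference all of `data`.
        message = (
            f"Could not decode {len(data)} bytes as {encoding}: "
            f"{exc.reason} at position {exc.start}"
        )
    # Raised outside the except block so the new exception does not retain the
    # UnicodeDecodeError (and its payload) as __context__.
    raise UnprocessableEntityError(message)
```

Note that `raise ... from None` inside the except block would hide the original error in tracebacks but still keep it reachable through `__context__`, which is why this sketch raises after the block ends.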

Full Changelog: 0.18.11...0.18.12

0.18.11

23 Jul 13:32
591729c

What's Changed

Full Changelog: 0.18.10...0.18.11

0.18.10

18 Jul 17:31
a040483

Enhancements

Features

  • Add OCR_AGENT_CACHE_SIZE environment variable: Added configurable cache size for OCR agents to control memory usage.
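
How an environment variable can bound a cache is sketched below with `functools.lru_cache`. Only the variable name OCR_AGENT_CACHE_SIZE comes from the release note. The helper names, the fallback default of 1, and the use of `lru_cache` are assumptions made for illustration; the library's own mechanism and default may differ.

```python
import os
from functools import lru_cache
from typing import Callable


def get_cache_size(default: int = 1) -> int:
    """Read OCR_AGENT_CACHE_SIZE, falling back to an arbitrary default (assumption)."""
    raw = os.environ.get("OCR_AGENT_CACHE_SIZE", "")
    try:
        size = int(raw)
    except ValueError:
        return default
    # lru_cache treats maxsize=0 as "no caching"; clamp negatives to 0.
    return max(size, 0)


def make_cached_loader(loader: Callable[[str], object]) -> Callable[[str], object]:
    """Wrap an expensive agent loader in an LRU cache sized from the environment."""
    return lru_cache(maxsize=get_cache_size())(loader)
```

With a bounded LRU cache, the least recently used agent is evicted once the limit is reached, which caps how many loaded agents stay in memory at once.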

0.18.9

16 Jul 22:48
909716f

Enhancements

Features

  • Convert elements to markdown for output: Added function to convert elements to markdown format for easy viewing.
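
To give a rough idea of what such a conversion involves, here is a self-contained sketch that works on plain dicts with "type" and "text" keys. It does not call the library's function. The name `elements_to_markdown_sketch` and the mapping rules are assumptions for illustration only.

```python
from typing import Dict, List


def elements_to_markdown_sketch(elements: List[Dict[str, str]]) -> str:
    """Render element-like dicts as markdown (hypothetical, simplified mapping)."""
    blocks = []
    for element in elements:
        kind = element.get("type", "")
        text = element.get("text", "").strip()
        if not text:
            # Skip elements with nothing to render.
            continue
        if kind == "Title":
            blocks.append(f"# {text}")
        elif kind == "ListItem":
            blocks.append(f"- {text}")
        else:
            # Everything else is emitted as a plain paragraph.
            blocks.append(text)
    # Separate blocks with a blank line so each renders as its own paragraph.
    return "\n\n".join(blocks)
```

A fuller converter would also need to handle tables, images, and nested lists, which this sketch leaves out.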

Fixes

  • Language detection nit: handle empty text
  • Properly handle password-protected XLSX: detect password protection on XLSX files and raise appropriate

0.18.7

15 Jul 20:59
344202f

Enhancements

Features

  • Add language detection for PDFs: Add document- and element-level language detection to PDFs.

Fixes