Releases: Unstructured-IO/unstructured
0.18.15
What's Changed
- Setup Codeflash Github Actions to optimize all future code by @misrasaurabh1 in #4082
- fix: update deps to resolve cve by @qued in #4093
- ⚡️ Speed up function group_broken_paragraphs by 30% by @aseembits93 in #4088
- ⚡️ Speed up method ElementHtml._get_children_html by 234% by @aseembits93 in #4087
- Luke/sept16 CVE by @luke-kucing in #4094
New Contributors
- @aseembits93 made their first contribution in #4088
Full Changelog: 0.18.14...0.18.15
0.18.14
Enhancements
- Speed up function sentence_count by 59% (codeflash)
- Speed up function check_for_nltk_package by 111% (codeflash)
- Speed up function under_non_alpha_ratio by 76% (codeflash)
Features
Fixes
- Change the short-text language detection log from warning to debug level to reduce log spam
- Bumped dependencies via pip-compile to address the following CVEs:
  - Python 3.12/3.13: CVE-2025-8194, GHSA-v594-44hm-2j7p
  - glibc & related (glibc, glibc-locale-posix, ld-linux, libcrypt1): CVE-2025-8058, GHSA-8xjp-c72j-67q8
  - aiohttp: GHSA-9548-qrrj-x5pj
  - openjpeg: CVE-2025-54874
  - pypdf: GHSA-7hfw-26vp-jp8m
  - transformers: GHSA-9356-575x-2w9m
  - urllib3: GHSA-48p4-8xcf-vxj5
0.18.13
Fixes
- Parse a wider variety of date formats in email headers: the partition_email function is now more robust to non-standard date formats, including ISO-8601 dates with a "Z" suffix. This prevents ValueError exceptions when partitioning emails with these date formats.
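To illustrate the kind of fallback this fix describes, here is a minimal sketch using only the standard library. It is not the code used by partition_email; the helper name parse_header_date is hypothetical, and the two-step strategy (RFC 5322 first, then ISO-8601) is an assumption.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_header_date(value: str) -> Optional[datetime]:
    """Parse an email Date header, tolerating ISO-8601 input (hypothetical helper)."""
    value = value.strip()
    # First try the RFC 5322 format that email headers are supposed to use.
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        pass
    # Fall back to ISO-8601. Older Python versions reject a trailing "Z",
    # so rewrite it as an explicit UTC offset before parsing.
    iso_value = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        return datetime.fromisoformat(iso_value)
    except ValueError:
        # Return None instead of raising, so one bad header cannot abort a run.
        return None
```

Returning None for unparseable values is one possible design choice; a caller that needs to distinguish "missing" from "malformed" might prefer a dedicated error.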
0.18.12
What's Changed
- Prevent large file content in encoding exceptions: replace UnicodeDecodeError with UnprocessableEntityError in encoding detection to avoid storing entire file content in exception objects, which can cause issues in logging and error reporting systems when processing large files.
Full Changelog: 0.18.11...0.18.12
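The reason this matters is that a UnicodeDecodeError keeps the full input bytes in its object attribute. The sketch below shows one way to surface a decoding failure without carrying that payload along. The UnprocessableEntityError class here is a local stand-in defined for illustration, and decode_or_fail is a hypothetical helper, not the library's encoding detection.

```python
class UnprocessableEntityError(Exception):
    """Stand-in for the library's error type; defined here only for illustration."""


def decode_or_fail(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, raising an error that does not carry the payload."""
    failure = None
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.object holds the full input bytes, so keep only small details.
        failure = f"cannot decode as {exc.encoding} at byte {exc.start}: {exc.reason}"
    # Raising outside the except block leaves __context__ unset, so the
    # original exception (and the bytes it references) is not chained.
    raise UnprocessableEntityError(failure)
```

Note that raising with "from None" inside the except block would hide the original traceback from display but still keep it reachable through __context__, which is why this sketch raises after the handler has finished.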
0.18.11
What's Changed
- add '|' as a delimiter in csv files by @jiajun-unstructured in #4059
- feat: map tags by type + add coverage by @MaksOpp in #4068
- chore: switch to charset normalizer by @qued in #4060
- bump version and release by @MaksOpp in #4070
Full Changelog: 0.18.10...0.18.11
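As a rough illustration of what accepting '|' as a CSV delimiter can look like, the sketch below uses the standard library's csv.Sniffer with a restricted candidate set. The candidate list and the read_rows helper are assumptions made for this example, not the library's actual detection code.

```python
import csv
import io

# Candidate delimiters are an assumption for this sketch.
CANDIDATE_DELIMITERS = ",;|"


def read_rows(text: str) -> list[list[str]]:
    """Guess the delimiter from the text, then parse all rows."""
    # Sniffer only considers the characters passed in `delimiters`.
    dialect = csv.Sniffer().sniff(text, delimiters=CANDIDATE_DELIMITERS)
    return list(csv.reader(io.StringIO(text), dialect))
```

csv.Sniffer is heuristic and raises csv.Error when it cannot settle on a delimiter, so callers may want a default to fall back on.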
0.18.10
0.18.9
Enhancements
Features
- Convert elements to markdown for output: added a function to convert elements to markdown format for easy viewing.
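To give a feel for what such a conversion involves, here is a self-contained sketch. SimpleElement, MARKDOWN_PREFIX, and to_markdown are all hypothetical names invented for this example; the library's own function and its category-to-markdown mapping may differ.

```python
from dataclasses import dataclass


@dataclass
class SimpleElement:
    """Hypothetical stand-in for a partitioned element."""

    category: str
    text: str


# Assumed mapping from element category to a markdown prefix.
MARKDOWN_PREFIX = {"Title": "# ", "ListItem": "- "}


def to_markdown(elements: list[SimpleElement]) -> str:
    """Render elements as markdown blocks separated by blank lines."""
    # Categories without a known prefix are emitted as plain paragraphs.
    blocks = [MARKDOWN_PREFIX.get(el.category, "") + el.text for el in elements]
    return "\n\n".join(blocks)
```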
Fixes
- Language detection nit: handle empty text
- Properly handle password-protected XLSX: detect password protection on XLSX files and raise appropriate
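One common way to spot a password-protected XLSX is by its container: a plain XLSX file is a ZIP archive, while an encrypted Office Open XML file is wrapped in an OLE compound file. The sketch below checks the leading bytes only. It is a heuristic under that assumption, with a hypothetical helper name, and is not how the library is known to detect protection; legacy .xls files share the same OLE signature.

```python
# File signatures: ZIP local file header and OLE compound file header.
ZIP_MAGIC = b"PK\x03\x04"
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def classify_xlsx_bytes(header: bytes) -> str:
    """Classify a file claimed to be XLSX by its first bytes (heuristic)."""
    if header.startswith(ZIP_MAGIC):
        return "plain"
    if header.startswith(OLE_MAGIC):
        # Could be an encrypted XLSX, or a legacy .xls with the wrong extension.
        return "possibly-encrypted"
    return "unknown"
```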
0.18.7
Enhancements
Features
- Add language detection for PDFs: add document-level and element-level language detection to PDFs.
Fixes
0.18.6
Enhancements
Features
Fixes
- Improved EPUB partition errors: EPUB partitioning now produces a new type of error on unprocessable files.
- Fix type for serialized TableChunks: use TableChunk for the string value of the type field when serializing elements of type TableChunk, rather than the value Table.
0.18.4
What's Changed
- fix(partition, csv): increase csv field limit by @ds-filipknefel in #4046
Full Changelog: 0.18.3...0.18.4
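For context on the CSV field limit, Python's csv module rejects any single field longer than csv.field_size_limit(), which defaults to 131072 characters. The sketch below raises the limit temporarily while parsing. The parse_with_limit helper and the choice to restore the previous limit are assumptions for this example, not a description of the change in #4046.

```python
import csv
import io


def parse_with_limit(text: str, limit: int) -> list[list[str]]:
    """Parse CSV text with a temporary field size limit, then restore the old one."""
    # field_size_limit(new) sets the limit and returns the previous value.
    previous = csv.field_size_limit(limit)
    try:
        return list(csv.reader(io.StringIO(text)))
    finally:
        # The limit is process-wide, so put it back for other callers.
        csv.field_size_limit(previous)
```

With a limit below the longest field, csv.reader raises csv.Error, which is the failure a larger limit avoids.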