Pull requests: Unstructured-IO/unstructured
fix: remove duplicate characters caused by fake bold rendering in PDFs
#4215
opened Jan 28, 2026 by
bittoby
Preserve newlines in Table and TableChunk elements during PDF partitioning
#4214
opened Jan 27, 2026 by
eureka928
Fix FutureWarning: Add test to verify bytes are wrapped in BytesIO for read_excel
#4213
opened Jan 27, 2026 by
Angel98518
⚡️ Speed up function merge_out_layout_with_ocr_layout by 30%
#4212
opened Jan 27, 2026 by
aseembits93
fix(deps): Update semitechnologies/weaviate Docker tag to v1.35.6
Labels: dependencies, security
#4210
opened Jan 26, 2026 by
utic-renovate
bot
1 task
feat: chunking by character and title now isolates tables
#4197
opened Jan 15, 2026 by
badGarnet
fix: NameError: LayoutElements not defined in paddle_ocr.py
#4195
opened Jan 15, 2026 by
mohansinghi
fix: None text attribute when normalizing Picture to Image element
#4083
opened Aug 22, 2025 by
ishahroz
Switch from pdfminer to paves to improve robustness and use multiple CPUs
#4067
opened Jul 19, 2025 by
dhdaines
Improve readability of the text by adding new line to the end of row
#3913
opened Feb 7, 2025 by
Sheripov
fix: preserve text after line breaks in PowerPoint table cells
#3877
opened Jan 18, 2025 by
yamazombie
feat: Allow deactivating OCR entirely with hi_res strategy
#3839
opened Dec 17, 2024 by
dhdaines
fix: when convert doc to docx, UnicodeDecodeError may be raised
#3830
opened Dec 14, 2024 by
YooshiJay