Releases: Unstructured-IO/unstructured

0.10.4

18 Aug 21:01

awalker4

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23

Enhancements

Adds ability to reuse connections per process in unstructured-ingest
Pass ocr_mode in partition_pdf and set the default back to individual pages for now

Features

Fixes

0.10.2

17 Aug 06:27

cragwolfe

Enhancements

Bump unstructured-inference==0.5.13:
- Fix extracted image elements being included in layout merge, addresses the issue
  where an entire-page image in a PDF was not passed to the layout model when using hi_res.

Features

Fixes

0.10.1

17 Aug 04:33

cragwolfe

Enhancements

Bump unstructured-inference==0.5.12:
- fix to avoid trace for certain PDFs (0.5.12)
- better defaults for DPI for hi_res and Chipper (0.5.11)
- implement full-page OCR (0.5.10)

Features

Fixes

Fix dead links in repository README (Quick Start > Install for local development, and Learn more > Batch Processing)
Update document dependencies to include tesseract-lang for additional language support (required for tests to pass)

0.10.0

16 Aug 04:36

cragwolfe

Enhancements

Update the links and emphasized_texts metadata fields

Features

Fixes

0.9.3

15 Aug 05:17

cragwolfe

Enhancements

Pinned dependency cleanup.
Update partition_csv to always use soupparser_fromstring to parse html text
Update partition_tsv to always use soupparser_fromstring to parse html text
Add metadata.section to capture epub table of contents data
Add unique_element_ids kwarg to partition functions. If True, will use a UUID for element IDs instead of a SHA-256 hash.
Update partition_xlsx to always use soupparser_fromstring to parse html text
Add functionality to switch html text parser based on whether the html text contains emoji
Add functionality to check if a string contains any emoji characters
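
The difference between the two element ID schemes mentioned for unique_element_ids can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper names are hypothetical, and hashing only the element text is an assumption.

```python
import hashlib
import uuid


def hashed_element_id(text: str) -> str:
    """Deterministic ID: the same text always yields the same SHA-256 hex digest."""
    # Assumption: only the element text is hashed.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def random_element_id() -> str:
    """Unique ID: a fresh random UUID on every call, regardless of the text."""
    return str(uuid.uuid4())


def element_id(text: str, unique_element_ids: bool = False) -> str:
    # Hypothetical helper that mirrors the kwarg described in the release notes.
    return random_element_id() if unique_element_ids else hashed_element_id(text)


if __name__ == "__main__":
    repeated = ["Page 1", "Page 1"]
    # Hash-based IDs collide for identical text.
    print([element_id(t)[:8] for t in repeated])
    # UUID-based IDs stay distinct even for identical text.
    print([element_id(t, unique_element_ids=True)[:8] for t in repeated])
```

Hash-based IDs are reproducible across runs, which helps deduplication; UUIDs are the safer choice when identical text (repeated headers, for example) must remain distinguishable.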

Features

Add Airtable Connector to be able to pull views/tables/bases from an Airtable organization

Fixes

Make notion module discoverable
Fix emails with Content-Disposition: inline and Content-Disposition: attachment with no filename
Fix email attachment filenames which had = in the filename itself
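
The email cases named in these fixes (an inline part, an attachment whose filename contains =, and an attachment with no filename) can be reproduced with Python's standard email package. The sketch below is illustrative only and does not show how the library handles them; the sample message and the describe_parts helper are made up for this example.

```python
from email import message_from_string

# Hypothetical sample message covering the three cases from the release notes.
RAW = """\
From: a@example.com
To: b@example.com
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain
Content-Disposition: inline

body text
--XYZ
Content-Type: text/plain
Content-Disposition: attachment; filename="a=b.txt"

attached text
--XYZ
Content-Type: application/octet-stream
Content-Disposition: attachment

no name
--XYZ--
"""


def describe_parts(raw: str):
    """Return (disposition, filename) for every non-container part."""
    msg = message_from_string(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the multipart container itself
        # get_filename() returns None when no filename parameter is present,
        # so callers need a fallback before writing the attachment to disk.
        parts.append((part.get_content_disposition(), part.get_filename()))
    return parts


if __name__ == "__main__":
    for disposition, filename in describe_parts(RAW):
        print(disposition, filename)
```

A parser that splits the header on = by hand would truncate a=b.txt, and one that assumes every attachment has a name would fail on the last part, which is why relying on the header parser and handling None is the safer pattern.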

0.9.2

11 Aug 02:30

cragwolfe

Enhancements

Update table extraction section in API documentation to sync with change in Prod API
Update Notion connector to extract to html
Bump unstructured-inference==0.5.9:
- better caching of models
- another version of detectron2 available, though the default layout model is unchanged
Added UUID option for element_id

Features

Adds Sharepoint connector.

Fixes

Bump unstructured-inference==0.5.9:
- ignores Tesseract errors where no text is extracted for tiles that indeed have no text

0.9.1

09 Aug 05:56

Enhancements

Adds --partition-pdf-infer-table-structure to unstructured-ingest.
Enable partition_html to skip headers and footers with the skip_headers_and_footers flag.
Update partition_doc and partition_docx to track emphasized texts in the output
Adds post processing function filter_element_types
Set the default strategy for partitioning images to hi_res
Add page break parameter section in API documentation to sync with change in Prod API
Update partition_html to track emphasized texts in the output
Update XMLDocument._read_xml to create <p> tag element for the text enclosed in the <pre> tag
Add parameter include_tail_text to _construct_text to enable (skip) tail text inclusion
Add Notion connector
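
As a rough picture of what a post-processing filter such as filter_element_types does, the sketch below keeps or drops elements by their type. It is a self-contained illustration under stated assumptions: the element classes, the filter_by_type name, and its include/exclude parameters are hypothetical stand-ins, not the library's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Type


@dataclass
class Element:
    """Hypothetical stand-in for a partitioned document element."""
    text: str


class Title(Element):
    pass


class NarrativeText(Element):
    pass


class Footer(Element):
    pass


def filter_by_type(
    elements: Sequence[Element],
    include: Optional[Sequence[Type[Element]]] = None,
    exclude: Optional[Sequence[Type[Element]]] = None,
) -> List[Element]:
    """Keep only the included types, or drop the excluded ones."""
    if include and exclude:
        raise ValueError("Pass include or exclude, not both.")
    if include:
        return [e for e in elements if type(e) in include]
    if exclude:
        return [e for e in elements if type(e) not in exclude]
    return list(elements)


if __name__ == "__main__":
    doc = [Title("Intro"), NarrativeText("Some body text."), Footer("Page 1")]
    print([e.text for e in filter_by_type(doc, exclude=[Footer])])
    print([e.text for e in filter_by_type(doc, include=[Title])])
```

Matching on the exact type rather than isinstance keeps the filter predictable when element classes inherit from one another.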

Features

Fixes

Remove unused _partition_via_api function
Fixed emoji bug in partition_xlsx.
Pass file_filename metadata when partitioning file object
Skip ingest test on missing Slack token
Add Dropbox variables to CI environments
Remove default encoding for ingest
Adds new element type EmailAddress for recognizing email address in the text
Simplifies min_partition logic; makes partitions falling below the min_partition less likely.
Fix bug where ingest test check for number of files fails in smoke test
Fix unstructured-ingest entrypoint failure

0.9.0

01 Aug 15:32

MthwRobinson

Enhancements

Dependencies are now split by document type, creating a slimmer base installation.

0.8.8

01 Aug 06:11

cragwolfe

Enhancements

Features

Fixes

Rename "date" field to "last_modified"
Adds Box connector

0.8.7

28 Jul 16:40

yuming-long

Enhancements

Put back useful function split_by_paragraph

Features

Fixes

Fix argument order in NLTK download step
