Commit dd0f582

build(deps): bump unstructured-inference==0.5.13 (#1141)
Bump to unstructured-inference==0.5.13, which fixes extracted image elements being included in the layout merge. This addresses the issue where an entire-page image in a PDF was not passed to the layout model when using hi_res.
1 parent 9f7bd61 commit dd0f582

12 files changed: +865 -534 lines changed
Diff for: CHANGELOG.md

+11
@@ -1,3 +1,14 @@
+## 0.10.2
+
+### Enhancements
+* Bump unstructured-inference==0.5.13:
+  - Fix extracted image elements being included in layout merge, addresses the issue
+    where an entire-page image in a PDF was not passed to the layout model when using hi_res.
+
+### Features
+
+### Fixes
+
 ## 0.10.1
 
 ### Enhancements

Diff for: requirements/constraints.in

+1 -1
@@ -26,4 +26,4 @@ Pillow<10.0.0
 # AttributeError: 'ResourcePath' object has no attribute 'collection'
 Office365-REST-Python-Client<2.4.3
 # NOTE(christine) Pinned to set the `unstructured-inference` version
-unstructured-inference==0.5.12
+unstructured-inference==0.5.13

Diff for: requirements/extra-pdf-image.txt

+1 -1
@@ -205,7 +205,7 @@ typing-extensions==4.7.1
     #   torch
 tzdata==2023.3
     # via pandas
-unstructured-inference==0.5.12
+unstructured-inference==0.5.13
     # via
     #   -c requirements/constraints.in
     #   -r requirements/extra-pdf-image.in
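
The two diffs above change the same pin in the constraint file and in the compiled requirements file. A minimal sketch of how one might check that the two stay in sync is shown here; the helper names `parse_pins` and `mismatched_pins` are hypothetical and not part of the repository, and the input strings are short excerpts of the post-commit file contents rather than the full files.

```python
from __future__ import annotations

import re

# Hypothetical helper: collect every `name==version` pin in a requirements-style text.
def parse_pins(text: str) -> dict[str, str]:
    pins: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines such as "# via ..."
        match = re.match(r"^([A-Za-z0-9_.\-]+)==([^\s;#]+)", line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins

# Hypothetical helper: report packages pinned in both texts with different versions.
def mismatched_pins(constraints: str, compiled: str) -> dict[str, tuple[str, str]]:
    wanted = parse_pins(constraints)
    actual = parse_pins(compiled)
    return {
        name: (wanted[name], actual[name])
        for name in wanted
        if name in actual and wanted[name] != actual[name]
    }

# Excerpts of the two files as they stand after this commit.
constraints_in = """\
Office365-REST-Python-Client<2.4.3
# NOTE(christine) Pinned to set the `unstructured-inference` version
unstructured-inference==0.5.13
"""
extra_pdf_image_txt = """\
tzdata==2023.3
    # via pandas
unstructured-inference==0.5.13
    # via
    #   -c requirements/constraints.in
"""

print(mismatched_pins(constraints_in, extra_pdf_image_txt))
```

An empty result means every shared pin agrees. Range constraints such as `<2.4.3` are ignored on purpose, since this sketch only compares exact `==` pins.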

Diff for: requirements/ingest-confluence.txt

+1 -1
@@ -4,7 +4,7 @@
 #
 #    pip-compile requirements/ingest-confluence.in
 #
-atlassian-python-api==3.40.1
+atlassian-python-api==3.41.0
     # via -r requirements/ingest-confluence.in
 certifi==2023.7.22
     # via

Diff for: test_unstructured_ingest/expected-structured-output/azure/IRS-form-1987.pdf.json

+776 -6
Large diffs are not rendered by default.

Diff for: test_unstructured_ingest/expected-structured-output/biomed-api/65/11/main.PMC6312790.pdf.json

+8 -8
@@ -10,34 +10,34 @@
     "text": "Data in Brief 22 (2019) 451–457"
   },
   {
-    "type": "Image",
-    "element_id": "70d50409ea726a2789ebbd004bec31f4",
+    "type": "UncategorizedText",
+    "element_id": "869adddb184177031536477262e0dde0",
     "metadata": {
       "data_source": {},
       "filetype": "application/pdf",
       "page_number": 1
     },
-    "text": "Contents lists available at ScienceDirect Data in Brief journal homepage: www.elsevier.com/locate/dib"
+    "text": "Contents lists available at ScienceDirect"
   },
   {
     "type": "UncategorizedText",
-    "element_id": "869adddb184177031536477262e0dde0",
+    "element_id": "e6fa42b5b4d85001b900e47c050b645b",
     "metadata": {
       "data_source": {},
       "filetype": "application/pdf",
       "page_number": 1
     },
-    "text": "Contents lists available at ScienceDirect"
+    "text": "Data in Brief"
   },
   {
-    "type": "UncategorizedText",
-    "element_id": "e6fa42b5b4d85001b900e47c050b645b",
+    "type": "NarrativeText",
+    "element_id": "9234133787d0a6b3976b16569c0b5cf3",
     "metadata": {
       "data_source": {},
       "filetype": "application/pdf",
       "page_number": 1
     },
-    "text": "Data in Brief"
+    "text": "journal homepage: www.elsevier.com/locate/dib"
   },
   {
     "type": "UncategorizedText",

Diff for: test_unstructured_ingest/expected-structured-output/biomed-api/75/29/main.PMC6312793.pdf.json

+8 -8
@@ -10,34 +10,34 @@
     "text": "Data in Brief 22 (2019) 484–487"
   },
   {
-    "type": "Image",
-    "element_id": "70d50409ea726a2789ebbd004bec31f4",
+    "type": "UncategorizedText",
+    "element_id": "869adddb184177031536477262e0dde0",
     "metadata": {
       "data_source": {},
       "filetype": "application/pdf",
       "page_number": 1
     },
-    "text": "Contents lists available at ScienceDirect Data in Brief journal homepage: www.elsevier.com/locate/dib"
+    "text": "Contents lists available at ScienceDirect"
   },
   {
     "type": "UncategorizedText",
-    "element_id": "869adddb184177031536477262e0dde0",
+    "element_id": "e6fa42b5b4d85001b900e47c050b645b",
     "metadata": {
       "data_source": {},
       "filetype": "application/pdf",
       "page_number": 1
     },
-    "text": "Contents lists available at ScienceDirect"
+    "text": "Data in Brief"
   },
   {
-    "type": "UncategorizedText",
-    "element_id": "e6fa42b5b4d85001b900e47c050b645b",
+    "type": "NarrativeText",
+    "element_id": "9234133787d0a6b3976b16569c0b5cf3",
     "metadata": {
       "data_source": {},
       "filetype": "application/pdf",
       "page_number": 1
     },
-    "text": "Data in Brief"
+    "text": "journal homepage: www.elsevier.com/locate/dib"
   },
   {
     "type": "UncategorizedText",
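
The two biomed-api fixtures change in the same way: the merged `Image` element disappears and its text is left to separate text elements. A rough sketch of how that change could be summarized from the expected-output JSON follows. The function names `type_delta` and `text_changes` are hypothetical, and the `before`/`after` lists hold only the three elements visible in the hunk, trimmed to their `type` and `text` fields, not the full fixture files.

```python
from collections import Counter

# Elements visible in the hunk before the bump (trimmed to type and text).
before = [
    {"type": "Image", "text": "Contents lists available at ScienceDirect Data in Brief journal homepage: www.elsevier.com/locate/dib"},
    {"type": "UncategorizedText", "text": "Contents lists available at ScienceDirect"},
    {"type": "UncategorizedText", "text": "Data in Brief"},
]
# The same window after the bump.
after = [
    {"type": "UncategorizedText", "text": "Contents lists available at ScienceDirect"},
    {"type": "UncategorizedText", "text": "Data in Brief"},
    {"type": "NarrativeText", "text": "journal homepage: www.elsevier.com/locate/dib"},
]

# Hypothetical helper: per-type change in element count (new minus old), omitting zeros.
def type_delta(old, new):
    old_counts = Counter(element["type"] for element in old)
    new_counts = Counter(element["type"] for element in new)
    all_types = sorted(set(old_counts) | set(new_counts))
    return {t: new_counts[t] - old_counts[t] for t in all_types if new_counts[t] != old_counts[t]}

# Hypothetical helper: texts that only appear on one side of the change.
def text_changes(old, new):
    old_texts = {element["text"] for element in old}
    new_texts = {element["text"] for element in new}
    return sorted(old_texts - new_texts), sorted(new_texts - old_texts)

print(type_delta(before, after))
removed, added = text_changes(before, after)
print("removed:", removed)
print("added:", added)
```

Comparing by `type` and `text` rather than by `element_id` keeps the summary readable, since the ids in these fixtures change whenever an element's content or position shifts.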

Diff for: test_unstructured_ingest/expected-structured-output/local-single-file-with-pdf-infer-table-structure/layout-parser-paper.pdf.json

+21 -11
@@ -852,7 +852,7 @@
   },
   {
     "type": "FigureCaption",
-    "element_id": "185e67615d123b35d38ea72e0cdb6d99",
+    "element_id": "d21661161ae2c8dc39e96ee5c660704b",
     "metadata": {
       "data_source": {},
       "filetype": "application/pdf",
@@ -960,16 +960,6 @@
     },
     "text": "LayoutParser provides a unified interface for existing OCR tools. Though there are many OCR tools available, they are usually configured differently with distinct APIs or protocols for using them. It can be inefficient to add new OCR tools into an existing pipeline, and difficult to make direct comparisons among the available tools to find the best option for a particular project. To this end, LayoutParser builds a series of wrappers among existing OCR engines, and provides nearly the same syntax for using them. It supports a plug-and-play style of using OCR engines, making it effortless to switch, evaluate, and compare different OCR modules:"
   },
-  {
-    "type": "Image",
-    "element_id": "65ac0f9ae348b12ed9484b8af7296617",
-    "metadata": {
-      "data_source": {},
-      "filetype": "application/pdf",
-      "page_number": 7
-    },
-    "text": "ocr_agent = lp.TesseractAgent ()pOi"
-  },
   {
     "type": "ListItem",
     "element_id": "bebbb4e94f1f97edeb5b96e252720a93",
@@ -1351,6 +1341,26 @@
     },
     "text": "x09 Burpunog uayor Aeydsiq 1 vondo 10g Guypunog usyoy apir:z uondo Mode I: Showing Layout on the Original Image Mode Il: Drawing OCR'd Text at the Correspoding Position"
   },
+  {
+    "type": "NarrativeText",
+    "element_id": "aed1b21a388cefaa841f20f48d19ca98",
+    "metadata": {
+      "data_source": {},
+      "filetype": "application/pdf",
+      "page_number": 9
+    },
+    "text": "Mode I: Showing Layout on the Original Image"
+  },
+  {
+    "type": "NarrativeText",
+    "element_id": "915bc5f1403e01b56e77300d9354fded",
+    "metadata": {
+      "data_source": {},
+      "filetype": "application/pdf",
+      "page_number": 9
+    },
+    "text": "Mode Il: Drawing OCR'd Text at the Correspoding Position"
+  },
   {
     "type": "NarrativeText",
     "element_id": "cc8ad6e0f933633a37b82200e6724f9e",
